
Re: [PATCH RFC 07/10] domain: map/unmap GADDR based shared guest areas


  • To: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Tue, 17 Jan 2023 22:04:44 +0000
  • Cc: George Dunlap <George.Dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 19/10/2022 8:43 am, Jan Beulich wrote:
> The registration by virtual/linear address has downsides: At least on
> x86 the access is expensive for HVM/PVH domains. Furthermore for 64-bit
> PV domains the areas are inaccessible (and hence cannot be updated by
> Xen) when in guest-user mode.

They're also inaccessible in HVM guests (x86 and ARM) when Meltdown
mitigations are in place.

And let's not get started on the multitude of layering violations that is
guest_memory_policy() for nested virt.  In fact, prohibiting any form of
map-by-va is a prerequisite to any rational attempt to make nested virt work.

(In fact, that infrastructure needs purging before any other
architecture picks up nested virt stubs too.)

They're also inaccessible in general because no architecture's normal
user/supervisor split has a notion of hypervisor privilege, so you don't
know whether the registered address is covered by a supervisor or a user
mapping, and settings like SMAP/PAN can cause the pagewalk to fail even
when the mapping is in place.


There are a lot of good reasons why map-by-va should never have happened.

Even for PV guests, map-by-gfn (letting the guest manage whatever
virtual mappings it wants on its own) would have been closer to how real
hardware works, and critically would have avoided the restriction that
the areas had to live at a globally fixed position to be useful.
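
A purely hypothetical sketch of the shape such an interface could take
(none of these names exist in the series; the point is that registration
carries a guest-physical address and nothing else):

    /* Hypothetical sketch only -- not an ABI proposed in this series. */
    #include <stdint.h>

    struct vcpu_register_area_gphys {
        uint64_t gaddr;   /* guest-physical address of the area */
    };

    /*
     * The guest registers the area once by guest-physical address, much
     * as it would program an MMIO BAR on real hardware, and afterwards
     * maintains whatever virtual mappings of that frame it wants
     * (including none) without any involvement from Xen.
     */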

>
> In preparation of the introduction of new vCPU operations allowing to
> register the respective areas (one of the two is x86-specific) by
> guest-physical address, flesh out the map/unmap functions.
>
> Noteworthy differences from map_vcpu_info():
> - areas can be registered more than once (and de-registered),

When register-by-GFN is available, there is never a good reason to
register the same area twice.

The guest maps one MMIO-like region, and then constructs whatever
regular virtual mappings of it (possibly none) it wants.

This API is new, so we can enforce sane behaviour from the outset.  In
particular, it will help with ...

> - remote vCPU-s are paused rather than checked for being down (which in
>   principle can change right after the check),
> - the domain lock is taken for a much smaller region.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> RFC: By using global domain page mappings the demand on the underlying
>      VA range may increase significantly. I did consider to use per-
>      domain mappings instead, but they exist for x86 only. Of course we
>      could have arch_{,un}map_guest_area() aliasing global domain page
>      mapping functions on Arm and using per-domain mappings on x86. Yet
>      then again map_vcpu_info() doesn't do so either (albeit that's
>      likely to be converted subsequently to use map_vcpu_area() anyway).

... this by providing a bound on the amount of vmap() space that can be
consumed.
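
A minimal sketch of that enforcement (hypothetical structure and field
names, not the code in this patch): rejecting re-registration caps
consumption at one global mapping per area.

    /* Hypothetical sketch: at most one global mapping per area. */
    struct guest_area {
        void *map;                 /* global domain page mapping, or NULL */
        struct page_info *pg;
    };

    static int register_guest_area(struct guest_area *area, void *map,
                                   struct page_info *pg)
    {
        if ( area->map )
            return -EBUSY;         /* registering twice is never needed */

        area->map = map;
        area->pg = pg;
        return 0;
    }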

>
> RFC: In map_guest_area() I'm not checking the P2M type, instead - just
>      like map_vcpu_info() - solely relying on the type ref acquisition.
>      Checking for p2m_ram_rw alone would be wrong, as at least
>      p2m_ram_logdirty ought to also be okay to use here (and in similar
>      cases, e.g. in Argo's find_ring_mfn()). p2m_is_pageable() could be
>      used here (like altp2m_vcpu_enable_ve() does) as well as in
>      map_vcpu_info(), yet then again the P2M type is stale by the time
>      it is being looked at anyway without the P2M lock held.

Again, this is an error caused by Xen not knowing the guest physical
address layout.  These mappings should be restricted to just RAM
regions, and I think we want to enforce that right from the outset.
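
Something along these lines would do (a sketch only: p2m_is_ram() is the
existing x86 predicate, and exactly which P2M types to accept would need
agreeing per architecture):

    /* Illustrative only: in addition to the type ref, insist that the
     * frame is RAM in the guest's p2m before mapping it. */
    p2m_type_t t;
    struct page_info *pg = get_page_from_gfn(d, gfn, &t, P2M_ALLOC);

    if ( !pg || !p2m_is_ram(t) )
    {
        if ( pg )
            put_page(pg);
        return -EINVAL;
    }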

~Andrew
