
Re: [PATCH RFC 07/10] domain: map/unmap GADDR based shared guest areas


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Fri, 20 Jan 2023 18:15:38 +0000
  • Accept-language: en-GB, en-US
  • Cc: George Dunlap <George.Dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 20 Jan 2023 18:16:05 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH RFC 07/10] domain: map/unmap GADDR based shared guest areas

On 18/01/2023 9:55 am, Jan Beulich wrote:
> On 17.01.2023 23:04, Andrew Cooper wrote:
>> On 19/10/2022 8:43 am, Jan Beulich wrote:
>>> The registration by virtual/linear address has downsides: At least on
>>> x86 the access is expensive for HVM/PVH domains. Furthermore for 64-bit
>>> PV domains the areas are inaccessible (and hence cannot be updated by
>>> Xen) when in guest-user mode.
>> They're also inaccessible in HVM guests (x86 and ARM) when Meltdown
>> mitigations are in place.
> I've added this explicitly, but ...
>
>> And let's not get started on the multitude of layering violations that is
>> guest_memory_policy() for nested virt.  In fact, prohibiting any form of
>> map-by-va is a prerequisite to any rational attempt to make nested virt work.
>>
>> (In fact, that infrastructure needs purging before any other
>> architecture picks up stubs too.)
>>
>> They're also inaccessible in general because no architecture has
>> hypervisor privilege in a normal user/supervisor split, you don't
>> know whether the address in question is a supervisor or a user mapping,
>> and settings like SMAP/PAN can cause the pagewalk to fail even when the
>> mapping is in place.
> ... I'm now merely saying that there are yet more reasons, rather than
> trying to enumerate them all.

That's fine.  I just wanted to point out that there are far more reasons
than were given the first time around.

>>> In preparation of the introduction of new vCPU operations allowing to
>>> register the respective areas (one of the two is x86-specific) by
>>> guest-physical address, flesh out the map/unmap functions.
>>>
>>> Noteworthy differences from map_vcpu_info():
>>> - areas can be registered more than once (and de-registered),
>> When registering by GFN is available, there is never a good reason to
>> register the same area twice.
> Why not? Why shouldn't different entities be permitted to register their
> areas, one after the other? This at the very least requires a way to
> de-register.

Because it's useless and extra complexity.  From the point of view of
any guest, it's an MMIO(ish) window that Xen happens to update the
content of.

You don't get systems where you can ask hardware for e.g. "another copy
of the HPET at mfn $foo please".

>> The guest maps one MMIO-like region, and then constructs all the regular
>> virtual addresses mapping it (or not) that it wants.
>>
>> This API is new, so we can enforce sane behaviour from the outset.  In
>> particular, it will help with ...
>>
>>> - remote vCPU-s are paused rather than checked for being down (which in
>>>   principle can change right after the check),
>>> - the domain lock is taken for a much smaller region.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>> ---
>>> RFC: By using global domain page mappings the demand on the underlying
>>>      VA range may increase significantly. I did consider using per-domain
>>>      mappings instead, but they exist for x86 only. Of course we
>>>      could have arch_{,un}map_guest_area() aliasing global domain page
>>>      mapping functions on Arm and using per-domain mappings on x86. Yet
>>>      then again map_vcpu_info() doesn't do so either (albeit that's
>>>      likely to be converted subsequently to use map_vcpu_area() anyway).
>> ... this by providing a bound on the amount of vmap() space that can be consumed.
> I'm afraid I don't understand. When re-registering a different area, the
> earlier one will be unmapped. The consumption of vmap space cannot grow
> (or else we'd have a resource leak and hence an XSA).

In which case you mean "can be re-registered elsewhere".  More
specifically, the area can be moved, and registration isn't a singleton
operation like map_vcpu_info() was.

The wording as presented firmly suggests the presence of an XSA.
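
To make the invariant being discussed concrete, here is a minimal sketch of
a bounded re-registration (illustrative only, not the patch's code: struct
guest_area and reregister_area() are stand-in names, while
__map_domain_page_global() and unmap_domain_page_global() are the existing
Xen primitives):

    /*
     * Sketch only: locking and vCPU pausing are omitted.  The point is
     * that the new page is mapped before the old mapping is dropped, so
     * each area holds at most one global mapping at any time and total
     * vmap() consumption stays bounded by the number of areas.
     */
    struct guest_area {
        void *map;               /* current global mapping, or NULL */
    };

    static int reregister_area(struct guest_area *area, struct page_info *pg)
    {
        void *map = __map_domain_page_global(pg);
        void *old;

        if ( !map )
            return -ENOMEM;

        old = area->map;
        area->map = map;                    /* publish the new mapping */

        if ( old )
            unmap_domain_page_global(old);  /* then release the previous one */

        return 0;
    }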

>>> RFC: In map_guest_area() I'm not checking the P2M type, instead - just
>>>      like map_vcpu_info() - solely relying on the type ref acquisition.
>>>      Checking for p2m_ram_rw alone would be wrong, as at least
>>>      p2m_ram_logdirty ought to also be okay to use here (and in similar
>>>      cases, e.g. in Argo's find_ring_mfn()). p2m_is_pageable() could be
>>>      used here (like altp2m_vcpu_enable_ve() does) as well as in
>>>      map_vcpu_info(), yet then again the P2M type is stale by the time
>>>      it is being looked at anyway without the P2M lock held.
>> Again, another error caused by Xen not knowing the guest physical
>> address layout.  These mappings should be restricted to just RAM regions
>> and I think we want to enforce that right from the outset.
> Meaning what exactly in terms of action for me to take? As said, checking
> the P2M type is pointless. So without you being more explicit, all I can
> take your reply for is merely a comment, with no action on my part (not
> even to remove this RFC remark).

There will come a point where it will need to be prohibited to issue
this against something which isn't p2m_type_ram.  If we had a sane
idea of the guest physmap, I'd go as far as saying E820_RAM, but that's
clearly not feasible yet.

Even now, absolutely nothing good can possibly come of e.g. trying to
overlay it on the grant table, or a grant mapping.

Checking for ram || logdirty ought to exclude most of the places we care
about the guest not putting the mapping.
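
For concreteness, such a check could be as simple as the following (purely
illustrative; p2m_ram_rw and p2m_ram_logdirty are existing P2M types, and
whether the check can be made stable against concurrent P2M changes is
exactly the open question in the RFC remark):

    /*
     * Sketch only: accept plain RAM, including pages currently under
     * logdirty tracking; refuse grant mappings, MMIO, foreign pages etc.
     */
    static bool p2m_type_ok_for_guest_area(p2m_type_t t)
    {
        return t == p2m_ram_rw || t == p2m_ram_logdirty;
    }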

~Andrew

 

