
Re: [PATCH] x86/ept: simplify detection of special pages for EMT calculation


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 26 Sep 2022 17:43:06 +0200
  • Cc: Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 26 Sep 2022 15:43:20 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 26.09.2022 16:27, Roger Pau Monné wrote:
> On Mon, Sep 26, 2022 at 10:38:40AM +0200, Jan Beulich wrote:
>> On 23.09.2022 12:56, Roger Pau Monne wrote:
>>> The current way to detect whether a page handed to
>>> epte_get_entry_emt() is special and needs a forced write-back cache
>>> attribute involves iterating over all the smaller 4K pages for
>>> superpages.
>>>
>>> Such a loop consumes a high amount of CPU time for 1GiB pages (order
>>> 18): on a Xeon® Silver 4216 (Cascade Lake) at 2GHz it takes 1.5ms on
>>> average.  Note that this figure just accounts for the
>>> is_special_page() loop, and not the whole code of
>>> epte_get_entry_emt().  Also the resolve_misconfig() operation that
>>> calls into epte_get_entry_emt() is done while holding the p2m lock in
>>> write (exclusive) mode, which blocks concurrent EPT_MISCONFIG faults
>>> and prevents most guest hypercalls from progressing due to the need
>>> to take the p2m lock in read mode to access any guest-provided
>>> hypercall buffers.
>>>
>>> Simplify the checking in epte_get_entry_emt() and remove the loop,
>>> assuming that there won't be superpages that are only partially
>>> special.
>>>
>>> So far we have no special superpages added to the guest p2m,
>>
>> We may not be adding them as superpages, but what a guest makes of
>> the pages it is given access to for e.g. grant handling, or what Dom0
>> makes of e.g. the (per-CPU) trace buffers, is unknown. And I guess
>> Dom0 ending up with a non-WB mapping of the trace buffers might
>> impact tracing quite a bit. I don't think we can build on guests not
>> making any such pages the subject of a large-range mapping attempt,
>> which might end up suitable for a superpage mapping (recall that
>> rather sooner than later we ought to finally re-combine suitable
>> ranges of contiguous 4k mappings into 2M ones, just like we [now] do
>> in IOMMU code).
> 
> Hm, don't pages used for grant handling (XENMAPSPACE_grant_table)
> end up mapped as 4K entries in the p2m page tables?  The code in
> xenmem_add_to_physmap_one() seems to remove and re-add them with
> order 0.  Same with the trace buffers: they are added to the p2m
> with order 0.

Indeed. I was halfway through writing the earlier response when I
recalled that aspect; I may not have succeeded in adjusting the text
to properly convey that the concern applies only to future code, not
to what we have right now.

> Note that when coalescing we would then need to be careful not to
> coalesce special pages.

Well, no, ...

> Might not be the best model, because I'm not sure why we require
> XENMAPSPACE_grant_table to force entries not to be mapped as part of
> a superpage in the guest p2m.

... as you say here, there may actually be benefits from allowing such
(re-)coalescing.

>> Since for data structures like the ones named above 2M mappings
>> might be enough (i.e. there might be little "risk" of even needing to
>> go to 1G ones), could we maybe take a "middle" approach and check all
>> pages when order == 9, but use your approach for higher orders? The
>> to-be-added re-coalescing would then need to be taught to refuse
>> re-coalescing of such ranges to larger than 2M mappings, while still
>> at least allowing for 2M ones. (Special casing at that boundary is
>> going to be necessary also for shadow code, at the very least.) But
>> see also below as to caveats.
> 
> I guess a rangeset would be more future proof than anything else.
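
[Editorial sketch: the "middle" approach suggested above might look
roughly like the following, reusing the identifiers from the quoted
hunk and Xen's PAGE_ORDER_2M constant (order 9).  This is an
illustration only, not code from the thread.]

    /*
     * Scan all 4k pages for mappings up to 2M, where the walk is
     * cheap, but assume uniformity from the first page for larger
     * (e.g. 1G, order 18) mappings, where the walk costs ~1.5ms.
     */
    if ( order <= PAGE_ORDER_2M )
    {
        unsigned long i, special_pgs = 0;

        for ( i = 0; i < (1ul << order); i++ )
            special_pgs += is_special_page(mfn_to_page(mfn_add(mfn, i)));

        if ( special_pgs && special_pgs != (1ul << order) )
            return -1; /* mixed range: force a split */

        if ( special_pgs )
        {
            *ipat = true;
            return MTRR_TYPE_WRBACK;
        }
    }
    else if ( is_special_page(mfn_to_page(mfn)) )
    {
        /* order > 9: assume the whole range matches its first page. */
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }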
> 
>>> and in
>>> any case the forcing of the write-back cache attribute is a courtesy
>>> to the guest to avoid such ranges being accessed as uncached when not
>>> really needed.  It's not acceptable for such assistance to tax the
>>> system so badly.
>>
>> I agree we would better improve the situation, but I don't think we
>> can do so by ...
>>
>>> @@ -518,26 +517,19 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
>>>          return MTRR_TYPE_UNCACHABLE;
>>>      }
>>>  
>>> -    if ( type != p2m_mmio_direct && !is_iommu_enabled(d) &&
>>> -         !cache_flush_permitted(d) )
>>> +    if ( (type != p2m_mmio_direct && !is_iommu_enabled(d) &&
>>> +          !cache_flush_permitted(d)) ||
>>> +         /*
>>> +          * Assume the whole page to be special if the first 4K chunk is:
>>> +          * iterating over all possible 4K sub-pages for higher order pages is
>>> +          * too expensive.
>>> +          */
>>> +         is_special_page(mfn_to_page(mfn)) )
>>
>> ... building in assumptions like this one. The more that here you may
>> also produce too weak a memory type (think of a later page in the range
>> requiring a stronger-ordered memory type).
>>
>> While it may not help much, ...
>>
>>>      {
>>>          *ipat = true;
>>>          return MTRR_TYPE_WRBACK;
>>>      }
>>>  
>>> -    for ( special_pgs = i = 0; i < (1ul << order); i++ )
>>> -        if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
>>> -            special_pgs++;
>>> -
>>> -    if ( special_pgs )
>>> -    {
>>> -        if ( special_pgs != (1ul << order) )
>>> -            return -1;
>>> -
>>> -        *ipat = true;
>>> -        return MTRR_TYPE_WRBACK;
>>> -    }
>>
>> ... this logic could be improved to at least bail from the loop once it's
>> clear that the "-1" return path will be taken. Improvements beyond that
>> would likely involve adding some data structure (rangeset?) to track
>> special pages.
> 
> For the guest I was running, the loop didn't find any special pages
> in order 18 mappings, which are the most troublesome to handle in the
> loop.  I'm not sure bailing early would make that much of a
> difference in practice, TBH.

As said - it may not help much.
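
[Editorial sketch: the early-bail variant would replace the quoted
loop with something like the following.  Note it only helps for mixed
ranges, consistent with the observation above that it may not help
when no special pages are present at all.]

    bool first = is_special_page(mfn_to_page(mfn));
    unsigned long i;

    for ( i = 1; i < (1ul << order); i++ )
        if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) != first )
            return -1; /* mixed range: bail as soon as that's known */

    if ( first )
    {
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }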

> I did also consider using a rangeset, but that would use more
> per-domain memory and also require coordination to add special pages
> to it.

"Special" is a global property, so I don't think this would need to be a
per-domain data structure.
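
[Editorial sketch: a system-wide rangeset could be modelled on Xen's
existing mmio_ro_ranges, which is likewise created with a NULL domain.
Where entries would be added and removed (presumably wherever pages
gain or lose their "special" property) is an assumption here, not
existing code.]

    /* A single, global rangeset of special MFNs. */
    static struct rangeset *special_mfn_ranges;

    /* Hypothetical one-time setup: */
    special_mfn_ranges = rangeset_new(NULL, "special mfns",
                                      RANGESETF_prettyprint_hex);

    /* epte_get_entry_emt() could then avoid the per-page loop: */
    if ( rangeset_contains_range(special_mfn_ranges, mfn_x(mfn),
                                 mfn_x(mfn) + (1ul << order) - 1) )
    {
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }
    if ( rangeset_overlaps_range(special_mfn_ranges, mfn_x(mfn),
                                 mfn_x(mfn) + (1ul << order) - 1) )
        return -1; /* partially special: force a split */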

Jan