
Re: [PATCH] x86/ept: simplify detection of special pages for EMT calculation


  • To: Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 26 Sep 2022 10:38:40 +0200
  • Cc: Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 26 Sep 2022 08:38:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 23.09.2022 12:56, Roger Pau Monne wrote:
> The current way to detect whether a page handed to
> epte_get_entry_emt() is special and needs a forced write-back cache
> attribute involves iterating over all the smaller 4K pages for
> superpages.
> 
> Such loop consumes a high amount of CPU time for 1GiB pages (order
> 18): on a Xeon® Silver 4216 (Cascade Lake) at 2GHz this takes an
> average amount of time of 1.5ms.  Note that this figure just accounts
> for the is_special_page() loop, and not the whole code of
> epte_get_entry_emt().  Also the resolve_misconfig() operation that
> calls into epte_get_entry_emt() is done while holding the p2m lock in
> write (exclusive) mode, which blocks concurrent EPT_MISCONFIG faults
> and prevents most guest hypercalls from progressing due to the need to
> take the p2m lock in read mode to access any guest provided hypercall
> buffers.
> 
> Simplify the checking in epte_get_entry_emt() and remove the loop,
> assuming that there won't be superpages that are only partially special.
> 
> So far we have no special superpages added to the guest p2m,

We may not be adding them as superpages, but what a guest makes of
the pages it is given access to for e.g. grant handling, or what Dom0
makes of e.g. the (per-CPU) trace buffers is unknown. And I guess
Dom0 ending up with a non-WB mapping of the trace buffers might
impact tracing quite a bit. I don't think we can build on guests not
making any such pages the subject of a large-range mapping attempt, which
might end up suitable for a superpage mapping (recall that sooner
rather than later we ought to finally re-combine suitable ranges of
contiguous 4k mappings into 2M ones, just like we [now] do in IOMMU
code).

Since for data structures like the ones named above 2M mappings
might be enough (i.e. there might be little "risk" of even needing to
go to 1G ones), could we maybe take a "middle" approach and check all
pages when order == 9, but use your approach for higher orders? The
to-be-added re-coalescing would then need to be taught to refuse re-
coalescing of such ranges to larger than 2M mappings, while still
at least allowing for 2M ones. (Special casing at that boundary is
going to be necessary also for shadow code, at the very least.) But
see also below as to caveats.
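
To be concrete, a minimal sketch of that middle ground (untested and
purely illustrative; PAGE_ORDER_2M and the helpers used are the ones
already present in the code being patched):

    if ( order <= PAGE_ORDER_2M )
    {
        /* Small enough a range - scan every constituent 4k page, as today. */
        for ( special_pgs = i = 0; i < (1ul << order); i++ )
            if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
                special_pgs++;

        if ( special_pgs )
        {
            if ( special_pgs != (1ul << order) )
                return -1;

            *ipat = true;
            return MTRR_TYPE_WRBACK;
        }
    }
    else if ( is_special_page(mfn_to_page(mfn)) )
    {
        /* 1G mappings: assume the whole range to be uniformly special. */
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }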

> and in
> any case the forcing of the write-back cache attribute is a courtesy
> to the guest to avoid such ranges being accessed as uncached when not
> really needed.  It's not acceptable for such assistance to tax the
> system so badly.

I agree we would better improve the situation, but I don't think we
can do so by ...

> @@ -518,26 +517,19 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
>          return MTRR_TYPE_UNCACHABLE;
>      }
>  
> -    if ( type != p2m_mmio_direct && !is_iommu_enabled(d) &&
> -         !cache_flush_permitted(d) )
> +    if ( (type != p2m_mmio_direct && !is_iommu_enabled(d) &&
> +          !cache_flush_permitted(d)) ||
> +         /*
> +          * Assume the whole page to be special if the first 4K chunk is:
> +          * iterating over all possible 4K sub-pages for higher order pages is
> +          * too expensive.
> +          */
> +         is_special_page(mfn_to_page(mfn)) )

... building in assumptions like this one. The more that here you may
also produce too weak a memory type (think of a later page in the range
requiring a stronger-ordered memory type).

While it may not help much, ...

>      {
>          *ipat = true;
>          return MTRR_TYPE_WRBACK;
>      }
>  
> -    for ( special_pgs = i = 0; i < (1ul << order); i++ )
> -        if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
> -            special_pgs++;
> -
> -    if ( special_pgs )
> -    {
> -        if ( special_pgs != (1ul << order) )
> -            return -1;
> -
> -        *ipat = true;
> -        return MTRR_TYPE_WRBACK;
> -    }

... this logic could be improved to at least bail from the loop once it's
clear that the "-1" return path will be taken. Improvements beyond that
would likely involve adding some data structure (rangeset?) to track
special pages.
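
Just to illustrate the bail-out idea (untested sketch, re-using the
existing helpers; the uniform - and presumably common - cases of course
still end up scanning the full range):

    bool seen_special = false, seen_normal = false;

    for ( i = 0; i < (1ul << order); i++ )
    {
        if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
            seen_special = true;
        else
            seen_normal = true;

        /* Mixed range - the "-1" path is unavoidable, so stop scanning. */
        if ( seen_special && seen_normal )
            return -1;
    }

    if ( seen_special )
    {
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }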

Jan



 

