
Re: [PATCH] x86/ept: simplify detection of special pages for EMT calculation


  • To: Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 26 Sep 2022 10:38:40 +0200
  • Cc: Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 26 Sep 2022 08:38:41 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 23.09.2022 12:56, Roger Pau Monne wrote:
> The current way to detect whether a page handed to
> epte_get_entry_emt() is special and needs a forced write-back cache
> attribute involves iterating over all the smaller 4K pages for
> superpages.
> 
> Such loop consumes a high amount of CPU time for 1GiB pages (order
> 18): on a Xeon® Silver 4216 (Cascade Lake) at 2GHz this takes an
> average amount of time of 1.5ms.  Note that this figure just accounts
> for the is_special_page() loop, and not the whole code of
> epte_get_entry_emt().  Also the resolve_misconfig() operation that
> calls into epte_get_entry_emt() is done while holding the p2m lock in
> write (exclusive) mode, which blocks concurrent EPT_MISCONFIG faults
> and prevents most guest hypercalls from progressing due to the need to
> take the p2m lock in read mode to access any guest provided hypercall
> buffers.
> 
> Simplify the checking in epte_get_entry_emt() and remove the loop,
> assuming that there won't be superpages that are only partially special.
> 
> So far we have no special superpages added to the guest p2m,

We may not be adding them as superpages, but what a guest makes of
the pages it is given access to for e.g. grant handling, or what Dom0
makes of e.g. the (per-CPU) trace buffers is unknown. And I guess
Dom0 ending up with a non-WB mapping of the trace buffers might
impact tracing quite a bit. I don't think we can build on guests not
making any such pages the subject of a large-range mapping attempt, which
might end up suitable for a superpage mapping (recall that sooner
rather than later we ought to finally re-combine suitable ranges of
contiguous 4k mappings into 2M ones, just like we [now] do in IOMMU
code).

Since for data structures like the ones named above 2M mappings
might be enough (i.e. there might be little "risk" of even needing to
go to 1G ones), could we maybe take a "middle" approach and check all
pages when order == 9, but use your approach for higher orders? The
to-be-added re-coalescing would then need to be taught to refuse re-
coalescing of such ranges to larger than 2M mappings, while still
at least allowing for 2M ones. (Special casing at that boundary is
going to be necessary also for shadow code, at the very least.) But
see also below as to caveats.
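
To be concrete, a minimal sketch of that middle ground (untested and
purely illustrative; PAGE_ORDER_2M and the helpers used are the ones
already present in the code being patched):

    if ( order <= PAGE_ORDER_2M )
    {
        /* Small enough a range - scan every constituent 4k page, as today. */
        for ( special_pgs = i = 0; i < (1ul << order); i++ )
            if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
                special_pgs++;

        if ( special_pgs )
        {
            if ( special_pgs != (1ul << order) )
                return -1;

            *ipat = true;
            return MTRR_TYPE_WRBACK;
        }
    }
    else if ( is_special_page(mfn_to_page(mfn)) )
    {
        /* 1G mappings: assume the whole range to be uniformly special. */
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }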

> and in
> any case the forcing of the write-back cache attribute is a courtesy
> to the guest to avoid such ranges being accessed as uncached when not
> really needed.  It's not acceptable for such assistance to tax the
> system so badly.

I agree we would better improve the situation, but I don't think we
can do so by ...

> @@ -518,26 +517,19 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
>          return MTRR_TYPE_UNCACHABLE;
>      }
>  
> -    if ( type != p2m_mmio_direct && !is_iommu_enabled(d) &&
> -         !cache_flush_permitted(d) )
> +    if ( (type != p2m_mmio_direct && !is_iommu_enabled(d) &&
> +          !cache_flush_permitted(d)) ||
> +         /*
> +          * Assume the whole page to be special if the first 4K chunk is:
> +          * iterating over all possible 4K sub-pages for higher order pages is
> +          * too expensive.
> +          */
> +         is_special_page(mfn_to_page(mfn)) )

... building in assumptions like this one. The more that here you may
also produce too weak a memory type (think of a later page in the range
requiring a stronger-ordered memory type).

While it may not help much, ...

>      {
>          *ipat = true;
>          return MTRR_TYPE_WRBACK;
>      }
>  
> -    for ( special_pgs = i = 0; i < (1ul << order); i++ )
> -        if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
> -            special_pgs++;
> -
> -    if ( special_pgs )
> -    {
> -        if ( special_pgs != (1ul << order) )
> -            return -1;
> -
> -        *ipat = true;
> -        return MTRR_TYPE_WRBACK;
> -    }

... this logic could be improved to at least bail from the loop once it's
clear that the "-1" return path will be taken. Improvements beyond that
would likely involve adding some data structure (rangeset?) to track
special pages.
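
Just to illustrate the bail-out idea (untested sketch, re-using the
existing helpers; the uniform - and presumably common - cases of course
still end up scanning the full range):

    bool seen_special = false, seen_normal = false;

    for ( i = 0; i < (1ul << order); i++ )
    {
        if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
            seen_special = true;
        else
            seen_normal = true;

        /* Mixed range - the "-1" path is unavoidable, so stop scanning. */
        if ( seen_special && seen_normal )
            return -1;
    }

    if ( seen_special )
    {
        *ipat = true;
        return MTRR_TYPE_WRBACK;
    }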

Jan



 

