[xen master] x86/PoD: defer nested P2M flushes
commit 6d187caae50284625af8799a79712c7a4c6a9a59
Author:     Jan Beulich <jbeulich@xxxxxxxx>
AuthorDate: Wed Oct 20 12:42:44 2021 +0200
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Wed Oct 20 12:42:44 2021 +0200

    x86/PoD: defer nested P2M flushes

    With NPT or shadow in use, the p2m_set_entry() -> p2m_pt_set_entry() ->
    write_p2m_entry() -> p2m_flush_nestedp2m() call sequence triggers a
    lock order violation when the PoD lock is held around it. Hence such
    flushing needs to be deferred. Steal the approach from
    p2m_change_type_range(). (Note that strictly speaking the change at
    the out_of_memory label is not needed, as the domain gets crashed
    there anyway. The change is being made nevertheless to avoid setting
    up a trap for someone meaning to deal with that case better than by
    domain_crash().)

    Similarly for EPT I think the ept_set_entry() -> ept_sync_domain() ->
    ept_sync_domain_prepare() -> p2m_flush_nestedp2m() call sequence is
    affected. Make its p2m_flush_nestedp2m() invocation conditional. Note
    that this then also alters behavior of p2m_change_type_range() on
    EPT, deferring the nested flushes there as well. I think this should
    have been that way from the introduction of the flag.

    Reported-by: Elliott Mitchell <ehem+xen@xxxxxxx>
    Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
    Acked-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
    Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
---
 xen/arch/x86/mm/p2m-ept.c |  2 +-
 xen/arch/x86/mm/p2m-pod.c | 30 +++++++++++++++++++++++-------
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index e7e873dc28..b2d57a3ee8 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -1253,7 +1253,7 @@ static void ept_sync_domain_prepare(struct p2m_domain *p2m)
     {
         if ( p2m_is_nestedp2m(p2m) )
             ept = &p2m_get_hostp2m(d)->ept;
-        else
+        else if ( !p2m->defer_nested_flush )
             p2m_flush_nestedp2m(d);
     }
 
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index 8abc57265c..d0134755e5 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -24,6 +24,7 @@
 #include <xen/mm.h>
 #include <xen/sched.h>
 #include <xen/trace.h>
+#include <asm/hvm/nestedhvm.h>
 #include <asm/page.h>
 #include <asm/paging.h>
 #include <asm/p2m.h>
@@ -494,6 +495,13 @@ p2m_pod_offline_or_broken_replace(struct page_info *p)
 static int
 p2m_pod_zero_check_superpage(struct p2m_domain *p2m, gfn_t gfn);
 
+static void pod_unlock_and_flush(struct p2m_domain *p2m)
+{
+    pod_unlock(p2m);
+    p2m->defer_nested_flush = false;
+    if ( nestedhvm_enabled(p2m->domain) )
+        p2m_flush_nestedp2m(p2m->domain);
+}
 
 /*
  * This function is needed for two reasons:
@@ -514,6 +522,7 @@ p2m_pod_decrease_reservation(struct domain *d, gfn_t gfn, unsigned int order)
 
     gfn_lock(p2m, gfn, order);
     pod_lock(p2m);
+    p2m->defer_nested_flush = true;
 
     /*
      * If we don't have any outstanding PoD entries, let things take their
@@ -665,7 +674,7 @@ out_entry_check:
     }
 
 out_unlock:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     gfn_unlock(p2m, gfn, order);
     return ret;
 }
@@ -1144,8 +1153,10 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, gfn_t gfn,
      * won't start until we're done.
      */
     if ( unlikely(d->is_dying) )
-        goto out_fail;
-
+    {
+        pod_unlock(p2m);
+        return false;
+    }
 
     /*
      * Because PoD does not have cache list for 1GB pages, it has to remap
@@ -1167,6 +1178,8 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, gfn_t gfn,
                       p2m_populate_on_demand, p2m->default_access);
     }
 
+    p2m->defer_nested_flush = true;
+
     /* Only reclaim if we're in actual need of more cache. */
     if ( p2m->pod.entry_count > p2m->pod.count )
         pod_eager_reclaim(p2m);
@@ -1229,22 +1242,25 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, gfn_t gfn,
         __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
     }
 
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     return true;
+
 out_of_memory:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
 
     printk("%s: Dom%d out of PoD memory! (tot=%"PRIu32" ents=%ld dom%d)\n",
            __func__, d->domain_id, domain_tot_pages(d),
            p2m->pod.entry_count, current->domain->domain_id);
     domain_crash(d);
     return false;
+
 out_fail:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     return false;
+
 remap_and_retry:
     BUG_ON(order != PAGE_ORDER_2M);
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
 
     /*
      * Remap this 2-meg region in singleton chunks. See the comment on the
-- 
generated by git-patchbot for /home/xen/git/xen.git#master
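The deferral pattern the commit borrows from p2m_change_type_range() is worth seeing in isolation: a flag set while the inner (PoD) lock is held makes the nested-flush path record the request instead of taking the nested-P2M lock, and the recorded flush is replayed only after the inner lock has been dropped, so the two locks are never acquired in the wrong order. Below is a minimal, compilable C sketch of that idea; all names and types (toy_p2m, flush_nested(), flush_pending, ...) are illustrative stand-ins, not actual Xen interfaces.

/* Minimal sketch of the "defer the flush until unlock" pattern.
 * Everything here is a stand-in; none of these are Xen interfaces. */
#include <stdbool.h>
#include <stdio.h>
#include <pthread.h>

struct toy_p2m {
    pthread_mutex_t nested_lock; /* outer lock: nested-P2M flushes   */
    pthread_mutex_t pod_lock;    /* inner lock: PoD operations       */
    bool defer_nested_flush;     /* like p2m->defer_nested_flush     */
    bool flush_pending;          /* a flush was requested meanwhile  */
};

/* Reached from deep inside entry updates.  Taking nested_lock here
 * while pod_lock is held would invert the lock order, so defer. */
static void flush_nested(struct toy_p2m *p)
{
    if (p->defer_nested_flush) {
        p->flush_pending = true; /* remember the request, flush later */
        return;
    }
    pthread_mutex_lock(&p->nested_lock);
    /* ... invalidate nested translations here ... */
    pthread_mutex_unlock(&p->nested_lock);
}

/* Analogous to the commit's pod_unlock_and_flush(): drop the inner
 * lock first, then replay the flush suppressed while holding it. */
static void pod_unlock_and_flush(struct toy_p2m *p)
{
    pthread_mutex_unlock(&p->pod_lock);
    p->defer_nested_flush = false;
    if (p->flush_pending) {
        p->flush_pending = false;
        flush_nested(p); /* takes nested_lock with pod_lock dropped */
    }
}

static void update_entry(struct toy_p2m *p)
{
    pthread_mutex_lock(&p->pod_lock);
    p->defer_nested_flush = true;
    /* ... modify entries; the write path asks for a nested flush ... */
    flush_nested(p); /* deferred instead of violating lock order */
    pod_unlock_and_flush(p);
}

int main(void)
{
    struct toy_p2m p = {
        .nested_lock = PTHREAD_MUTEX_INITIALIZER,
        .pod_lock = PTHREAD_MUTEX_INITIALIZER,
    };

    update_entry(&p);
    puts("nested flush ran after pod_lock was released");
    return 0;
}

Note how the ordering inside pod_unlock_and_flush() mirrors the patch: the PoD lock is released before the nested-P2M lock can ever be taken, which is exactly the constraint the original p2m_set_entry() call chain violated.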