
Re: HVM/PVH Balloon crash


  • To: Elliott Mitchell <ehem+xen@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 30 Sep 2021 09:43:07 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 30 Sep 2021 07:43:22 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 27.09.2021 00:53, Elliott Mitchell wrote:
> On Wed, Sep 15, 2021 at 08:05:05AM +0200, Jan Beulich wrote:
>> On 15.09.2021 04:40, Elliott Mitchell wrote:
>>> On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote:
>>>> On 07.09.2021 17:03, Elliott Mitchell wrote:
>>>>>  Could be this system is in an
>>>>> intergenerational hole, and some spot in the PVH/HVM code assumes that
>>>>> the presence of NPT guarantees the presence of an operational IOMMU.
>>>>> Otherwise, if there was some copy and paste while writing the IOMMU
>>>>> code, some portion of it might be checking for the presence of NPT
>>>>> instead of the presence of an IOMMU.
>>>>
>>>> This is all very speculative; I consider what you suspect not very likely,
>>>> but also not entirely impossible. This is not least because for a
>>>> long time we've been running without shared page tables on AMD.
>>>>
>>>> I'm afraid without technical data and without knowing how to repro, I
>>>> don't see a way forward here.
>>>
>>> Downtimes are very expensive even for lower-end servers.  Plus there is
>>> the issue that the system wasn't meant for development and thus never
>>> had the appropriate setup done.
>>>
>>> Experimentation with a system of similar age suggested another candidate.
>>> The system has a conventional BIOS.  Might some dependencies on the
>>> presence of UEFI have snuck into the NPT code?
>>
>> I can't think of any such, but as all of this is very nebulous I can't
>> really rule out anything.
> 
> Getting everything right to recreate this is rather inexact.  Having an
> equivalent of `sysctl` to turn on the serial console while running might
> be handy...
> 
> Luckily I managed to get things together and...
> 
> (XEN) mm locking order violation: 48 > 16
> (XEN) Xen BUG at mm-locks.h:82
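
That message comes from the lock level checks in mm-locks.h: every mm lock
is assigned a level, and acquiring a lock whose level is lower than one
already held on the same CPU trips the BUG.  Below is a minimal standalone
sketch of that kind of check; the names and the example levels are
illustrative only, not the actual mm-locks.h code.

#include <assert.h>
#include <stdio.h>

static int current_lock_level;          /* per-pCPU in the hypervisor */

struct ordered_lock {
    int level;          /* e.g. 16 for an outer lock, 48 for an inner one */
    /* spinlock_t lock; */
};

static void ordered_lock_acquire(struct ordered_lock *l)
{
    /*
     * Locks must be taken in increasing level order.  Requesting a
     * level-16 lock while a level-48 lock is already held produces a
     * report of the form "locking order violation: 48 > 16".
     */
    if ( current_lock_level > l->level )
    {
        printf("mm locking order violation: %d > %d\n",
               current_lock_level, l->level);
        assert(0);                      /* BUG() in the hypervisor */
    }
    current_lock_level = l->level;
    /* spin_lock(&l->lock); the matching release restores the prior level */
}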

Would you give the patch below a try? While it is against current staging,
it looks to apply fine to 4.14.3.

Jan

x86/PoD: defer nested P2M flushes

With NPT or shadow in use, the p2m_set_entry() -> p2m_pt_set_entry() ->
write_p2m_entry() -> p2m_flush_nestedp2m() call sequence triggers a lock
order violation when the PoD lock is held around it. Hence such flushing
needs to be deferred. Steal the approach from p2m_change_type_range().

Reported-by: Elliott Mitchell <ehem+xen@xxxxxxx>
Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
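
For illustration, a rough standalone sketch of the deferral idea the
description refers to (placeholder types and helper names, not the actual
Xen code): the flush site skips the flush while the flag is set, and the
caller performs it only after dropping the PoD lock, which is the approach
p2m_change_type_range() already uses.

#include <stdbool.h>

struct p2m_sketch {
    bool defer_nested_flush;
    bool nested_virt_enabled;
};

static void flush_nested_p2m_sketch(struct p2m_sketch *p2m)
{
    /* ... invalidate the nested (guest) P2Ms ... */
}

/* Write path: skip the nested flush while deferral is in effect; this is
 * what makes P2M updates safe underneath the PoD lock. */
static void write_p2m_entry_sketch(struct p2m_sketch *p2m)
{
    /* ... update the host P2M entry ... */
    if ( p2m->nested_virt_enabled && !p2m->defer_nested_flush )
        flush_nested_p2m_sketch(p2m);
}

/* Caller: defer while holding the PoD lock, flush once it has been dropped. */
static void pod_op_sketch(struct p2m_sketch *p2m)
{
    /* pod_lock(p2m); */
    p2m->defer_nested_flush = true;

    write_p2m_entry_sketch(p2m);        /* no flush here, it is deferred */

    /* pod_unlock(p2m); */
    p2m->defer_nested_flush = false;
    if ( p2m->nested_virt_enabled )
        flush_nested_p2m_sketch(p2m);   /* safe: PoD lock no longer held */
}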

--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -24,6 +24,7 @@
 #include <xen/mm.h>
 #include <xen/sched.h>
 #include <xen/trace.h>
+#include <asm/hvm/nestedhvm.h>
 #include <asm/page.h>
 #include <asm/paging.h>
 #include <asm/p2m.h>
@@ -494,6 +495,13 @@ p2m_pod_offline_or_broken_replace(struct
 static int
 p2m_pod_zero_check_superpage(struct p2m_domain *p2m, gfn_t gfn);
 
+static void pod_unlock_and_flush(struct p2m_domain *p2m)
+{
+    pod_unlock(p2m);
+    p2m->defer_nested_flush = false;
+    if ( nestedhvm_enabled(p2m->domain) )
+        p2m_flush_nestedp2m(p2m->domain);
+}
 
 /*
  * This function is needed for two reasons:
@@ -514,6 +522,7 @@ p2m_pod_decrease_reservation(struct doma
 
     gfn_lock(p2m, gfn, order);
     pod_lock(p2m);
+    p2m->defer_nested_flush = true;
 
     /*
      * If we don't have any outstanding PoD entries, let things take their
@@ -665,7 +674,7 @@ out_entry_check:
     }
 
 out_unlock:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     gfn_unlock(p2m, gfn, order);
     return ret;
 }
@@ -1144,8 +1153,10 @@ p2m_pod_demand_populate(struct p2m_domai
      * won't start until we're done.
      */
     if ( unlikely(d->is_dying) )
-        goto out_fail;
-
+    {
+        pod_unlock(p2m);
+        return false;
+    }
 
     /*
      * Because PoD does not have cache list for 1GB pages, it has to remap
@@ -1167,6 +1178,8 @@ p2m_pod_demand_populate(struct p2m_domai
                               p2m_populate_on_demand, p2m->default_access);
     }
 
+    p2m->defer_nested_flush = true;
+
     /* Only reclaim if we're in actual need of more cache. */
     if ( p2m->pod.entry_count > p2m->pod.count )
         pod_eager_reclaim(p2m);
@@ -1229,8 +1242,9 @@ p2m_pod_demand_populate(struct p2m_domai
         __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
     }
 
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     return true;
+
 out_of_memory:
     pod_unlock(p2m);
 
@@ -1239,12 +1253,14 @@ out_of_memory:
            p2m->pod.entry_count, current->domain->domain_id);
     domain_crash(d);
     return false;
+
 out_fail:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     return false;
+
 remap_and_retry:
     BUG_ON(order != PAGE_ORDER_2M);
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
 
     /*
      * Remap this 2-meg region in singleton chunks. See the comment on the
