
Re: HVM/PVH Balloon crash


  • To: Elliott Mitchell <ehem+xen@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 29 Sep 2021 15:32:15 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 29 Sep 2021 13:32:30 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 27.09.2021 00:53, Elliott Mitchell wrote:
> Getting everything right to recreate is rather inexact.  Having an
> equivalent of `sysctl` to turn on the serial console while running might
> be handy...
> 
> Luckily get things together and...

Thanks; I finally got around to looking at this in at least slightly
more detail.

> (XEN) mm locking order violation: 48 > 16
> (XEN) Xen BUG at mm-locks.h:82
> (XEN) ----[ Xen-4.14.3  x86_64  debug=n   Not tainted ]----
> (XEN) CPU:    2
> (XEN) RIP:    e008:[<ffff82d0402e8be0>] arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260
> (XEN) RFLAGS: 0000000000010292   CONTEXT: hypervisor (d1v0)
> (XEN) rax: ffff83080b2f106c   rbx: ffff83081da0f2d0   rcx: 0000000000000000
> (XEN) rdx: ffff83080b27ffff   rsi: 000000000000000a   rdi: ffff82d040469738
> (XEN) rbp: ffff82d040580688   rsp: ffff83080b27f8b0   r8:  0000000000000002
> (XEN) r9:  0000000000008000   r10: ffff82d04058f381   r11: ffff82d040375100
> (XEN) r12: ffff82d040580688   r13: ffff83080b27ffff   r14: ffff83081ddf6000
> (XEN) r15: 00000000004f8c00   cr0: 000000008005003b   cr4: 00000000000406e0
> (XEN) cr3: 000000081dee6000   cr2: 0000000000000000
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0010   gs: 0010   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d0402e8be0> (arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260):
> (XEN)  e3 0c 00 e8 30 7f f6 ff <0f> 0b 66 0f 1f 44 00 00 42 8b 34 20 48 8d 3d 8d
> (XEN) Xen stack trace from rsp=ffff83080b27f8b0:
> [...]
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0402e8be0>] R arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260
> (XEN)    [<ffff82d0402ec51c>] S p2m_flush_nestedp2m+0x1c/0x30
> (XEN)    [<ffff82d0402e0528>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x378/0x490

hap_write_p2m_entry() calling p2m_flush_nestedp2m() suggests that
nestedhvm_enabled() was true for the domain. While we will want to
fix this, nested virt is experimental (even in current staging),
and hence there is at least no security concern.
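The "mm locking order violation: 48 > 16" line in the quoted log appears to mean that a lock of level 16 was requested while a lock of level 48 was already held. Below is a simplified, hypothetical model of that kind of ordering check, not Xen's actual mm-locks.h code: the names `mm_lock_model`, `lock_model_acquire` and `lock_model_release` are made up for illustration, a single CPU is assumed, and locks are assumed to be taken in non-decreasing level order.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model: each lock has a fixed ordering level. */
struct mm_lock_model {
    const char *name;
    int level;
};

/* Level of the most recently acquired lock on the (single, simulated) CPU. */
static int current_level;

/*
 * Try to take a lock. Taking a lock whose level is lower than the level
 * already held is an ordering violation (the real hypervisor BUG()s at
 * that point; this model only reports it and returns -1).
 * On success the previous level is saved so release can restore it.
 */
static int lock_model_acquire(const struct mm_lock_model *l, int *saved_level)
{
    if (current_level > l->level) {
        printf("mm locking order violation: %d > %d\n",
               current_level, l->level);
        return -1;
    }
    *saved_level = current_level;
    current_level = l->level;
    return 0;
}

/* Release the most recent lock, restoring the level held before it. */
static void lock_model_release(int saved_level)
{
    /* Releasing must never raise the level. */
    assert(saved_level <= current_level);
    current_level = saved_level;
}
```

With a level-48 lock held, a call to `lock_model_acquire()` for a level-16 lock prints the same style of message seen in the log and is refused; taking the level-16 lock first and the level-48 lock second is accepted.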

Can you confirm that by leaving nested off you don't run into this
(or a similar) issue?

Of course, since this wasn't a debug build (and in particular lacked
frame pointers), there remains some uncertainty: the real call chain
may have been different from what this call trace suggests.

Jan

> (XEN)    [<ffff82d0402f009a>] S arch/x86/mm/p2m-pt.c#p2m_next_level.constprop.10+0x24a/0x2e0
> (XEN)    [<ffff82d0402f1097>] S arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x3c7/0x7b0
> (XEN)    [<ffff82d0402ea0a6>] S p2m_set_entry+0xa6/0x130
> (XEN)    [<ffff82d0402f4ecd>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check+0x1cd/0x440
> (XEN)    [<ffff82d0402f023f>] S arch/x86/mm/p2m-pt.c#do_recalc+0x10f/0x470
> (XEN)    [<ffff82d0402f02ed>] S arch/x86/mm/p2m-pt.c#do_recalc+0x1bd/0x470
> (XEN)    [<ffff82d0402f00ed>] S arch/x86/mm/p2m-pt.c#p2m_next_level.constprop.10+0x29d/0x2e0
> (XEN)    [<ffff82d0402e03da>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x22a/0x490
> (XEN)    [<ffff82d0402f0fe2>] S arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x312/0x7b0
> (XEN)    [<ffff82d0402f0c4e>] S arch/x86/mm/p2m-pt.c#p2m_pt_get_entry+0x3fe/0x480
> (XEN)    [<ffff82d0402f59aa>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check_superpage+0x17a/0x600
> (XEN)    [<ffff82d0402f5ba0>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check_superpage+0x370/0x600
> (XEN)    [<ffff82d0402f7c78>] S p2m_pod_demand_populate+0x6b8/0xa90
> (XEN)    [<ffff82d0402f0aa6>] S arch/x86/mm/p2m-pt.c#p2m_pt_get_entry+0x256/0x480
> (XEN)    [<ffff82d0402e9a1f>] S __get_gfn_type_access+0x6f/0x130
> (XEN)    [<ffff82d0402ab12b>] S hvm_hap_nested_page_fault+0xeb/0x760
> (XEN)    [<ffff82d04028c87e>] S svm_asm_do_resume+0x12e/0x164
> (XEN)    [<ffff82d04028c87e>] S svm_asm_do_resume+0x12e/0x164
> 
> The stack trace goes further, but I suspect the rest would be overkill.
> That seems to readily qualify as "Xen bug".