
Re: [Xen-devel] [BUG] mm locking order violation when HVM guest changes graphics mode on virtual graphics adapter.

On Oct 31, 2013, at 12:27 PM, Andres Lagar-Cavilla <andreslc@xxxxxxxxxxxxxx> wrote:

>> Reliably reproducible, occurs when HVM guest changes graphics mode
>> virtual graphics adapter on Xen 4.3.0 from Gentoo.
>> To reproduce: Using Xen 4.3.0 from Gentoo Portage Tree, and the
>> corresponding version of Xl, both built with GCC 4.7.3 with HVM and
>> qemu-dm support built in:
>> 1. Boot using a Gentoo Linux Dom0 with kernel version 3.10.7-r1 built
>> with the kernel config found at http://pastebin.com/GxDpPsk3.
>> 2. Get a copy of the Fedora i686 network install CD.
>> 3. Start an HVM domain with a configuration like the one found at
>> http://pastebin.com/p0wxnaTg.
>> 4. After connecting to the VNC console, start the install process.
>> 5. When Anaconda tries to start the graphical environment, causing the
>> kernel to change the graphics mode from the current setting, Xen will
>> crash with a call to BUG() in mm.h at line 118.
>> Xen log can be found at http://pastebin.com/zKCJsp21.
>> xl info output can be found at http://pastebin.com/NqtksS18.
>> lspci -vvv output can be found at http://pastebin.com/Ja97Cx42.
>> xenstore contents can be found at http://pastebin.com/aL9vpxwu.
>> I'll be happy to provide any other information you may need upon request.
> Thanks for the report.
> From what I can glean you are using AMD NPT; can you confirm?
> So the trigger is that you are using both PoD and nested virt. To elaborate:
> - Setting maxmem to 2G and mem to 512M uses PoD (the populate-on-demand 
> subsystem) to account for the 1.5GB of extra wiggle room. Please make sure 
> you have a guest balloon driver that will be able to deal with the guest 
> trying to use over 512M.
> - You have nestedhvm=1. Do you really need this?
> Changing either (memory == maxmem or nestedhvm=0) will remove the problem and 
> allow you to make progress.
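(In config terms, the two suggested workarounds would look roughly like the
fragment below; values are illustrative, and either change alone should avoid
the PoD + nested-virt interaction:)

```
# Workaround A: make memory == maxmem so PoD is not used
memory = 512
maxmem = 512

# Workaround B: disable nested HVM instead
# nestedhvm = 0
```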
> There is a real bug, however, that needs to be fixed here. At some point in 
> the 4.3 cycle the flushing of the nested p2m table was added, and it would 
> seem to be relinquishing the p2m lock:
George, Tim,
Paging you in for a bit more insight. The bug is as follows:
1. PoD allocates a zero page.
2. An intermediate level in the p2m needs to be allocated.
3. Thus the nested p2m needs to be flushed.
4. Xen attempts to grab the p2m lock on the nested p2m.
5. Explode.

This is because the lock currently held is the PoD lock, which orders after 
(is a higher level than) the p2m lock, so grabbing a p2m-level lock at this 
point violates the ordering. Some solutions:
1. defer flushing of nested p2m until we are done with the fault and have 
unrolled enough stack
2. have the nested p2m locks look "different" to the lock ordering machinery

I think 2 is tricky because there are paths in which there is no reason for 
them to look any different, like the regular hap nested fault handler 
(nestedhvm_hap_nested_page_fault). Really, the only place this blows up is 
in the flushing of the nested tables (all one of them?!).

So I propose deferring, or some other cunning idea.


> __get_gfn_type_access -> grabs p2m lock
> p2m_pod_demand_populate -> grabs pod lock
> p2m_next_level -> still holding p2m lock, then drops it
> p2m_flush_table -> grabs p2m lock -> KAPOW
> Andres

Xen-devel mailing list


