Re: [Xen-devel] PVH dom0 construction timeout
On 28.02.2020 22:08, Andrew Cooper wrote:
> It turns out that PVH dom0 construction doesn't work so well on a
> 2-socket Rome system...
>
> (XEN) NX (Execute Disable) protection active
> (XEN) *** Building a PVH Dom0 ***
> (XEN) Watchdog timer detects that CPU0 is stuck!
> (XEN) ----[ Xen-4.14-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82d08029a8fd>] page_get_ram_type+0x58/0xb6
> (XEN) RFLAGS: 0000000000000206 CONTEXT: hypervisor
> (XEN) rax: ffff82d080948fe0 rbx: 0000000002b73db9 rcx: 0000000000000000
> (XEN) rdx: 0000000004000000 rsi: 0000000004000000 rdi: 0000002b73db9000
> (XEN) rbp: ffff82d080827be0 rsp: ffff82d080827ba0 r8: ffff82d080948fcc
> (XEN) r9: 0000002b73dba000 r10: ffff82d0809491fc r11: 8000000000000000
> (XEN) r12: 0000000002b73db9 r13: ffff8320341bc000 r14: 000000000404fc00
> (XEN) r15: ffff82d08046f209 cr0: 000000008005003b cr4: 00000000001506e0
> (XEN) cr3: 00000000a0414000 cr2: 0000000000000000
> (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) Xen code around <ffff82d08029a8fd> (page_get_ram_type+0x58/0xb6):
> (XEN) 4c 39 d0 74 4d 49 39 d1 <76> 0b 89 ca 83 ca 10 48 39 38 0f 47 ca 49 89 c0
> (XEN) Xen stack trace from rsp=ffff82d080827ba0:
> (XEN) ffff82d08061ee91 ffff82d080827bb4 00000000000b2403 ffff82d080804340
> (XEN) ffff8320341bc000 ffff82d080804340 ffff83000003df90 ffff8320341bc000
> (XEN) ffff82d080827c08 ffff82d08061c38c ffff8320341bc000 ffff82d080827ca8
> (XEN) ffff82d080648750 ffff82d080827c20 ffff82d08061852c 0000000000200000
> (XEN) ffff82d080827d60 ffff82d080638abe ffff82d080232854 ffff82d080930c60
> (XEN) ffff82d080930280 ffff82d080674800 ffff83000003df90 0000000001a40000
> (XEN) ffff83000003df80 ffff82d080827c80 0000000000000206 ffff8320341bc000
> (XEN) ffff82d080827cb8 ffff82d080827ca8 ffff82d080232854 ffff82d080961780
> (XEN) ffff82d080930280 ffff82d080827c00 0000000000000002 ffff82d08022f9a0
> (XEN) 00000000010a4bb0 ffff82d080827ce0 0000000000000206 000000000381b66d
> (XEN) ffff82d080827d00 ffff82d0802b1e87 ffff82d080936900 ffff82d080936900
> (XEN) ffff82d080827d18 ffff82d0802b30d0 ffff82d080936900 ffff82d080827d50
> (XEN) ffff82d08022ef5e ffff8320341bc000 ffff83000003df80 ffff8320341bc000
> (XEN) ffff83000003df80 0000000001a40000 ffff83000003df90 ffff82d080674800
> (XEN) ffff82d080827d98 ffff82d08063cd06 0000000000000001 ffff82d080674800
> (XEN) ffff82d080931050 0000000000000100 ffff82d080950c80 ffff82d080827ee8
> (XEN) ffff82d08062eae7 0000000001a40fff 0000000000000000 000ffff82d080e00
> (XEN) ffffffff00000000 0000000000000005 0000000000000004 0000000000000004
> (XEN) 0000000000000003 0000000000000003 0000000000000002 0000000000000002
> (XEN) 0000000002050000 0000000000000000 ffff82d080674c20 ffff82d080674ea0
> (XEN) Xen call trace:
> (XEN) [<ffff82d08029a8fd>] R page_get_ram_type+0x58/0xb6
> (XEN) [<ffff82d08061ee91>] S arch_iommu_hwdom_init+0x239/0x2b7
> (XEN) [<ffff82d08061c38c>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0x85/0x9f
> (XEN) [<ffff82d08061852c>] F iommu_hwdom_init+0x44/0x4b
> (XEN) [<ffff82d080638abe>] F dom0_construct_pvh+0x160/0x1233
> (XEN) [<ffff82d08063cd06>] F construct_dom0+0x5c/0x280e
> (XEN) [<ffff82d08062eae7>] F __start_xen+0x25db/0x2860
> (XEN) [<ffff82d0802000ec>] F __high_start+0x4c/0x4e
> (XEN)
> (XEN) CPU1 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
> (XEN) CPU31 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
> (XEN) CPU30 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
> (XEN) CPU27 @ e008:ffff82d08022ad5a (scrub_one_page+0x6d/0x7b)
> (XEN) CPU26 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
> (XEN) CPU244 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
> (XEN) CPU245 @ e008:ffff82d08022ad5a (scrub_one_page+0x6d/0x7b)
> (XEN) CPU247 @ e008:ffff82d080256e3f (drivers/char/ns16550.c#ns_read_reg+0x2d/0x35)
> (XEN) CPU246 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
> <snip rather a large number of cpus, all idle>
>
> This stack trace is the same on several boots, and in particular,
> page_get_ram_type() being the %rip which took the timeout. For an
> equivalent PV dom0 build, it takes perceptibly 0 time, based on how
> quickly the next line is printed.
>
> I haven't diagnosed the exact issue, but some observations:
>
> The arch_iommu_hwdom_init() loop's positioning of
> process_pending_softirqs() looks problematic, because it is short
> circuited conditionally by hwdom_iommu_map().

Yes, we want to avoid this bypassing. I'll make a patch.
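To make the problem concrete: the loop is shaped roughly like this (a
condensed sketch, not the verbatim code):

    for ( i = 0; i < top; i++ )
    {
        unsigned long pfn = pdx_to_pfn(i);

        /* PFNs which aren't to be mapped bail out early ... */
        if ( !hwdom_iommu_map(d, pfn, max_pfn) )
            continue;           /* ... skipping the softirq poll below. */

        /* ... establish the IOMMU / p2m mapping for pfn ... */

        /* Periodic poll, reached only when a mapping was attempted. */
        if ( !(i & 0xfffff) )
            process_pending_softirqs();
    }

A long enough run of PFNs rejected by hwdom_iommu_map() hence keeps CPU0
from ever reaching process_pending_softirqs(), which is what the
watchdog then catches. The patch will want to do the periodic poll
ahead of any early exit from the loop body, along the lines of:

    for ( i = 0; i < top; i++ )
    {
        /* Poll first, so that no "continue" path can starve softirqs. */
        if ( !(i & 0xfffff) )
            process_pending_softirqs();

        if ( !hwdom_iommu_map(d, pdx_to_pfn(i), max_pfn) )
            continue;

        /* ... establish the IOMMU / p2m mapping as before ... */
    }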
> page_get_ram_type() is definitely suboptimal here. We have a linear
> search over a (large-ish) sorted list, and a caller which has every MFN
> in the system passed into it, which makes the total runtime of
> arch_iommu_hwdom_init() quadratic with the size of the system.
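Indeed - with M MFNs fed in and E entries in the table that is O(M * E)
comparisons overall. Since the list is sorted, each lookup could in
principle be a binary search instead, along these lines (purely
illustrative - the mem_range / range_type names are hypothetical, not
the actual interface):

    struct mem_range {
        uint64_t start, end;            /* [start, end) */
        unsigned int type;
    };

    /* O(log nr) lookup over a sorted, non-overlapping range table. */
    static unsigned int range_type(const struct mem_range *tbl,
                                   unsigned int nr, uint64_t addr)
    {
        unsigned int lo = 0, hi = nr;

        while ( lo < hi )
        {
            unsigned int mid = lo + (hi - lo) / 2;

            if ( addr < tbl[mid].start )
                hi = mid;
            else if ( addr >= tbl[mid].end )
                lo = mid + 1;
            else
                return tbl[mid].type;   /* addr falls inside this entry */
        }

        return 0;                       /* no entry covers addr */
    }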
This linear search is the same for PVH and PV, isn't it? In fact
hwdom_iommu_map(), on the average, may do more work for PV than for PVH,
considering the is_hvm_domain()-based return from the switch()'s default
case.
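For reference, that default case is shaped roughly like this (condensed,
not the verbatim code):

    switch ( type = page_get_ram_type(mfn) )
    {
    case RAM_TYPE_UNUSABLE:
        return false;

    /* ... further cases ... */

    default:
        if ( is_hvm_domain(d) )     /* PVH bails out right here, ... */
            return false;
        break;                      /* ... PV goes on to more checks. */
    }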
So for the moment I could explain such a huge difference in consumed
time only if the PV case ran with iommu_hwdom_passthrough set to true
(which isn't possible for PVH).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel