[Xen-devel] long latency of domain shutdown
In (3.0.4-based) SLE10 SP1 we are currently dealing with a (reproducible) report of time getting screwed up during domain shutdown. Debugging revealed that the PM timer misses at least one overflow (i.e. platform time lost about 4 seconds), which subsequently leads to disastrous effects.

Apart from tracking the time calibration, as the (currently) last step of narrowing down the cause I now made the first CPU that detects severe anomalies in time flow send an IPI to CPU0 (which is exclusively responsible for managing platform time). This appears to prove that CPU0 is indeed busy processing a domain_kill() request, namely tearing down the address spaces of the guest.

Obviously, the hypervisor's behavior should not depend on the amount of time needed to free a dead domain's resources, but that is how it is coded: domain shutdown is executed synchronously on the CPU requesting it. From some code comparison I would conclude that, while the code has changed significantly since 3.0.4, this basic characteristic doesn't appear to have changed (of course, history shows that I may easily be overlooking something here). If the requesting CPU happens to be CPU0, the whole system suffers due to the asymmetry of platform time handling.
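As a sanity check on the "about 4 seconds" figure, here is a minimal back-of-the-envelope sketch (mine, not from the report; it assumes the usual 3.579545 MHz ACPI PM timer with the common 24-bit counter width) of the wrap period CPU0 has to observe in time:

#include <stdio.h>

int main(void)
{
    const double pmtmr_hz = 3579545.0;     /* ACPI PM timer frequency */
    const double counter_span = 1UL << 24; /* common 24-bit counter width */

    /* One full wrap is 2^24 / 3579545 Hz =~ 4.69s; if CPU0 is tied up
     * longer than that, an overflow goes unnoticed and platform time
     * falls behind by roughly this amount. */
    printf("PM timer wraps every %.2fs\n", counter_span / pmtmr_hz);
    return 0;
}

That lines up nicely with the roughly 4 seconds of platform time lost here.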
If I'm indeed not overlooking an important fix in that area, what would be considered a reasonable solution to this? I can imagine (in order of my preference):

- inserting calls to do_softirq() into the put_page_and_type() call hierarchy (e.g. in free_l2_table() or even free_l1_table(), to guarantee uniform behavior across sub-architectures; see the sketch after this list). This might also help address other issues, as the same scenario can occur when a page table hierarchy gets destroyed at times other than domain shutdown. Perhaps the same would then also be needed in the get_page_type() hierarchy, e.g. in alloc_l{2,1}_table().

- simply rotating the responsibility for platform time round-robin among all CPUs (this would leave the unlikely UP case still affected by the problem)

- detecting platform timer overflow (and properly estimating how many times it has overflowed) and syncing platform time back from local time (as indicated in a comment somewhere)

- marshalling the whole operation to another CPU
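To make the first option more concrete, here is a rough sketch (not a tested patch; the routine and helper names follow mm.c, and the 64-entry batching interval is an arbitrary choice of mine) of how free_l2_table() could yield to softirqs, and thereby let CPU0 handle the platform timer overflow, during teardown:

/* Sketch only: let L2 teardown service pending softirqs periodically
 * instead of monopolizing the CPU for the whole address space. */
static void free_l2_table(struct page_info *page)
{
    unsigned long pfn = page_to_mfn(page);
    l2_pgentry_t *pl2e = map_domain_page(pfn);
    int i;

    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
    {
        put_page_from_l2e(pl2e[i], pfn);

        /* Every 64 entries (arbitrary), check for and run pending
         * softirqs, e.g. the platform time handling on CPU0. */
        if ( ((i & 63) == 63) && softirq_pending(smp_processor_id()) )
            do_softirq();
    }

    unmap_domain_page(pl2e);
}

Whether running softirqs is actually safe at each of these points (locks held, re-entrancy of the teardown path) is of course exactly what would need auditing, per sub-architecture.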
For reference, this is the CPU0 backtrace I'm getting from the IPI:

(XEN) *** Dumping CPU0 host state: ***
(XEN) State at keyhandler.c:109
(XEN) ----[ Xen-3.0.4_13138-0.63  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff83000010e8a2>] dump_execstate+0x62/0xe0
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 000000000013dd62
(XEN) rdx: 000000000000000a   rsi: 000000000000000a   rdi: ffff8300002b2142
(XEN) rbp: 0000000000000000   rsp: ffff8300001d3a30   r8:  0000000000000001
(XEN) r9:  0000000000000001   r10: 00000000fffffffc   r11: 0000000000000001
(XEN) r12: 0000000000000001   r13: 0000000000000001   r14: 0000000000000001
(XEN) r15: cccccccccccccccd   cr0: 0000000080050033   cr4: 00000000000006f0
(XEN) cr3: 000000000ce02000   cr2: 00002b47f8871ca8
(XEN) ds: 0000   es: 0000   fs: 0063   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff8300001d3a30:
(XEN)    0000000000000046 ffff830000f7e280 ffff8300002b0e00 ffff830000f7e280
(XEN)    ffff83000013b665 0000000000000000 ffff83000012dc8a cccccccccccccccd
(XEN)    0000000000000001 0000000000000001 0000000000000001 ffff830000f7e280
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff8284008f7aa0 ffff8284008f7ac8 0000000000000000 0000000000000000
(XEN)    0000000000039644 ffff8284008f7aa0 000000fb00000000 ffff83000011345d
(XEN)    000000000000e008 0000000000000246 ffff8300001d3b18 000000000000e010
(XEN)    ffff830000113348 ffff83000013327f 0000000000000000 ffff8284008f7aa0
(XEN)    ffff8307cc1b7288 ffff8307cc1b8000 ffff830000f7e280 00000000007cc315
(XEN)    ffff8284137e4498 ffff830000f7e280 ffff830000132c24 0000000020000001
(XEN)    0000000020000000 ffff8284137e4498 00000000007cc315 ffff8284137e7b48
(XEN)    ffff830000132ec4 ffff8284137e4498 000000000000015d ffff830000f7e280
(XEN)    ffff8300001328d2 ffff8307cc315ae8 ffff830000132cbb 0000000040000001
(XEN)    0000000040000000 ffff8284137e7b48 ffff830000f7e280 ffff8284137f6be8
(XEN)    ffff830000132ec4 ffff8284137e7b48 00000000007cc919 ffff8307cc91a000
(XEN)    ffff8300001331a2 ffff8307cc919018 ffff830000132d41 0000000060000001
(XEN)    0000000060000000 ffff8284137f6be8 0000000000006ea6 ffff8284001149f0
(XEN)    ffff830000132ec4 ffff8284137f6be8 0000000000000110 ffff830000f7e280
(XEN)    ffff830000133132 ffff830006ea6880 ffff830000132df0 0000000080000001
(XEN)    0000000080000000 ffff8284001149f0 ffff8284001149f0 ffff8284001149f0
(XEN) Xen call trace:
(XEN)    [<ffff83000010e8a2>] dump_execstate+0x62/0xe0
(XEN)    [<ffff83000013b665>] smp_call_function_interrupt+0x55/0xa0
(XEN)    [<ffff83000012dc8a>] call_function_interrupt+0x2a/0x30
(XEN)    [<ffff83000011345d>] free_domheap_pages+0x2bd/0x3b0
(XEN)    [<ffff830000113348>] free_domheap_pages+0x1a8/0x3b0
(XEN)    [<ffff83000013327f>] put_page_from_l1e+0x9f/0x120
(XEN)    [<ffff830000132c24>] free_page_type+0x314/0x540
(XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
(XEN)    [<ffff8300001328d2>] put_page_from_l2e+0x32/0x70
(XEN)    [<ffff830000132cbb>] free_page_type+0x3ab/0x540
(XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
(XEN)    [<ffff8300001331a2>] put_page_from_l3e+0x32/0x70
(XEN)    [<ffff830000132d41>] free_page_type+0x431/0x540
(XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
(XEN)    [<ffff830000133132>] put_page_from_l4e+0x32/0x70
(XEN)    [<ffff830000132df0>] free_page_type+0x4e0/0x540
(XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
(XEN)    [<ffff83000012923a>] relinquish_memory+0x17a/0x290
(XEN)    [<ffff830000183665>] identify_cpu+0x5/0x1f0
(XEN)    [<ffff830000117f10>] vcpu_runstate_get+0xb0/0xf0
(XEN)    [<ffff8300001296aa>] domain_relinquish_resources+0x35a/0x3b0
(XEN)    [<ffff8300001083e8>] domain_kill+0x28/0x60
(XEN)    [<ffff830000107560>] do_domctl+0x690/0xe60
(XEN)    [<ffff830000121def>] __putstr+0x1f/0x70
(XEN)    [<ffff830000138016>] mod_l1_entry+0x636/0x670
(XEN)    [<ffff830000118143>] schedule+0x1f3/0x270
(XEN)    [<ffff830000175ca6>] toggle_guest_mode+0x126/0x140
(XEN)    [<ffff830000175fa8>] do_iret+0xa8/0x1c0
(XEN)    [<ffff830000173b32>] syscall_enter+0x62/0x67

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel