Re: [Xen-devel] long latency of domain shutdown
>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 28.04.08 15:59 >>>
>This was addressed by xen-unstable:15821. The fix is present in releases
>since 3.2.0. It was never backported to the 3.1 branch.
>
>There are a few changesets related to 15821 that you would also want to
>take into your tree. For example, 15838 is a bugfix. And there is also a
>change on the tools side that is required because domain_destroy can now
>return -EAGAIN if it gets preempted. Any others will probably become
>obvious when you try to backport 15821.
>
> -- Keir

Okay, thanks - so I indeed missed the call to hypercall_preempt_check()
in relinquish_memory(), which is the key indicator here. However, that
change deals exclusively with domain shutdown, not with the more general
page table pinning/unpinning operations, which I believe remain (as
described) open to misuse by a malicious guest. I realize that
well-behaved guests would not normally present a heavily populated
address space here, but that cannot be entirely excluded either: with
four page-table levels of 512 entries each, the upper bound on the
number of operations on x86-64 is 512**4, i.e. 2**36, L1 table entries
(ignoring the hypervisor hole, which doesn't need processing).
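For reference, the shape of that fix is roughly the following - a minimal
sketch of the preemption pattern only, assuming simplified page-list
handling; first_page() and the exact locking are illustrative stand-ins,
not the actual code of changeset 15821:

    /* Sketch: a teardown loop with an explicit preemption point, so that
     * pending softirqs (e.g. the timer softirq on CPU0) get serviced
     * instead of being starved for the duration of the whole teardown. */
    static int relinquish_memory(struct domain *d, struct list_head *list)
    {
        struct page_info *page;
        int ret = 0;

        spin_lock_recursive(&d->page_alloc_lock);

        while ( (page = first_page(list)) != NULL ) /* illustrative helper */
        {
            if ( hypercall_preempt_check() )
            {
                /* Tell the caller to reissue the hypercall. */
                ret = -EAGAIN;
                break;
            }

            /* A single put may still free an entire pinned page-table
             * tree in one go - the unbounded case described above. */
            put_page_and_type(page);
        }

        spin_unlock_recursive(&d->page_alloc_lock);
        return ret;
    }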
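The tools-side change mentioned above then amounts to retrying the
destroy domctl for as long as the hypervisor reports the preemption. A
hedged sketch in the libxc style of that era (DECLARE_DOMCTL and
do_domctl() are the usual libxc plumbing; the real changeset may differ
in detail):

    #include <errno.h>

    int xc_domain_destroy(int xc_handle, uint32_t domid)
    {
        int ret;
        DECLARE_DOMCTL;

        domctl.cmd = XEN_DOMCTL_destroydomain;
        domctl.domain = (domid_t)domid;

        /* domain_destroy can now return -EAGAIN when preempted, so keep
         * reissuing the hypercall until the teardown has completed. */
        do {
            ret = do_domctl(xc_handle, &domctl);
        } while ( ret < 0 && errno == EAGAIN );

        return ret;
    }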
Jan

On 28/4/08 14:45, "Jan Beulich" <jbeulich@xxxxxxxxxx> wrote:

> In (3.0.4-based) SLE10 SP1 we are currently dealing with a (reproducible)
> report of time getting screwed up during domain shutdown. Debugging
> revealed that the PM timer misses at least one overflow (i.e. platform
> time lost about 4 seconds), which subsequently leads to disastrous
> effects.
>
> Apart from tracking the time calibration, as the (currently) last step
> of narrowing down the cause I made the first processor that detects
> severe anomalies in time flow send an IPI to CPU0 (which is exclusively
> responsible for managing platform time). This appears to prove that
> CPU0 is indeed busy processing a domain_kill() request, and namely is
> in the process of tearing down the address spaces of the guest.
>
> Obviously, the hypervisor's behavior should not depend on the amount
> of time needed to free a dead domain's resources, but that is the way
> it is coded (from some code comparison I would conclude that, while the
> code has changed significantly, the basic characteristic of domain
> shutdown being executed synchronously on the requesting CPU does not
> appear to have changed - of course, history shows that I may easily be
> overlooking something here), and if that CPU happens to be CPU0, the
> whole system suffers due to the asymmetry of platform time handling.
>
> If I'm indeed not overlooking an important fix in that area, what would
> be considered a reasonable solution to this? I can imagine (in order of
> my preference):
>
> - inserting calls to do_softirq() in the put_page_and_type() call
>   hierarchy (e.g. in free_l2_table() or even free_l1_table()), to
>   guarantee uniform behavior across sub-architectures; this might also
>   help with other issues, as the same scenario can occur when a page
>   table hierarchy gets destroyed at times other than domain shutdown;
>   perhaps the same would then also be needed in the get_page_type()
>   hierarchy, e.g. in alloc_l{2,1}_table()
>
> - simply rotating responsibility for platform time round-robin among
>   all CPUs (this would leave the unlikely UP case still affected by
>   the problem)
>
> - detecting platform timer overflow (and properly estimating how many
>   times it has overflowed) and syncing platform time back from local
>   time, as indicated in a comment somewhere (see the sketch below)
>
> - marshalling the whole operation to another CPU
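To put numbers on the overflow-detection idea in the third item above:
the ACPI PM timer counts at 3.579545 MHz and is (usually) 24 bits wide,
so it wraps about every 2^24 / 3579545 ~= 4.69 seconds - consistent with
the roughly 4 seconds of platform time lost. A sketch of estimating the
number of missed wraps from locally kept time; all names and the
nanosecond interface are illustrative assumptions, not Xen's actual
platform-time code:

    #include <stdint.h>

    #define PMTMR_FREQ_HZ  3579545ULL  /* ACPI PM timer frequency */
    #define PMTMR_WIDTH    24          /* counter width in bits */

    /* Estimate how many times the counter wrapped between two reads,
     * using elapsed local (e.g. TSC-derived) time, which keeps advancing
     * even while CPU0 is stuck in a long hypervisor operation. */
    static unsigned int missed_wraps(uint32_t prev, uint32_t now,
                                     uint64_t local_elapsed_ns)
    {
        /* Observed tick delta, modulo the counter width. */
        uint64_t seen = (now - prev) & ((1ULL << PMTMR_WIDTH) - 1);
        /* Ticks that should have elapsed according to local time. */
        uint64_t expected = local_elapsed_ns * PMTMR_FREQ_HZ
                            / 1000000000ULL;

        /* The shortfall consists of whole periods that the overflow
         * handler never accounted for; round to the nearest count. */
        return (unsigned int)(((int64_t)(expected - seen) +
                               (1LL << (PMTMR_WIDTH - 1))) >> PMTMR_WIDTH);
    }

Platform time could then be resynchronized by crediting
missed_wraps() * 2^PMTMR_WIDTH extra ticks before folding in the current
counter value.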
> For reference, this is the CPU0 backtrace I'm getting from the IPI:
>
> (XEN) *** Dumping CPU0 host state: ***
> (XEN) State at keyhandler.c:109
> (XEN) ----[ Xen-3.0.4_13138-0.63 x86_64 debug=n Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff83000010e8a2>] dump_execstate+0x62/0xe0
> (XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 000000000013dd62
> (XEN) rdx: 000000000000000a rsi: 000000000000000a rdi: ffff8300002b2142
> (XEN) rbp: 0000000000000000 rsp: ffff8300001d3a30 r8: 0000000000000001
> (XEN) r9: 0000000000000001 r10: 00000000fffffffc r11: 0000000000000001
> (XEN) r12: 0000000000000001 r13: 0000000000000001 r14: 0000000000000001
> (XEN) r15: cccccccccccccccd cr0: 0000000080050033 cr4: 00000000000006f0
> (XEN) cr3: 000000000ce02000 cr2: 00002b47f8871ca8
> (XEN) ds: 0000 es: 0000 fs: 0063 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff8300001d3a30:
> (XEN) 0000000000000046 ffff830000f7e280 ffff8300002b0e00 ffff830000f7e280
> (XEN) ffff83000013b665 0000000000000000 ffff83000012dc8a cccccccccccccccd
> (XEN) 0000000000000001 0000000000000001 0000000000000001 ffff830000f7e280
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) ffff8284008f7aa0 ffff8284008f7ac8 0000000000000000 0000000000000000
> (XEN) 0000000000039644 ffff8284008f7aa0 000000fb00000000 ffff83000011345d
> (XEN) 000000000000e008 0000000000000246 ffff8300001d3b18 000000000000e010
> (XEN) ffff830000113348 ffff83000013327f 0000000000000000 ffff8284008f7aa0
> (XEN) ffff8307cc1b7288 ffff8307cc1b8000 ffff830000f7e280 00000000007cc315
> (XEN) ffff8284137e4498 ffff830000f7e280 ffff830000132c24 0000000020000001
> (XEN) 0000000020000000 ffff8284137e4498 00000000007cc315 ffff8284137e7b48
> (XEN) ffff830000132ec4 ffff8284137e4498 000000000000015d ffff830000f7e280
> (XEN) ffff8300001328d2 ffff8307cc315ae8 ffff830000132cbb 0000000040000001
> (XEN) 0000000040000000 ffff8284137e7b48 ffff830000f7e280 ffff8284137f6be8
> (XEN) ffff830000132ec4 ffff8284137e7b48 00000000007cc919 ffff8307cc91a000
> (XEN) ffff8300001331a2 ffff8307cc919018 ffff830000132d41 0000000060000001
> (XEN) 0000000060000000 ffff8284137f6be8 0000000000006ea6 ffff8284001149f0
> (XEN) ffff830000132ec4 ffff8284137f6be8 0000000000000110 ffff830000f7e280
> (XEN) ffff830000133132 ffff830006ea6880 ffff830000132df0 0000000080000001
> (XEN) 0000000080000000 ffff8284001149f0 ffff8284001149f0 ffff8284001149f0
> (XEN) Xen call trace:
> (XEN) [<ffff83000010e8a2>] dump_execstate+0x62/0xe0
> (XEN) [<ffff83000013b665>] smp_call_function_interrupt+0x55/0xa0
> (XEN) [<ffff83000012dc8a>] call_function_interrupt+0x2a/0x30
> (XEN) [<ffff83000011345d>] free_domheap_pages+0x2bd/0x3b0
> (XEN) [<ffff830000113348>] free_domheap_pages+0x1a8/0x3b0
> (XEN) [<ffff83000013327f>] put_page_from_l1e+0x9f/0x120
> (XEN) [<ffff830000132c24>] free_page_type+0x314/0x540
> (XEN) [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN) [<ffff8300001328d2>] put_page_from_l2e+0x32/0x70
> (XEN) [<ffff830000132cbb>] free_page_type+0x3ab/0x540
> (XEN) [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN) [<ffff8300001331a2>] put_page_from_l3e+0x32/0x70
> (XEN) [<ffff830000132d41>] free_page_type+0x431/0x540
> (XEN) [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN) [<ffff830000133132>] put_page_from_l4e+0x32/0x70
> (XEN) [<ffff830000132df0>] free_page_type+0x4e0/0x540
> (XEN) [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN) [<ffff83000012923a>] relinquish_memory+0x17a/0x290
> (XEN) [<ffff830000183665>] identify_cpu+0x5/0x1f0
> (XEN) [<ffff830000117f10>] vcpu_runstate_get+0xb0/0xf0
> (XEN) [<ffff8300001296aa>] domain_relinquish_resources+0x35a/0x3b0
> (XEN) [<ffff8300001083e8>] domain_kill+0x28/0x60
> (XEN) [<ffff830000107560>] do_domctl+0x690/0xe60
> (XEN) [<ffff830000121def>] __putstr+0x1f/0x70
> (XEN) [<ffff830000138016>] mod_l1_entry+0x636/0x670
> (XEN) [<ffff830000118143>] schedule+0x1f3/0x270
> (XEN) [<ffff830000175ca6>] toggle_guest_mode+0x126/0x140
> (XEN) [<ffff830000175fa8>] do_iret+0xa8/0x1c0
> (XEN) [<ffff830000173b32>] syscall_enter+0x62/0x67
>
> Jan