
[Xen-devel] long latency of domain shutdown



In (3.0.4-based) SLE10 SP1 we are currently dealing with a (reproducible)
report of time getting screwed up during domain shutdown. Debugging
revealed that the PM timer misses at least one overflow (i.e. platform
time lost about 4 seconds), which subsequently leads to disastrous
effects.
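
(For reference: the ACPI PM timer ticks at 3.579545 MHz and, in its
common form, is only 24 bits wide, so one full wrap takes
2^24 / 3579545 Hz ~= 4.69s - consistent with roughly 4 seconds of
platform time being lost if exactly one overflow goes unnoticed.)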

Apart from tracking the time calibration, as the (currently) last step
in narrowing down the cause I made the first CPU that detects a severe
anomaly in the flow of time send an IPI to CPU0 (which is exclusively
responsible for managing platform time). The resulting backtrace (see
below) appears to prove that CPU0 is indeed busy processing a
domain_kill() request, and specifically is in the process of tearing
down the guest's address spaces.

Obviously, the hypervisor's behavior should not depend on the amount
of time needed to free a dead domain's resources. But as the code
stands, the entire teardown runs synchronously on the CPU on which the
shutdown was requested (from some code comparison I would conclude
that, while the code has changed significantly since 3.0.4, this basic
characteristic doesn't appear to have changed - of course, history
shows that I may easily be overlooking something here). And if that
CPU happens to be CPU0, the whole system suffers, due to the asymmetry
of platform time handling.

If I'm indeed not overlooking an important fix in that area, what
would be considered a reasonable solution to this? I can imagine (in
order of my preference):

- inserting calls to do_softirq() into the put_page_and_type() call
hierarchy (e.g. in free_l2_table() or even free_l1_table(), to
guarantee uniform behavior across sub-architectures; this might also
help with other issues, as the same scenario can occur when a page
table hierarchy gets destroyed at times other than domain shutdown);
perhaps the same would then also be needed in the get_page_type()
hierarchy, e.g. in alloc_l{2,1}_table() - a minimal sketch follows
this list

- simply rotating responsibility for platform time among all CPUs in
round-robin fashion (this would leave the unlikely UP case still
affected by the problem)

- detecting platform timer overflow (and properly estimating how many
times it has overflowed) and syncing platform time back from local
time, as a comment somewhere in the code already suggests - see the
second sketch below

- marshalling the whole operation to another CPU
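
To illustrate the first option, here is a minimal sketch against my
understanding of the 3.0.4-era free_l1_table(); whether do_softirq()
is actually safe to invoke at this point (no locks held, re-entry into
the teardown tolerated) would of course need verifying first:

static void free_l1_table(struct page_info *page)
{
    struct domain *d = page_get_owner(page);
    l1_pgentry_t  *pl1e = map_domain_page(page_to_mfn(page));
    int            i;

    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
    {
        if ( is_guest_l1_slot(i) )
            put_page_from_l1e(pl1e[i], d);

        /* Hypothetical preemption point: every 64 entries, give any
         * pending softirqs (e.g. CPU0's time calibration work) a
         * chance to run instead of monopolizing the CPU. */
        if ( ((i & 63) == 63) && softirq_pending(smp_processor_id()) )
            do_softirq();
    }

    unmap_domain_page(pl1e);
}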

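And for the third option, a rough sketch of what the overflow handler
might do (plt_src, plt_stamp, plt_stamp64, plt_stamp_time and
ns_to_plt_ticks() are my placeholders for the platform-time glue, not
necessarily the real identifiers):

static void plt_overflow(void)
{
    u64      count, ticks, expected;
    s_time_t now = NOW();

    count = plt_src.read_counter();

    /* Ticks the counter claims have passed since the last sample... */
    ticks = (count - plt_stamp) & plt_mask;

    /* ...versus ticks the local (TSC-based) clock says should have
     * passed; each missed overflow shows up as a deficit of one full
     * (plt_mask + 1) period. */
    expected = ns_to_plt_ticks(now - plt_stamp_time);
    while ( (expected > ticks) &&
            ((expected - ticks) > ((plt_mask + 1) / 2)) )
        ticks += plt_mask + 1;

    plt_stamp64    += ticks;
    plt_stamp       = count;
    plt_stamp_time  = now;
}
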
For reference, this is the CPU0 backtrace I'm getting from the IPI:

(XEN) *** Dumping CPU0 host state: ***
(XEN) State at keyhandler.c:109
(XEN) ----[ Xen-3.0.4_13138-0.63  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff83000010e8a2>] dump_execstate+0x62/0xe0
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 000000000013dd62
(XEN) rdx: 000000000000000a   rsi: 000000000000000a   rdi: ffff8300002b2142
(XEN) rbp: 0000000000000000   rsp: ffff8300001d3a30   r8:  0000000000000001
(XEN) r9:  0000000000000001   r10: 00000000fffffffc   r11: 0000000000000001
(XEN) r12: 0000000000000001   r13: 0000000000000001   r14: 0000000000000001
(XEN) r15: cccccccccccccccd   cr0: 0000000080050033   cr4: 00000000000006f0
(XEN) cr3: 000000000ce02000   cr2: 00002b47f8871ca8
(XEN) ds: 0000   es: 0000   fs: 0063   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff8300001d3a30:
(XEN)    0000000000000046 ffff830000f7e280 ffff8300002b0e00 ffff830000f7e280
(XEN)    ffff83000013b665 0000000000000000 ffff83000012dc8a cccccccccccccccd
(XEN)    0000000000000001 0000000000000001 0000000000000001 ffff830000f7e280
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff8284008f7aa0 ffff8284008f7ac8 0000000000000000 0000000000000000
(XEN)    0000000000039644 ffff8284008f7aa0 000000fb00000000 ffff83000011345d
(XEN)    000000000000e008 0000000000000246 ffff8300001d3b18 000000000000e010
(XEN)    ffff830000113348 ffff83000013327f 0000000000000000 ffff8284008f7aa0
(XEN)    ffff8307cc1b7288 ffff8307cc1b8000 ffff830000f7e280 00000000007cc315
(XEN)    ffff8284137e4498 ffff830000f7e280 ffff830000132c24 0000000020000001
(XEN)    0000000020000000 ffff8284137e4498 00000000007cc315 ffff8284137e7b48
(XEN)    ffff830000132ec4 ffff8284137e4498 000000000000015d ffff830000f7e280
(XEN)    ffff8300001328d2 ffff8307cc315ae8 ffff830000132cbb 0000000040000001
(XEN)    0000000040000000 ffff8284137e7b48 ffff830000f7e280 ffff8284137f6be8
(XEN)    ffff830000132ec4 ffff8284137e7b48 00000000007cc919 ffff8307cc91a000
(XEN)    ffff8300001331a2 ffff8307cc919018 ffff830000132d41 0000000060000001
(XEN)    0000000060000000 ffff8284137f6be8 0000000000006ea6 ffff8284001149f0
(XEN)    ffff830000132ec4 ffff8284137f6be8 0000000000000110 ffff830000f7e280
(XEN)    ffff830000133132 ffff830006ea6880 ffff830000132df0 0000000080000001
(XEN)    0000000080000000 ffff8284001149f0 ffff8284001149f0 ffff8284001149f0
(XEN) Xen call trace:
(XEN)    [<ffff83000010e8a2>] dump_execstate+0x62/0xe0
(XEN)    [<ffff83000013b665>] smp_call_function_interrupt+0x55/0xa0
(XEN)    [<ffff83000012dc8a>] call_function_interrupt+0x2a/0x30
(XEN)    [<ffff83000011345d>] free_domheap_pages+0x2bd/0x3b0
(XEN)    [<ffff830000113348>] free_domheap_pages+0x1a8/0x3b0
(XEN)    [<ffff83000013327f>] put_page_from_l1e+0x9f/0x120
(XEN)    [<ffff830000132c24>] free_page_type+0x314/0x540
(XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
(XEN)    [<ffff8300001328d2>] put_page_from_l2e+0x32/0x70
(XEN)    [<ffff830000132cbb>] free_page_type+0x3ab/0x540
(XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
(XEN)    [<ffff8300001331a2>] put_page_from_l3e+0x32/0x70
(XEN)    [<ffff830000132d41>] free_page_type+0x431/0x540
(XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
(XEN)    [<ffff830000133132>] put_page_from_l4e+0x32/0x70
(XEN)    [<ffff830000132df0>] free_page_type+0x4e0/0x540
(XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
(XEN)    [<ffff83000012923a>] relinquish_memory+0x17a/0x290
(XEN)    [<ffff830000183665>] identify_cpu+0x5/0x1f0
(XEN)    [<ffff830000117f10>] vcpu_runstate_get+0xb0/0xf0
(XEN)    [<ffff8300001296aa>] domain_relinquish_resources+0x35a/0x3b0
(XEN)    [<ffff8300001083e8>] domain_kill+0x28/0x60
(XEN)    [<ffff830000107560>] do_domctl+0x690/0xe60
(XEN)    [<ffff830000121def>] __putstr+0x1f/0x70
(XEN)    [<ffff830000138016>] mod_l1_entry+0x636/0x670
(XEN)    [<ffff830000118143>] schedule+0x1f3/0x270
(XEN)    [<ffff830000175ca6>] toggle_guest_mode+0x126/0x140
(XEN)    [<ffff830000175fa8>] do_iret+0xa8/0x1c0
(XEN)    [<ffff830000173b32>] syscall_enter+0x62/0x67

Jan
