Re: [Xen-devel] [PATCH 1/2] VMX: fix VMCS race on context-switch paths
On Wed, 2017-02-15 at 04:39 -0700, Jan Beulich wrote:
> > > > On 15.02.17 at 11:27, <sergey.dyasli@xxxxxxxxxx> wrote:
> >
> > This is what I'm getting during the original test case (32 VMs reboot):
> >
> > (XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck!
> > (XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local  x86_64  debug=n  Not tainted ]----
> > (XEN) [ 1407.803774] CPU:    12
> > (XEN) [ 1407.806975] RIP:    e008:[<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50
> > (XEN) [ 1407.814926] RFLAGS: 0000000000000013   CONTEXT: hypervisor (d230v0)
> > (XEN) [ 1407.822486] rax: 0000000000000000   rbx: ffff830079ee7000   rcx: 0000000000000000
> > (XEN) [ 1407.831407] rdx: 0000006f8f72ce00   rsi: ffff8329b3efbfe8   rdi: ffff830079ee7000
> > (XEN) [ 1407.840326] rbp: ffff83007bab7000   rsp: ffff83400fab7dc8   r8:  000001468e9e3ccc
> > (XEN) [ 1407.849246] r9:  ffff83403ffe7000   r10: 00000146c91c1737   r11: ffff833a9558c310
> > (XEN) [ 1407.858166] r12: ffff833a9558c000   r13: 000000000000000c   r14: ffff83403ffe7000
> > (XEN) [ 1407.867085] r15: ffff82d080364640   cr0: 0000000080050033   cr4: 00000000003526e0
> > (XEN) [ 1407.876005] cr3: 000000294b074000   cr2: 000007fefd7ce150
> > (XEN) [ 1407.882599] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) [ 1407.890938] Xen code around <ffff82d0801ea2a2> (vmx_vmcs_reload+0x32/0x50):
> > (XEN) [ 1407.899277]  84 00 00 00 00 00 f3 90 <83> bf e8 05 00 00 ff 75 f5 e9 a0 fa ff ff f3 c3
> > (XEN) [ 1407.908679] Xen stack trace from rsp=ffff83400fab7dc8:
> > (XEN) [ 1407.914982]    ffff82d08016c58d 0000000000001000 0000000000000000 0000000000000000
> > (XEN) [ 1407.923998]    0000000000000206 0000000000000086 0000000000000286 000000000000000c
> > (XEN) [ 1407.933017]    ffff83007bab7058 ffff82d080364640 ffff83007bab7000 00000146a7f26495
> > (XEN) [ 1407.942032]    ffff830079ee7000 ffff833a9558cf84 ffff833a9558c000 ffff82d080364640
> > (XEN) [ 1407.951048]    ffff82d08012fb8e ffff83400fabda98 ffff83400faba148 ffff83403ffe7000
> > (XEN) [ 1407.960067]    ffff83400faba160 ffff83400fabda40 ffff82d080164305 000000000000000c
> > (XEN) [ 1407.969083]    ffff830079ee7000 0000000001c9c380 ffff82d080136400 000000440000011d
> > (XEN) [ 1407.978101]    00000000ffffffff ffffffffffffffff ffff83400fab0000 ffff82d080348d00
> > (XEN) [ 1407.987116]    ffff833a9558c000 ffff82d080364640 ffff82d08013311c ffff830079ee7000
> > (XEN) [ 1407.996134]    ffff83400fab0000 ffff830079ee7000 ffff83403ffe7000 00000000ffffffff
> > (XEN) [ 1408.005151]    ffff82d080167d35 ffff83007bab7000 0000000000000001 fffffa80077f9700
> > (XEN) [ 1408.014167]    fffffa80075bf900 fffffa80077f9820 0000000000000000 0000000000000000
> > (XEN) [ 1408.023184]    fffffa8008889c00 0000000002fa1e78 0000003b6ed18d78 0000000000000000
> > (XEN) [ 1408.032202]    00000000068e7780 fffffa80075ba790 fffffa80077f9848 fffff800027f9e80
> > (XEN) [ 1408.041220]    0000000000000001 000000fc00000000 fffff880042499c2 0000000000000000
> > (XEN) [ 1408.050235]    0000000000000246 fffff80000b9cb58 0000000000000000 80248e00e008e1f0
> > (XEN) [ 1408.059253]    00000000ffff82d0 80248e00e008e200 00000000ffff82d0 80248e000000000c
> > (XEN) [ 1408.068268]    ffff830079ee7000 0000006f8f72ce00 00000000ffff82d0
> > (XEN) [ 1408.075638] Xen call trace:
> > (XEN) [ 1408.079322]    [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50
> > (XEN) [ 1408.086303]    [<ffff82d08016c58d>] context_switch+0x85d/0xeb0
> > (XEN) [ 1408.093380]    [<ffff82d08012fb8e>] schedule.c#schedule+0x46e/0x7d0
> > (XEN) [ 1408.100942]    [<ffff82d080164305>] reprogram_timer+0x75/0xe0
> > (XEN) [ 1408.107925]    [<ffff82d080136400>] timer.c#timer_softirq_action+0x90/0x210
> > (XEN) [ 1408.116263]    [<ffff82d08013311c>] softirq.c#__do_softirq+0x5c/0x90
> > (XEN) [ 1408.123921]    [<ffff82d080167d35>] domain.c#idle_loop+0x25/0x60
>
> Taking your later reply into account - were you able to figure out
> what other party held onto the VMCS being waited for here?
Unfortunately, no. It was unclear from the debug logs. But judging from
the following vmx_do_resume() code:
    if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
    {
        if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
            vmx_load_vmcs(v);
    }
If both of the above conditions are true then vmx_vmcs_reload() will
probably hang.
--
Thanks,
Sergey
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel