[Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell CPU
Hi,

I am facing an issue where a bootstorm of guests leads to a host crash. I debugged it and found that enabling PML introduces a race condition during the guest teardown stage, between disabling PML on one vcpu and a context switch happening for another vcpu. The crash happens only on Broadwell processors, as PML was introduced in this generation.

Here is my analysis.

Race condition (vmcs.c, in pseudocode):

    vmx_vcpu_disable_pml(vcpu)
    {
        vmx_vmcs_enter();
        vm_write(disable_PML);
        vmx_vmcs_exit();
    }

In between, I have a call path from another pcpu executing a context switch -> vmx_fpu_leave(), which crashes on the vmwrite:

    if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
    {
        v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
        __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); // crash
    }

Debug logs:

(XEN) [221256.749928] VMWRITE VMCS Invalid !!!!!
(XEN) [221256.754870] **[00] { now 0000c93b4341df1d, hw 00000035fffea000, op 00000035fffea000 } vmclear
(XEN) [221256.765052] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221256.773969] **[01] { now 0000c93b4341e099, hw ffffffffffffffff, op 00000035fffea000 } vmptrld
(XEN) [221256.784150] ** frames [ ffff82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
(XEN) [221256.792197] **[02] { now 0000c93b4341e1f1, hw 00000035fffea000, op 00000035fffea000 } vmclear
(XEN) [221256.802378] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221256.811298] **[03] { now 0000c93b5784dd0a, hw ffffffffffffffff, op 00000039d7074000 } vmptrld
(XEN) [221256.821478] ** frames [ ffff82d0801f4c31 vmx_do_resume+0x51/0x150 ]
(XEN) [221256.829139] **[04] { now 0000c93b59d67b5b, hw 00000039d7074000, op 0000002b9a575000 } vmptrld
(XEN) [221256.839320] ** frames [ ffff82d0801f4c31 vmx_do_resume+0x51/0x150 ]
(XEN) [221256.882850] **[07] { now 0000c93b59e71e48, hw 0000002b9a575000, op 00000039d7074000 } vmptrld
(XEN) [221256.893034] ** frames [ ffff82d0801f4d13 vmx_do_resume+0x133/0x150 ]
(XEN) [221256.900790] **[08] { now 0000c93b59e78675, hw 00000039d7074000, op 00000040077ae000 } vmptrld
(XEN) [221256.910968] ** frames [ ffff82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
(XEN) [221256.919015] **[09] { now 0000c93b59e78ac8, hw 00000040077ae000, op 00000040077ae000 } vmclear
(XEN) [221256.929196] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221256.938117] **[10] { now 0000c93b59e78d72, hw ffffffffffffffff, op 00000040077ae000 } vmptrld
(XEN) [221256.948297] ** frames [ ffff82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
(XEN) [221256.956345] **[11] { now 0000c93b59e78ff0, hw 00000040077ae000, op 00000040077ae000 } vmclear
(XEN) [221256.966525] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221256.975445] **[12] { now 0000c93b59e7deda, hw ffffffffffffffff, op 00000040077b3000 } vmptrld
(XEN) [221256.985626] ** frames [ ffff82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
(XEN) [221256.993672] **[13] { now 0000c93b59e9fe00, hw 00000040077b3000, op 00000040077b3000 } vmclear
(XEN) [221257.003852] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221257.012772] **[14] { now 0000c93b59ea007e, hw ffffffffffffffff, op 00000040077b3000 } vmptrld
(XEN) [221257.022952] ** frames [ ffff82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
(XEN) [221257.031000] **[15] { now 0000c93b59ea02ba, hw 00000040077b3000, op 00000040077b3000 } vmclear
(XEN) [221257.041180] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221257.050101] ....
(XEN) [221257.053008] vmcs_ptr:0xffffffffffffffff, vcpu->vmcs:0x2b9a575000

The VMCS is loaded, and between that load and the next call to vm_write there is a clear of the VMCS caused by vmx_vcpu_disable_pml. The log above suggests that an IPI is clearing the VMCS in between vmptrld and vmwrite, but I have also verified that interrupts are disabled during the context switch and during the execution of vm_write in vmx_fpu_leave. This has me confused.

Also, I am not sure I understand the handling of foreign_vmcs correctly, which could also be the cause of the race.
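
The interleaving suspected above (vmptrld, then a vmclear of the same VMCS, then vmwrite) can be sketched as a small toy model in C. This is not Xen code: struct pcpu, the lower-case vmptrld/vmclear/vmwrite helpers and replay() are hypothetical names made up for illustration, and the model only captures the architectural rule that clearing the current VMCS leaves the pcpu with no current VMCS (pointer 0xffffffffffffffff, as in the log), after which a write fails. It does not explain how the clear gets in while interrupts are disabled.

```c
#include <assert.h>
#include <stdint.h>

/* Value of the current-VMCS pointer when no VMCS is loaded (matches
 * "hw ffffffffffffffff" and "vmcs_ptr:0xffffffffffffffff" in the log). */
#define VMCS_INVALID UINT64_MAX

/* Hypothetical toy model of one pcpu: which VMCS the hardware has loaded. */
struct pcpu {
    uint64_t hw_vmcs;
};

/* Model of VMPTRLD: make `vmcs` the current VMCS on this pcpu. */
static void vmptrld(struct pcpu *c, uint64_t vmcs)
{
    assert(vmcs != VMCS_INVALID);
    c->hw_vmcs = vmcs;
}

/* Model of VMCLEAR: clearing the VMCS that is current invalidates the
 * current-VMCS pointer; clearing any other VMCS leaves it alone. */
static void vmclear(struct pcpu *c, uint64_t vmcs)
{
    if (c->hw_vmcs == vmcs)
        c->hw_vmcs = VMCS_INVALID;
}

/* Model of VMWRITE: returns 0 on success, -1 when there is no current
 * VMCS (the "VMWRITE VMCS Invalid" case in the log). */
static int vmwrite(const struct pcpu *c)
{
    return c->hw_vmcs == VMCS_INVALID ? -1 : 0;
}

/* Replay load -> (optional clear of the same VMCS) -> write. */
int replay(int clear_in_between)
{
    struct pcpu cpu = { VMCS_INVALID };
    const uint64_t vmcs = 0x2b9a575000ULL; /* vcpu->vmcs from the log */

    vmptrld(&cpu, vmcs);
    if (clear_in_between)
        vmclear(&cpu, vmcs); /* stands in for the IPI-driven vmclear */
    return vmwrite(&cpu);
}
```

In this model replay(0) returns 0 and replay(1) returns -1, so the write fails only when a clear of the same VMCS lands between the load and the write on that pcpu.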
Any suggestions would be appreciated.

Thanks,
Anshul Makkar