[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabledbydefault
Hi guys I have found the problem (after hours and hours of gruesome debugging with the almighty print) and it seems that this could potentially have quite a bit of impact if altp2m is enabled for a guest domain (even if the functionality is never actively used), since destroying any vcpu of this guest could lead to a hypervisor panic. So a malicious user could simply destroy and restart his VM(s) in order to DOS the VMs of other users by killing the hypervisor. Granted, this is not very effective, but, depending on the environment, it is extremely easy to implement. The bug persists in Xen 4.7 and I do not that it was fixed in the current master branch. The following happens. The call void hvm_vcpu_destroy(struct vcpu *v) { hvm_all_ioreq_servers_remove_vcpu(v->domain, v); if ( hvm_altp2m_supported() ) altp2m_vcpu_destroy(v); at some time reaches vmx_vcpu_update_eptp which ends with a vmx_vmcs_exit(v);. There vmx_clear_vmcs(v); -> __vmx_clear_vmcs is called where the current_vmcs is invalidated if the current vmcs in the CPU is the same as virt_to_maddr (v->arch.hvm_vmx->vmcs): __vmpclear(virt_to_maddr(arch_vmx->vmcs)); ( http://www.tptp.cc/mirrors/siyobik.info/instruction/VMCLEAR.html ) To check this assumption I implemented a basic __vmptrst ( http://www.tptp.cc/mirrors/siyobik.info/instruction/VMPTRST.html ) and added the result to the debug output. (XEN) vmcs.c:519:IDLEv4 __vmx_clear_vmcs: realVMCS BEFORE __vmpclear 82415a000 (XEN) vmcs.c:522:IDLEv4 __vmx_clear_vmcs: realVMCS AFTER __vmpclear ffffffffffffffff After that no vmcs_load / enter is executed so the vmcs in the CPU remains invalidated. For the next function in hvm_vcpu_destroy, the nestedhvm_vcpu_destroy(v) the missing vmcs is no problem (at least in our use case), but the free_compat_arg_xlat crashes. The callstack is as follows: hvm_vcpu_destroy free_compat_arg_xlat destroy_perdomain_mapping map_domain_page (probably inlined) mapcache_current_vcpu sync_local_execstate __sync_local_execstate __context_switch (with function pointer v->arch.ctxt_switch_from = vmx_ctxt_switch_from) vmx_ctxt_switch_from (probably inlined) vmx_fpu_leave There a vmwrite is tried if either ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) ) or ( !(v->arch.hvm_vcpu.guest_cr[0] & X86_CR0_TS) ) is true. The executed vmwrite then crashes. As my knowledge of Xen is not that comprehensive, could you tell me when the TS-bits are set / cleared and what they are used for? static void vmx_fpu_leave(struct vcpu *v) { ASSERT(!v->fpu_dirtied); ASSERT(read_cr0() & X86_CR0_TS); if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) ) { v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS; __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); } if ( !(v->arch.hvm_vcpu.guest_cr[0] & X86_CR0_TS) ) { v->arch.hvm_vcpu.hw_cr[0] |= X86_CR0_TS; __vmwrite(GUEST_CR0, v->arch.hvm_vcpu.hw_cr[0]); v->arch.hvm_vmx.exception_bitmap |= (1u << TRAP_no_device); vmx_update_exception_bitmap(v); } } In the crash dump the additional debug output shows that at least one __vmwrite will be tried and that the VMCS in the CPU is invalidated: (XEN) vmx.c:698:IDLEv4 vmx_fpu_leave: vcpu ffff8300defae000 vmcs ffff8301586c9000 host_cr0-case FALSE guest_cr[0]-case TRUE curr ffff8300df2fb000 curr->arch.hvm_vmx.vmcs 0000000000000000 realVMCS ffffffffffffffff As a quick fix I patched the fpu_leave to only allow the __vmwrite when the realVMCS is valid. This seems to work fine, but requires a call to __vmptrst every time vmx_fpu_leave is called. Also I do not know if an ignored TS has any negative consequences when destroying a vcpu. I assume that this is not case. In our tests nothing pointed to any problems. I added the patch to enable altp2m unconditionally and a patch which evades the panic in vmx_fpu_leave. They are not pretty or anywhere near production ready, but I think you will get the idea. I tried to implement the __vmptrst with the #ifdef HAVE_GAS_VM parts ( analogue to the other functions in vmx.h ) but failed miserably since I lack the required knowledge about the OPCODE definitions. :-D Cheers Kevin > -----Ursprüngliche Nachricht----- > Von: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx] > Gesendet: Montag, 22. August 2016 13:58 > An: Mayer, Kevin <Kevin.Mayer@xxxxxxxx>; JBeulich@xxxxxxxx > Cc: xen-devel@xxxxxxxxxxxxx > Betreff: Re: AW: [Xen-devel] Xen 4.6.1 crash with altp2m enabledbydefault > > On 19/08/16 11:01, Kevin.Mayer@xxxxxxxx wrote: > > Hi > > > > I took another look at Xen and a new crashdump. > > The last successful __vmwrite should be in static void > > vmx_vcpu_update_vmfunc_ve(struct vcpu *v) [...] > > __vmwrite(SECONDARY_VM_EXEC_CONTROL, > > v->arch.hvm_vmx.secondary_exec_control); > > [...] > > After this the altp2m_vcpu_destroy wakes up the vcpu and is then > finished. > > > > In nestedhvm_vcpu_destroy (nvmx_vcpu_destroy) the vmcs can > overwritten (but is not reached in our case as far as I can see): > > if ( nvcpu->nv_n1vmcx ) > > v->arch.hvm_vmx.vmcs = nvcpu->nv_n1vmcx; > > > > In conclusion: > > When destroying a domain the altp2m_vcpu_destroy(v); path seems to > mess up the vmcs which ( only ) sometimes leads to a failed __vmwrite in > vmx_fpu_leave. > > That is as far as I can get with my understanding of the Xen code. > > > > Do you guys have any additional ideas what I could test / analyse? > > Do you have easy reproduction instructions you could share? Sadly, this is > looking like an issue which isn't viable to debug over email. > > ~Andrew ____________ Virus checked by G Data MailSecurity Version: AVA 25.8368 dated 21.09.2016 Virus news: www.antiviruslab.com Attachment:
xen-altp2menable.patch Attachment:
xen-vmx_altp2m_bug.patch Attachment:
smime.p7s _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |