[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH RFC] x86/emulate: implement hvmemul_cmpxchg() with an actual CMPXCHG
The real CMPXCHG hvmemul_cmpxchg() implementation works as expected as far the race conditions go, and returning RETRY for failed writebacks seems to work without issue for regular Xen emulation. However, when the patch is used with introspection, I've had a BCCode: 101 BSOD and rare (but several) occasions when the guest becomes unresponsive (can't close Firefox or have the Windows start menu show up when clicking the "Start" button, but the guest is otherwise running). This I believe is due to the do { } while ( rc == X86EMUL_RETRY ); loop in hvm_emulate_one_vm_event(): I am basically now looping behing the guest's back until the CMPXCHG succeeds, which can theoretically be a very long time to execute a CMPXCHG instruction, and most likely not what the guest OS is expecting. The alternative (and the current default) is to do nothing on X86EMUL_RETRY and just allow the guest to re-enter in the same place, which should trigger the same page fault vm_event, which can hopefully now be able to emulate the current instruction. However, in the best case scenario, this just complicates the above loop since the current instruction will still be unable to complete until emulation succeeds but this time with VMEXITs. And in the worst case scenario (which is what happens in my tests) this adds an additional factor of unpredictability, since the guest quickly BSODs (or rarely just plain hangs). I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a failed CMPXCHG should happen just once, with the proper registers and ZF set. The guest surely expects neither that the instruction resume until it succeeds, nor that some hidden loop goes on for an undeterminate ammount of time until a CMPXCHG succeeds. The picture is further complicated by the two-part handling of instructions in x86_emulate(). For example, for CMPXCHG we have: case X86EMUL_OPC(0x0f, 0xb0): case X86EMUL_OPC(0x0f, 0xb1): /* cmpxchg */ /* Save real source value, then compare EAX against destination. */ src.orig_val = src.val; src.val = _regs.r(ax); /* cmp: %%eax - dst ==> dst and src swapped for macro invocation */ emulate_2op_SrcV("cmp", dst, src, _regs.eflags); if ( _regs.eflags & X86_EFLAGS_ZF ) { /* Success: write back to memory. */ dst.val = src.orig_val; } else { /* Failure: write the value we saw to EAX. */ dst.type = OP_REG; dst.reg = (unsigned long *)&_regs.r(ax); } break; This is the only part that sets the proper registers and ZF, and it's only the comparison. If this succeeds, x86_emulate() currently assumes that the writeback part cannot fail. But then the writeback part is: case OP_MEM: if ( !(d & Mov) && (dst.orig_val == dst.val) && !ctxt->force_writeback ) /* nothing to do */; else if ( lock_prefix ) { fail_if(!ops->cmpxchg); rc = ops->cmpxchg( dst.mem.seg, dst.mem.off, &dst.orig_val, &dst.val, dst.bytes, ctxt); } else { fail_if(!ops->write); rc = ops->write(dst.mem.seg, dst.mem.off, !state->simd_size ? &dst.val : (void *)mmvalp, dst.bytes, ctxt); if ( sfence ) asm volatile ( "sfence" ::: "memory" ); } if ( rc != 0 ) goto done; default: break; I now see that this is why using a spinlock only around the writeback part did not solve the issue: both the compare part and the writeback part need to be part of the same atomic operation - lock needs to be aquired before the cmp / ZF part. Opinions and suggestions are appreciated. If I'm not mistaken, it looks like the smp_lock design is the better solution to the problem. Thanks, Razvan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |