[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [for-4.7] x86/emulate: synchronize LOCKed instruction emulation
On 05/03/2016 06:13 PM, Jan Beulich wrote: >>>> On 03.05.16 at 16:41, <rcojocaru@xxxxxxxxxxxxxxx> wrote: >> On 05/03/2016 05:30 PM, Jan Beulich wrote: >>>>>> On 03.05.16 at 16:20, <rcojocaru@xxxxxxxxxxxxxxx> wrote: >>>> I've kept experimenting with the patch but can't quite figure out why >>>> minimizing the lock scope to the writeback part would not be sufficient, >>>> but it isn't. >>>> >>>> I.e. with this code: >>>> >>>> 3824 writeback: >>>> 3825 ops->smp_lock(lock_prefix); >>>> 3826 switch ( dst.type ) >>>> 3827 { >>>> 3828 case OP_REG: >>>> 3829 /* The 4-byte case *is* correct: in 64-bit mode we >>>> zero-extend. */ >>>> 3830 switch ( dst.bytes ) >>>> 3831 { >>>> 3832 case 1: *(uint8_t *)dst.reg = (uint8_t)dst.val; break; >>>> 3833 case 2: *(uint16_t *)dst.reg = (uint16_t)dst.val; break; >>>> 3834 case 4: *dst.reg = (uint32_t)dst.val; break; /* 64b: zero-ext >>>> */ >>>> 3835 case 8: *dst.reg = dst.val; break; >>>> 3836 } >>>> 3837 break; >>>> 3838 case OP_MEM: >>>> 3839 if ( !(d & Mov) && (dst.orig_val == dst.val) && >>>> 3840 !ctxt->force_writeback ) >>>> 3841 /* nothing to do */; >>>> 3842 else if ( lock_prefix ) >>>> 3843 rc = ops->cmpxchg( >>>> 3844 dst.mem.seg, dst.mem.off, &dst.orig_val, >>>> 3845 &dst.val, dst.bytes, ctxt); >>>> 3846 else >>>> 3847 rc = ops->write( >>>> 3848 dst.mem.seg, dst.mem.off, &dst.val, dst.bytes, ctxt); >>>> 3849 if ( rc != 0 ) >>>> 3850 { >>>> 3851 ops->smp_unlock(lock_prefix); >>>> 3852 goto done; >>>> 3853 } >>>> 3854 default: >>>> 3855 break; >>>> 3856 } >>>> 3857 ops->smp_unlock(lock_prefix); >>>> >>>> I can still reproduce the guest hang. But if I lock at the very >>>> beginning of x86_emulate() and unlock before each return, no more hangs. >>> >>> Isn't that obvious? Locked instructions are necessarily >>> read-modify-write ones, and hence the lock needs to be taken >>> before the read, and dropped after the write. But remember, I'll >>> continue to show opposition to this getting "fixed" this way (in the >>> emulator itself), as long as no proper explanation can be given >>> why making hvmemul_cmpxchg() do what its name says isn't all >>> we need (and hence why i386-like bus lock behavior is needed). >> >> Yes, that's what I thought, but at a previous time I've described my >> attempt to lock _only_ hvmemul_cmpxchg() (which failed) - and the >> failure of that change to address the issue has been considered curious. >> I've probably not been able to explain clearly what I've tried, or have >> misunderstood the answer, and took it to mean that for some reason a >> similar change is supposed to be able to fix it. >> >> Obviously locking hvmemul_cmpxchg() would have only affected the OP_MEM >> case above (with lock_prefix == 1), actually an even smaller scope than >> the new one, and with no read locking either. > > Yeah, I may have got confused there too (resulting in me confusing > you, perhaps): Adding a spin lock to hvmemul_cmpxchg() of course > won't help. Making it use (or force us of) LOCK CMPXCHG would, I > suppose. > >> I guess the question now is what avenues are there to make >> hvmemul_cmpxchg() do what its name says - I'm certainly open to trying >> out any alternatives - my main concern is to have the problem fixed in >> the best way possible, certainly not to have any specific version of >> this patch make it into Xen. > > I guess making it no longer call hvmemul_write(), copying most of > that function's code while suitably replacing just the call to > hvm_copy_to_guest_virt() (with the assumption that the MMIO > paths at least for now don't need strict LOCK handling) is the only > viable route. Thank you for clearing that up! But while implementing a stub that falls back to the actual LOCK CMPXCHG and replacing hvm_copy_to_guest_virt() with it would indeed be an improvement (with the added advantage of being able to treat non-emulated LOCK CMPXCHG cases), I don't understand how that would solve the read-modify-write atomicity problem. AFAICT, this would only solve the write problem. Assuming we have VCPU1 and VCPU2 emulating a LOCKed instruction expecting rmw atomicity, the stub alone would not prevent this: VCPU1: read, modify VCPU2: read, modify, write VCPU1: write Moreover, since reads and writes are not synchronized, it would be possible for VCPU2's read to occur while VCPU1 writes, and VCPU1 could read part of the old data + part of the new data. So the problem originally addressed by the patch would still need to be addressed like that: with a read / write lock covering all the relevant parts of x86_emulate(). Unless I'm mistaken, the stub part is only needed to make sure that CMPXCHG alone does not race when a VCPU emulates and another does not. Thanks, Razvan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |