Re: [Xen-devel] [for-4.7] x86/emulate: synchronize LOCKed instruction emulation
>>> Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> 05/05/16 11:24 AM >>>
>On 05/04/2016 04:42 PM, Jan Beulich wrote:
>>>>> On 04.05.16 at 13:32, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>> But while implementing a stub that falls back to the actual LOCK CMPXCHG
>>> and replacing hvm_copy_to_guest_virt() with it would indeed be an
>>> improvement (with the added advantage of being able to treat
>>> non-emulated LOCK CMPXCHG cases), I don't understand how that would
>>> solve the read-modify-write atomicity problem.
>>>
>>> AFAICT, this would only solve the write problem. Assuming we have VCPU1
>>> and VCPU2 emulating a LOCKed instruction expecting rmw atomicity, the
>>> stub alone would not prevent this:
>>>
>>> VCPU1: read, modify
>>> VCPU2: read, modify, write
>>> VCPU1: write
>>
>> I'm not sure I follow what you mean here: Does the above represent
>> what the guest does, or what the hypervisor does as steps to emulate
>> a _single_ guest instruction? In the former case, I don't see what
>> you're after. And in the latter case I don't understand why you think
>> using CMPXCHG instead of WRITE wouldn't help.
>
>Briefly, this is the scenario: assuming a guest with two VCPUs and an
>introspection application that has restricted access to a page, the
>guest runs two LOCK instructions that touch the page, causing a page
>fault for each instruction. This further translates to two EPT fault
>vm_events being placed in the ring buffer.
>
>By the time the introspection application polls the event channel, both
>VCPUs are paused, waiting for replies to the vm_events.
>
>If the monitoring application processes both events (puts both replies,
>with the emulate option on, in the ring buffer), then signals the event
>channel, it is possible that both VCPUs get woken up, ending up running
>x86_emulate() simultaneously.
>
>In this case, my understanding is that just using CMPXCHG will not work
>(although it is clearly superior to the current implementation), because
>the read part and the write part of x86_emulate() (when LOCKed
>instructions are involved) should be executed atomically, but writing
>the CMPXCHG stub would only make sure that two simultaneous writes won't
>occur.
>
>In other words, this would still be possible (atomicity would still not
>be guaranteed for LOCKed instructions):
>
>VCPU1: read
>VCPU2: read, write
>VCPU1: write
>
>when what we want for LOCKed instructions is:
>
>VCPU1: read, write
>VCPU2: read, write

Okay, in short I take this to mean "single instruction" as answer to my
actual question.

>Am I misunderstanding how x86_emulate() works?

No, but I suppose you're misunderstanding what I'm trying to suggest. What
you write above is not what will result when using CMPXCHG. Instead what
we'll have is

vCPU1: read
vCPU2: read
vCPU2: cmpxchg
vCPU1: cmpxchg

Note that the second cmpxchg will fail unless the first one wrote back an
unchanged value. Hence vCPU1 will be told to re-execute the instruction.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
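
For readers following the thread, the read / read / cmpxchg / cmpxchg interleaving described above can be replayed deterministically in plain C11. This is only a sketch under stated assumptions, not Xen code: guest_word, emul_read(), emul_cmpxchg() and replay_interleaving() are hypothetical names, a LOCK INC is assumed as the emulated instruction, and <stdatomic.h> stands in for whatever CMPXCHG stub the emulator would actually use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest memory word both vCPUs touch. */
static _Atomic uint32_t guest_word;

struct replay_result {
    bool v2_first_ok;     /* did vCPU2's cmpxchg commit?              */
    bool v1_first_ok;     /* did vCPU1's first cmpxchg commit?        */
    int retries;          /* times vCPU1 had to re-execute            */
    uint32_t final_value; /* memory contents after both instructions  */
};

/* "Read" phase of emulating one instruction. */
static uint32_t emul_read(void)
{
    return atomic_load(&guest_word);
}

/*
 * "Write" phase done as compare-and-exchange: the new value is committed
 * only if memory still holds the value seen during the read phase.
 */
static bool emul_cmpxchg(uint32_t seen, uint32_t new_val)
{
    return atomic_compare_exchange_strong(&guest_word, &seen, new_val);
}

/* Replays the interleaving from the mail for an assumed LOCK INC. */
static struct replay_result replay_interleaving(void)
{
    struct replay_result r = { false, false, 0, 0 };

    atomic_store(&guest_word, 10);

    uint32_t v1_seen = emul_read();                  /* vCPU1: read    */
    uint32_t v2_seen = emul_read();                  /* vCPU2: read    */
    assert(v1_seen == v2_seen); /* both start from the same snapshot   */

    r.v2_first_ok = emul_cmpxchg(v2_seen, v2_seen + 1); /* vCPU2: cmpxchg */
    r.v1_first_ok = emul_cmpxchg(v1_seen, v1_seen + 1); /* vCPU1: cmpxchg */

    /* A failed cmpxchg means "re-execute the instruction" for vCPU1. */
    bool v1_ok = r.v1_first_ok;
    while (!v1_ok) {
        r.retries++;
        v1_seen = emul_read();
        v1_ok = emul_cmpxchg(v1_seen, v1_seen + 1);
    }

    r.final_value = atomic_load(&guest_word);
    return r;
}
```

In this replay vCPU1's first cmpxchg fails because vCPU2 has already changed the word, so vCPU1 re-reads and retries, and neither increment is lost. No lock spanning the read and the write is needed for that outcome in this simplified model.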