
Re: [Xen-devel] [for-4.7] x86/emulate: synchronize LOCKed instruction emulation

On 05/04/2016 04:42 PM, Jan Beulich wrote:
>>>> On 04.05.16 at 13:32, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>> But while implementing a stub that falls back to the actual LOCK CMPXCHG
>> and replacing hvm_copy_to_guest_virt() with it would indeed be an
>> improvement (with the added advantage of being able to treat
>> non-emulated LOCK CMPXCHG cases), I don't understand how that would
>> solve the read-modify-write atomicity problem.
>> AFAICT, this would only solve the write problem. Assuming we have VCPU1
>> and VCPU2 emulating a LOCKed instruction expecting rmw atomicity, the
>> stub alone would not prevent this:
>> VCPU1: read, modify
>> VCPU2: read, modify, write
>> VCPU1: write
> I'm not sure I follow what you mean here: Does the above represent
> what the guest does, or what the hypervisor does as steps to emulate
> a _single_ guest instruction? In the former case, I don't see what
> you're after. And in the latter case I don't understand why you think
> using CMPXCHG instead of WRITE wouldn't help.

Briefly, this is the scenario: assuming a guest with two VCPUs and an
introspection application that has restricted access to a page, the
guest runs two LOCK instructions that touch the page, causing a page
fault for each instruction. This further translates to two EPT fault
vm_events being placed in the ring buffer.

By the time the introspection application polls the event channel, both
VCPUs are paused, waiting for replies to the vm_events.

If the monitoring application processes both events (puts both replies,
with the emulate option on, in the ring buffer), then signals the event
channel, it is possible that both VCPUs get woken up, ending up running
x86_emulate() simultaneously.

In this case, my understanding is that just using CMPXCHG will not work
(although it is clearly superior to the current implementation), because
the read part and the write part of x86_emulate() (when LOCKed
instructions are involved) should be executed atomically, but writing
the CMPXCHG stub would only make sure that two simultaneous writes won't

In other words, this would still be possible (atomicity would still not
be guaranteed for LOCKed instructions):

VCPU1: read
VCPU2: read, write
VCPU1: write
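
To make the interleaving concrete, here is a minimal sketch that replays it single-threaded, so the outcome is deterministic. It assumes an emulated increment-style LOCKed instruction whose write phase simply stores the result computed from the earlier read. The names emul_state, emul_read(), emul_write() and replay_interleaving() are made up for illustration and are not Xen code:

```c
#include <stdint.h>

/* Hypothetical model of one emulated LOCKed increment, split into
 * the separate read and write phases discussed above. */
struct emul_state {
    uint32_t seen;              /* value obtained by the read phase */
};

static void emul_read(struct emul_state *s, const uint32_t *mem)
{
    s->seen = *mem;
}

static void emul_write(const struct emul_state *s, uint32_t *mem)
{
    *mem = s->seen + 1;         /* modify, then write back */
}

/* Replays the problematic ordering on a single thread. */
uint32_t replay_interleaving(void)
{
    uint32_t mem = 0;
    struct emul_state vcpu1, vcpu2;

    emul_read(&vcpu1, &mem);    /* VCPU1: read  */
    emul_read(&vcpu2, &mem);    /* VCPU2: read  */
    emul_write(&vcpu2, &mem);   /* VCPU2: write */
    emul_write(&vcpu1, &mem);   /* VCPU1: write */

    return mem;
}
```

Two increments are emulated, yet replay_interleaving() returns 1: VCPU1's write is based on a stale read and overwrites VCPU2's update.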

when what we want for LOCKed instructions is:

VCPU1: read, write
VCPU2: read, write
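
One way to get that ordering is to hold a lock across both phases, roughly as sketched below. This is only an illustration under my own assumptions, not the patch itself: emul_lock, emul_try_begin(), emul_end() and replay_serialized() are hypothetical names, the lock is a plain C11 atomic_flag, and the sequence is again replayed on a single thread so the result is deterministic:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock covering the whole read-modify-write sequence. */
static atomic_flag emul_lock = ATOMIC_FLAG_INIT;

/* Returns true if the caller now owns the lock. */
static bool emul_try_begin(void)
{
    return !atomic_flag_test_and_set(&emul_lock);
}

static void emul_end(void)
{
    atomic_flag_clear(&emul_lock);
}

/* Replays the desired ordering; returns UINT32_MAX if the lock
 * ever behaves unexpectedly. */
uint32_t replay_serialized(void)
{
    uint32_t mem = 0, seen1, seen2;
    bool vcpu2_got_in;

    if (!emul_try_begin())              /* VCPU1 takes the lock */
        return UINT32_MAX;
    seen1 = mem;                        /* VCPU1: read */
    vcpu2_got_in = emul_try_begin();    /* VCPU2 tries to start: refused */
    mem = seen1 + 1;                    /* VCPU1: write */
    emul_end();

    if (vcpu2_got_in || !emul_try_begin())  /* VCPU2 may start only now */
        return UINT32_MAX;
    seen2 = mem;                        /* VCPU2: read */
    mem = seen2 + 1;                    /* VCPU2: write */
    emul_end();

    return mem;
}
```

Here replay_serialized() returns 2, because VCPU2 cannot begin its read until VCPU1 has finished its write. A real implementation would have VCPU2 wait for the lock rather than just observe the refusal.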

Am I misunderstanding how x86_emulate() works?

