[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH V2] x86/emulate: synchronize LOCKed instruction emulation



>>>>> Sadly, I've now written this (rough) patch:
>>>>>
>>>>> http://pastebin.com/3DJ5WYt0 
>>>>>
>>>>> only to find that it does not solve our issue. With multiple processors
>>>>> per guest and heavy emulation at boot time, the VM got stuck at roughly
>>>>> the same point in its life as before the patch. Looks like more than
>>>>> CMPXCHG needs synchronizing.
>>>>>
>>>>> So it would appear that the smp_lock patch is the only way to bring this
>>>>> under control. Either that, or my patch misses something.
>>>>> Single-processor guests work just fine.
>>>>
>>>> Well, first of all the code to return RETRY is commented out. I
>>>> may guess that you've tried with it not commented out, but I
>>>> can't be sure.
>>>
>>> Indeed I have tried with it on, with the condition for success being at
>>> first "val != old", and then, as it is in the pasted code, "val == new".
>>> Both of these caused BSODs and random crashes. The guest only boots
>>> without any apparent issues (and with 1 VCPU only) after I've stopped
>>> returning RETRY.
>>
>> "val == new" is clearly wrong: Whether to retry depends on whether
>> the cmpxchg was unsuccessful.
> 
> You're right, it looks like using "val == old" as a test for success
> (which is of course the only correct test) and return RETRY on fail
> works (though I'm still seeing a BSOD very seldom, I'll need to test it
> carefully and see what's going on).

I've made a race more probable by resuming the VCPU only after
processing all the events in the ring buffer and now all my Windows 7
32-bit guests BSOD with IRQL_NOT_LESS_OR_EQUAL every time at boot with
the updated patch.

Changing the guests' configuration to only use 1 VCPU again solves the
issue, and also resuming the VCPU after treating each vm_event makes the
BSOD much less likely to appear (hence the illusion that the problem is
fixed).

As for an instruction's crossing a page boundary, I'm not sure how that
would affect things here - we're simply emulating whatever instructions
cause page faults in interesting pages at boot, we're not yet preventing
writes at this point. And if it were such an issue, would it have been
fixed (and it is) by the smp_lock version of the patch?


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.