[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH V2] x86/emulate: synchronize LOCKed instruction emulation



On 03/23/2017 03:23 PM, Jan Beulich wrote:
>>>> On 23.03.17 at 11:21, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>>>>> Sadly, I've now written this (rough) patch:
>>>>>>>
>>>>>>> http://pastebin.com/3DJ5WYt0 
>>>>>>>
>>>>>>> only to find that it does not solve our issue. With multiple processors
>>>>>>> per guest and heavy emulation at boot time, the VM got stuck at roughly
>>>>>>> the same point in its life as before the patch. Looks like more than
>>>>>>> CMPXCHG needs synchronizing.
>>>>>>>
>>>>>>> So it would appear that the smp_lock patch is the only way to bring this
>>>>>>> under control. Either that, or my patch misses something.
>>>>>>> Single-processor guests work just fine.
>>>>>>
>>>>>> Well, first of all the code to return RETRY is commented out. I
>>>>>> may guess that you've tried with it not commented out, but I
>>>>>> can't be sure.
>>>>>
>>>>> Indeed I have tried with it on, with the condition for success being at
>>>>> first "val != old", and then, as it is in the pasted code, "val == new".
>>>>> Both of these caused BSODs and random crashes. The guest only boots
>>>>> without any apparent issues (and with 1 VCPU only) after I've stopped
>>>>> returning RETRY.
>>>>
>>>> "val == new" is clearly wrong: Whether to retry depends on whether
>>>> the cmpxchg was unsuccessful.
>>>
>>> You're right, it looks like using "val == old" as a test for success
>>> (which is of course the only correct test) and return RETRY on fail
>>> works (though I'm still seeing a BSOD very seldom, I'll need to test it
>>> carefully and see what's going on).
>>
>> I've made a race more probable by resuming the VCPU only after
>> processing all the events in the ring buffer and now all my Windows 7
>> 32-bit guests BSOD with IRQL_NOT_LESS_OR_EQUAL every time at boot with
>> the updated patch.
>>
>> Changing the guests' configuration to only use 1 VCPU again solves the
>> issue, and also resuming the VCPU after treating each vm_event makes the
>> BSOD much less likely to appear (hence the illusion that the problem is
>> fixed).
> 
> Well, I can only repeat: We need to understand what it is that
> goes wrong.
> 
>> As for an instruction's crossing a page boundary, I'm not sure how that
>> would affect things here - we're simply emulating whatever instructions
>> cause page faults in interesting pages at boot, we're not yet preventing
>> writes at this point. And if it were such an issue, would it have been
>> fixed (and it is) by the smp_lock version of the patch?
> 
> Yes, it likely would have been: The problem with your code is that
> you map just a single page even when the original access crosses
> a page boundary, yet then access the returned pointer as if you
> had mapped two adjacent pages.

Oh, right, the patch code, of course. For some reason I was thinking
about a rather complicated scenario related to emulating an instruction
touching several protected pages. Yes, you're right, the patch does not
correctly handle the case where we'd write beyond the mapped page.

I'll check if that's what happens.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.