[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH V2] x86/emulate: synchronize LOCKed instruction emulation
>>>>> Sadly, I've now written this (rough) patch: >>>>> >>>>> http://pastebin.com/3DJ5WYt0 >>>>> >>>>> only to find that it does not solve our issue. With multiple processors >>>>> per guest and heavy emulation at boot time, the VM got stuck at roughly >>>>> the same point in its life as before the patch. Looks like more than >>>>> CMPXCHG needs synchronizing. >>>>> >>>>> So it would appear that the smp_lock patch is the only way to bring this >>>>> under control. Either that, or my patch misses something. >>>>> Single-processor guests work just fine. >>>> >>>> Well, first of all the code to return RETRY is commented out. I >>>> may guess that you've tried with it not commented out, but I >>>> can't be sure. >>> >>> Indeed I have tried with it on, with the condition for success being at >>> first "val != old", and then, as it is in the pasted code, "val == new". >>> Both of these caused BSODs and random crashes. The guest only boots >>> without any apparent issues (and with 1 VCPU only) after I've stopped >>> returning RETRY. >> >> "val == new" is clearly wrong: Whether to retry depends on whether >> the cmpxchg was unsuccessful. > > You're right, it looks like using "val == old" as a test for success > (which is of course the only correct test) and return RETRY on fail > works (though I'm still seeing a BSOD very seldom, I'll need to test it > carefully and see what's going on). I've made a race more probable by resuming the VCPU only after processing all the events in the ring buffer and now all my Windows 7 32-bit guests BSOD with IRQL_NOT_LESS_OR_EQUAL every time at boot with the updated patch. Changing the guests' configuration to only use 1 VCPU again solves the issue, and also resuming the VCPU after treating each vm_event makes the BSOD much less likely to appear (hence the illusion that the problem is fixed). As for an instruction's crossing a page boundary, I'm not sure how that would affect things here - we're simply emulating whatever instructions cause page faults in interesting pages at boot, we're not yet preventing writes at this point. And if it were such an issue, would it have been fixed (and it is) by the smp_lock version of the patch? Thanks, Razvan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |