Re: [Xen-devel] [PATCH V2] x86/emulate: synchronize LOCKed instruction emulation
>>> On 23.03.17 at 11:21, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>>>> Sadly, I've now written this (rough) patch:
>>>>>>
>>>>>> http://pastebin.com/3DJ5WYt0
>>>>>>
>>>>>> only to find that it does not solve our issue. With multiple
>>>>>> processors per guest and heavy emulation at boot time, the VM got
>>>>>> stuck at roughly the same point in its life as before the patch.
>>>>>> Looks like more than CMPXCHG needs synchronizing.
>>>>>>
>>>>>> So it would appear that the smp_lock patch is the only way to bring
>>>>>> this under control. Either that, or my patch misses something.
>>>>>> Single-processor guests work just fine.
>>>>>
>>>>> Well, first of all, the code to return RETRY is commented out. I may
>>>>> guess that you've tried with it not commented out, but I can't be
>>>>> sure.
>>>>
>>>> Indeed I have tried with it on, with the condition for success being
>>>> at first "val != old", and then, as it is in the pasted code,
>>>> "val == new". Both of these caused BSODs and random crashes. The
>>>> guest only boots without any apparent issues (and with 1 VCPU only)
>>>> after I've stopped returning RETRY.
>>>
>>> "val == new" is clearly wrong: whether to retry depends on whether the
>>> cmpxchg was unsuccessful.
>>
>> You're right - it looks like using "val == old" as the test for success
>> (which is of course the only correct test) and returning RETRY on
>> failure works (though I'm still seeing a BSOD very seldom; I'll need to
>> test it carefully and see what's going on).
>
> I've made a race more probable by resuming the VCPU only after
> processing all the events in the ring buffer, and now all my Windows 7
> 32-bit guests BSOD with IRQL_NOT_LESS_OR_EQUAL every time at boot with
> the updated patch.
>
> Changing the guests' configuration to use only 1 VCPU again solves the
> issue, and resuming the VCPU after handling each vm_event makes the
> BSOD much less likely to appear (hence the illusion that the problem
> was fixed).

Well, I can only repeat: we need to understand what it is that goes
wrong.

> As for an instruction crossing a page boundary, I'm not sure how that
> would affect things here - we're simply emulating whatever instructions
> cause page faults in interesting pages at boot; we're not yet
> preventing writes at this point. And if it were such an issue, would it
> have been fixed (and it is) by the smp_lock version of the patch?

Yes, it likely would have been: the problem with your code is that you
map just a single page even when the original access crosses a page
boundary, yet then access the returned pointer as if you had mapped two
adjacent pages.

Jan
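For reference, a minimal sketch of the retry decision discussed above,
using GCC's __sync_val_compare_and_swap() as a stand-in for the
emulator's atomic access to guest memory. The helper name and the
numeric return values are illustrative assumptions, not Xen's actual
cmpxchg hook:

    #include <stdint.h>

    /* Xen emulator return codes; the numeric values here are
     * illustrative - use the definitions from x86_emulate.h. */
    #define X86EMUL_OKAY  0
    #define X86EMUL_RETRY 1

    /* Hypothetical helper (not Xen's real hook): emulate a 32-bit
     * LOCK CMPXCHG against the mapped guest memory at 'ptr'. */
    static int emul_cmpxchg32(uint32_t *ptr, uint32_t old, uint32_t new)
    {
        /* Atomically: if (*ptr == old) *ptr = new;  'val' receives
         * the value *ptr held before the operation. */
        uint32_t val = __sync_val_compare_and_swap(ptr, old, new);

        /* The operation succeeded iff the value found in memory
         * matched the expected one, i.e. val == old.  Testing
         * val == new is wrong: it is true on failure whenever
         * another vCPU has already stored 'new', and false on
         * success whenever old != new. */
        return val == old ? X86EMUL_OKAY : X86EMUL_RETRY;
    }

Returning RETRY on failure lets the guest re-execute the instruction
and observe the updated memory value, mirroring what a real LOCKed
CMPXCHG does under contention.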
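Similarly, the page-boundary problem Jan points out comes down to the
standard straddle check (PAGE_SHIFT as on x86; the helper name is
hypothetical):

    #include <stdbool.h>

    #define PAGE_SHIFT 12
    #define PAGE_SIZE  (1UL << PAGE_SHIFT)

    /* Hypothetical helper: does a 'bytes'-wide access at linear
     * address 'addr' straddle a page boundary? */
    static bool access_crosses_page(unsigned long addr, unsigned int bytes)
    {
        return (addr & (PAGE_SIZE - 1)) + bytes > PAGE_SIZE;
    }

When this is true, mapping only the page containing 'addr' and then
accessing the full operand through the returned pointer runs past the
end of the mapping: the two guest pages backing the access are not
generally adjacent in the hypervisor's address space, so both must be
mapped (or the access split in two).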