[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Possible issue with x86_emulate when writing results back to memory
>>> On 09.01.14 at 16:39, Simon Graham <simon.graham@xxxxxxxxxx> wrote: > We've been seeing very infrequent crashes in Windows VMs for a while where it > appears that the top-byte in a longword that is supposed to hold a pointer is > being set to zero - for example: > > BUG CHECK 0000000A (00B49A40 00000002 00000001 8261092E) > > The first parameter to keBugCheck is the faulting address - 00B49A40 in this > case. > > When we look at the dump, we are in a routine releasing a queued spinlock > and the correct address that should have been used was 0xA3B49A40 and indeed > memory contents in the windows dump have this value. Looking around some > more, we see that the failing processor is executing the release queued > spinlock code and another processor is executing the code to acquire the same > queued spinlock and has recently written the 0xA3B49A40 value to the location > where the failing instruction stream read it from. > > If we look at the disassembly for the two code paths, the writing code does: > > mov dword ptr [edx],eax > > and the reading code does the following to read this same value: > > mov ebx,dword ptr [eax] > > On a hunch that this might be a problem with the x86_emulate code, I took a > look at how the mov instruction would be emulated - in both cases where > emulation can be done (hvm/emulate.c and mm/shadow/multi.c), the routines > that write instruction results back to memory use memcpy() to actually copy > the data. Looking at the implementation of memcpy in Xen, I see that, in a > 64-bit build as ours is, it will use 'rep movsq' to move the data in > quadwords and then uses 'rep movsb' to move the last 1-7 bytes -- so the > instructions above will, I think, always use byte instructions for the four > bytes. > > Now, according to the X86 arch, 32-bit mov's are supposed to be atomic but > based on the above they will not be and I am speculating that this is the > cause of our occasional crash - the code path unlocking the spin lock on the > other processor sees a partially written value in memory. > > Does this seem a reasonable explanation of the issue? Yes - as long as you can also explain why a spin lock operation would make it into the emulation code in the first place. > On the assumption that this is correct, I developed the attached patch > (against 4.3.1) which updates all the code paths that are used to read and > writeback the results of instruction emulation to use a simple assignment if > the length is 2 or 4 bytes -- this doesn't fix the general case where you > have > a length > 8 but it does handle emulation of MOV instructions. Unfortunately, > the use of emulation in the HVM code uses a generic routine for copying > memory to the guest so every single place that guest memory is read or > written will pay the penalty of the extra check for length - not sure if that > is terrible or not. Since doing this we have not seen a single instance of > the crash - but it's only been a month! > > The attached patch is for discussion purposes only - if it is deemed > acceptable I'll resubmit a proper patch request against unstable. I'd rather not add limited scope special casing like that, but instead make the copying much more like real hardware (i.e. not just deal with the 16- and 32-bit cases, and especially not rely on memcpy() using 64-bit reads/writes when it can). IOW - don't use memcpy() here at all (and have a single routine doing The Right Thing (tm) rather than having two clones now, and perhaps more later on - I'd in particular think that the read side in shadow code would also need a similar adjustment). On the mechanical side of things: Such a generic routine should have proper parameter types: "const void *" for the source pointer and "unsigned long" or "size_t" for the count. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |