
Re: [Xen-devel] Possible issue with x86_emulate when writing results back to memory



On 09/01/14 15:53, Andrew Cooper wrote:
> On 09/01/14 15:39, Simon Graham wrote:
>> We've been seeing very infrequent crashes in Windows VMs for a while where 
>> it appears that the top byte of a longword that is supposed to hold a 
>> pointer is being set to zero - for example:
>>
>> BUG CHECK 0000000A (00B49A40 00000002 00000001 8261092E)
>>
>> The first parameter to KeBugCheck is the faulting address - 00B49A40 in this 
>> case.
>>
>> When we look at the dump, we are in a routine releasing a queued spinlock 
>> and the correct address that should have been used was 0xA3B49A40 and indeed 
>> memory contents in the windows dump have this value. Looking around some 
>> more, we see that the failing processor is executing the release queued 
>> spinlock code and another processor is executing the code to acquire the 
>> same queued spinlock and has recently written the 0xA3B49A40 value to the 
>> location where the failing instruction stream read it from.
>>
>> If we look at the disassembly for the two code paths, the writing code does:
>>
>>      mov dword ptr [edx],eax
>>
>> and the reading code does the following to read this same value:
>>
>>      mov ebx,dword ptr [eax]
>>
>> On a hunch that this might be a problem with the x86_emulate code, I took a 
>> look at how the mov instruction would be emulated - in both cases where 
>> emulation can be done (hvm/emulate.c and mm/shadow/multi.c), the routines 
>> that write instruction results back to memory use memcpy() to actually copy 
>> the data. Looking at the implementation of memcpy in Xen, I see that, in a 
>> 64-bit build such as ours, it will use 'rep movsq' to move the data in 
>> quadwords and then 'rep movsb' to move the last 1-7 bytes -- so the 
>> instructions above will, I think, always use byte instructions for the four 
>> bytes.
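>>
>> In other words, for the 4-byte store above the copy degenerates into
>> something like this (purely illustrative, not the actual Xen memcpy code):
>>
>>      /*
>>       * Effective behaviour of memcpy(dst, src, 4): 4/8 == 0 quadwords are
>>       * copied by 'rep movsq', then 'rep movsb' copies the remaining 4
>>       * bytes one at a time.  A vCPU on another processor can observe dst
>>       * between any two of these byte stores and read a partially written
>>       * value.
>>       */
>>      dst[0] = src[0];
>>      dst[1] = src[1];
>>      dst[2] = src[2];
>>      dst[3] = src[3];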
>>
>> Now, according to the x86 architecture, naturally aligned 32-bit movs are 
>> supposed to be atomic, but based on the above they will not be, and I am 
>> speculating that this is the 
>> cause of our occasional crash - the code path unlocking the spin lock on the 
>> other processor sees a partially written value in memory.
>>
>> Does this seem a reasonable explanation of the issue? 
>>
>> On the assumption that this is correct, I developed the attached patch 
>> (against 4.3.1) which updates all the code paths that are used to read and 
>> write back the results of instruction emulation to use a simple assignment if 
>> the length is 2 or 4 bytes -- this doesn't fix the general case where you 
>> have a length > 8 but it does handle emulation of MOV instructions. 
>> Unfortunately, the use of emulation in the HVM code uses a generic routine 
>> for copying memory to the guest so every single place that guest memory is 
>> read or written will pay the penalty of the extra check for length - not 
>> sure if that is terrible or not. Since doing this we have not seen a single 
>> instance of the crash - but it's only been a month!
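>>
>> The shape of the change is roughly the following (sketch only -- the helper
>> name here is made up for illustration, the real code is in the attached
>> patch):
>>
>>      static inline void emulate_write(void *dst, const void *src,
>>                                       unsigned int len)
>>      {
>>          switch ( len )
>>          {
>>          case 2:
>>              /* Single naturally-sized store, atomic if dst is aligned. */
>>              *(volatile uint16_t *)dst = *(const uint16_t *)src;
>>              break;
>>          case 4:
>>              *(volatile uint32_t *)dst = *(const uint32_t *)src;
>>              break;
>>          default:
>>              /* All other lengths keep the existing behaviour. */
>>              memcpy(dst, src, len);
>>              break;
>>          }
>>      }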
>>
>> The attached patch is for discussion purposes only - if it is deemed 
>> acceptable I'll resubmit a proper patch request against unstable.
> 
> That seems like a plausible explanation.
> 
> The patch, however, needs some work.  As this function is identical in both
> places, it should have a common implementation somewhere, possibly part of
> x86_emulate.h, and it would probably be better as:
> 
> To better match real hardware, it might be appropriate for
> "memcpy_atomic()" (name subject to improved suggestions) to use a while
> loop and issue 8-byte writes at a time, falling back to 4-, 2- and then
> 1-byte writes when reaching the end of the data to be copied.
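> 
> Very roughly, something like this (name and exact placement still open):
> 
>     static inline void memcpy_atomic(void *p, const void *q, size_t len)
>     {
>         unsigned char *dst = p;
>         const unsigned char *src = q;
> 
>         /*
>          * Each individual store is a single naturally-sized write, so an
>          * aligned 2/4/8-byte destination is updated atomically; the copy
>          * as a whole is not atomic.
>          */
>         while ( len )
>         {
>             if ( len >= 8 )
>             {
>                 *(volatile uint64_t *)dst = *(const uint64_t *)src;
>                 dst += 8; src += 8; len -= 8;
>             }
>             else if ( len >= 4 )
>             {
>                 *(volatile uint32_t *)dst = *(const uint32_t *)src;
>                 dst += 4; src += 4; len -= 4;
>             }
>             else if ( len >= 2 )
>             {
>                 *(volatile uint16_t *)dst = *(const uint16_t *)src;
>                 dst += 2; src += 2; len -= 2;
>             }
>             else
>             {
>                 *dst = *src;
>                 dst += 1; src += 1; len -= 1;
>             }
>         }
>     }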

Definitely not "memcpy_atomic()", as that suggests the whole copy is
atomic, which isn't the case.

David

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

