[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [for-4.7] x86/emulate: synchronize LOCKed instruction emulation



On 04/27/2016 10:14 AM, Razvan Cojocaru wrote:
> On 04/27/2016 09:22 AM, Jan Beulich wrote:
>>>>> On 26.04.16 at 19:23, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>> On 04/26/16 19:03, George Dunlap wrote:
>>>> On 19/04/16 17:35, Jan Beulich wrote:
>>>>>>>> Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> 04/19/16 1:01 PM >>>
>>>>>> I think this might be because the LOCK prefix should guarantee that the
>>>>>> instruction that follows it has exclusive use of shared memory (for both
>>>>>> reads and writes) but I might be misreading the docs:
>>>>>
>>>>> LOCK definitely has no effect on other than the instruction it gets 
>>>>> applied
>>>>> to.
>>>>
>>>> Sorry I wasn't involved in this discussion -- what was the conclusion here?
>>>>
>>>> FWIW Andy's suggestion of using a stub seemed like the most robust
>>>> solution, if that could be made to work.
>>>>
>>>> If you're going to submit a patch substantially similar to this one, let
>>>> me know so I can review the mm bits of the original patch.
>>>
>>> I'm not really sure.
>>>
>>> Regarding this version of the patch, Jan has asked for more information
>>> on the performance impact, but I'm not sure how to obtain it in a
>>> rigorous manner.
>>
>> That was only one half, which Andrew has now answered. The
>> other was the not understood issue with a later variant you had
>> tried.
> 
> Right, there's the fact that this version (with read / write locking the
> whole function) works whereas just synchonizing hvmemul_cmpxchg() with a
> spin lock does not. It is not yet fully clear why (the part of the
> conversation at the very top of this email was an early attempt to
> elucidate it).

I've kept experimenting with the patch but can't quite figure out why
minimizing the lock scope to the writeback part would not be sufficient,
but it isn't.

I.e. with this code:

3824  writeback:
3825     ops->smp_lock(lock_prefix);
3826     switch ( dst.type )
3827     {
3828     case OP_REG:
3829         /* The 4-byte case *is* correct: in 64-bit mode we
zero-extend. */
3830         switch ( dst.bytes )
3831         {
3832         case 1: *(uint8_t  *)dst.reg = (uint8_t)dst.val; break;
3833         case 2: *(uint16_t *)dst.reg = (uint16_t)dst.val; break;
3834         case 4: *dst.reg = (uint32_t)dst.val; break; /* 64b:
zero-ext */
3835         case 8: *dst.reg = dst.val; break;
3836         }
3837         break;
3838     case OP_MEM:
3839         if ( !(d & Mov) && (dst.orig_val == dst.val) &&
3840              !ctxt->force_writeback )
3841             /* nothing to do */;
3842         else if ( lock_prefix )
3843             rc = ops->cmpxchg(
3844                 dst.mem.seg, dst.mem.off, &dst.orig_val,
3845                 &dst.val, dst.bytes, ctxt);
3846         else
3847             rc = ops->write(
3848                 dst.mem.seg, dst.mem.off, &dst.val, dst.bytes, ctxt);
3849         if ( rc != 0 )
3850         {
3851             ops->smp_unlock(lock_prefix);
3852             goto done;
3853         }
3854     default:
3855         break;
3856     }
3857     ops->smp_unlock(lock_prefix);

I can still reproduce the guest hang. But if I lock at the very
beginning of x86_emulate() and unlock before each return, no more hangs.

I would appreciate any suggestions on how to go about trying to narrow
the scope of the lock, or figure out what subtlety is at work here.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.