[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [for-4.7] x86/emulate: synchronize LOCKed instruction emulation
>>> Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> 04/13/16 7:53 PM >>> >LOCK-prefixed instructions are currenly allowed to run in parallel >in x86_emulate(), which can lead the guest into an undefined state. >This patch fixes the issue. ... by ... (read: Too brief a description) >--- a/xen/arch/x86/hvm/emulate.c >+++ b/xen/arch/x86/hvm/emulate.c >@@ -25,6 +25,8 @@ >#include <asm/hvm/svm/svm.h> >#include <asm/vm_event.h> > >+DEFINE_PERCPU_RWLOCK_GLOBAL(emulate_locked_rwlock); You should try hard to make this static. >--- a/xen/arch/x86/mm.c >+++ b/xen/arch/x86/mm.c >@@ -112,6 +112,7 @@ >#include <asm/ldt.h> >#include <asm/x86_emulate.h> >#include <asm/e820.h> >+#include <asm/hvm/emulate.h> This header shouldn't be included here. You need to move the declarations elsewhere for them to be usable here. >@@ -1589,6 +1591,8 @@ x86_emulate( >} >done_prefixes: > >+ ops->smp_lock(lock_prefix); >+ >if ( rex_prefix & REX_W ) >op_bytes = 8; Already from the context it is obvious that the lock can be taken later. >@@ -2380,7 +2390,12 @@ x86_emulate( >} >/* Write back the memory destination with implicit LOCK prefix. */ >dst.val = src.val; >- lock_prefix = 1; >+ if ( !lock_prefix ) >+ { >+ ops->smp_unlock(lock_prefix); >+ lock_prefix = 1; >+ ops->smp_lock(lock_prefix); >+ } This can't be correct: You need to take the write lock _before_ reading the memory location. >@@ -3859,6 +3874,9 @@ x86_emulate( >done: >_put_fpu(); >put_stub(stub); >+ >+ ops->smp_unlock(lock_prefix); And just like above from the context alone it is clear that the unlock can happen earlier. Locked regions should always as small as possible. >--- a/xen/common/domain.c >+++ b/xen/common/domain.c >@@ -272,6 +272,8 @@ struct domain *domain_create(domid_t domid, unsigned int >domcr_flags, > >TRACE_1D(TRC_DOM0_DOM_ADD, d->domain_id); > >+ percpu_rwlock_resource_init(&d->arch.emulate_lock, emulate_locked_rwlock); I cannot see how this would build on ARM. Overall I can see why you want this, but what is the performance impact? After all you're basically making the emulator act like very old systems using a bus lock (i.e. a global one) instead of utilizing cacheline exclusive ownership. Plus you still don't (and cannot, afaict) deal with one LOCKed instruction getting emulated and another in parallel getting executed by the CPU without emulation (granted this shouldn't normally happen, but we also can't exclude that case). With the above I'm also not convinced this is an issue that absolutely needs to be addressed in 4.7 - it's not a regression, and I suppose not a problem for guests that aren't under introspection. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |