
[Xen-devel] [PATCH][1/3][RESEND] evtchn race condition


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: "Woller, Thomas" <thomas.woller@xxxxxxx>
  • Date: Wed, 25 Jan 2006 09:28:20 -0600
  • Delivery-date: Wed, 25 Jan 2006 15:37:19 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcYhwapwtgM1u8cyQ1OtXZAbzroS3gAAdIQA
  • Thread-topic: [PATCH][1/3][RESEND] evtchn race condition

Previous post had some corruption in the patch, reposting...

This patch fixes a guest hang condition seen when running unmodified
guests on an SVM partition.  Two test patches follow to facilitate
reproduction of the problem.  Note that the patch below applies to the
xen-unstable-hvm.hg repo at changeset 8667, although the problem has
also been apparent on the xen-unstable.hg tree previously.  I do not
know whether the problem occurs on VT partitions, but I see no reason
why it would not also affect xen-unstable.hg/testing-3.0.

Testing has shown a very infrequent guest hang/block when running
unmodified Linux guests on multicore SVM systems.  We have found that
the problem is related to the local/remote domain event communication
channel (event_channel.c and hvm/io.c).

The problem occurs only rarely, and only during multicore testing.
The last instruction executed by the blocked guest is always an I/O
operation.

The conditions under which this problem occurs:
1) an unmodified SVM guest performing I/O that the hypervisor does not
trap and handle itself, so that qemu receives a request
2) the guest running on a core > 0 (the hypervisor is SMP)
3) Dom0 is not SMP (i.e. CONFIG_SMP is NOT defined)

Problem:
The common/event_channel.c evtchn_send() function calls
evtchn_set_pending() in the ECS_INTERDOMAIN case without protecting
access to the evtchn_upcall_pending/evtchn_pending_sel bits.  The
evtchn_set_pending() code executes on core 0 concurrently with code on
core 1 that tests and/or sets these same bits.  The result is that the
guest ends up in a perpetually blocked state.  The matching code in
hvm/io.c contains two locations that must be protected by the evtchn
spinlock (see the patch below).
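
To make the losing interleaving concrete, here is a minimal standalone
sketch (not Xen code): single bools and a simplified helper stand in
for the real bitmaps and bit operations, and all names are
illustrative.  It simply runs the two code paths' steps in the
problematic order:

/*
 * Sketch only: one bit of state per flag, no SMP; the steps of the
 * two CPUs are executed in the order that loses the notification.
 */
#include <stdio.h>
#include <stdbool.h>

static bool pending;         /* one bit of shared_info->evtchn_pending[] */
static bool pending_sel;     /* one bit of vcpu_info->evtchn_pending_sel */
static bool upcall_pending;  /* vcpu_info->evtchn_upcall_pending         */

static bool test_and_set(bool *b) { bool old = *b; *b = true; return old; }

/* core 0: simplified chain of bit ops done by evtchn_set_pending(). */
static void send_event(void)
{
    if ( !test_and_set(&pending) &&
         !test_and_set(&pending_sel) &&
         !test_and_set(&upcall_pending) )
        printf("notify: unblock the vcpu\n");
}

int main(void)
{
    /* core >0 (hvm/io.c): samples the pending word, sees nothing there. */
    bool stale_word_empty = !pending;

    /* core 0: qemu completes the I/O; all three bits are set and the
     * vcpu is notified. */
    send_event();

    /* core >0: acts on its stale sample and clobbers the notification.
     * In the real code these are the unprotected clear_bit() calls. */
    if ( stale_word_empty )
        pending_sel = false;
    if ( !pending_sel )
        upcall_pending = false;

    /* The guest now blocks, since upcall_pending is clear.  Any later
     * send short-circuits on the already-set pending bit, so
     * upcall_pending is never raised again and the vcpu never wakes. */
    send_event();
    printf("pending=%d pending_sel=%d upcall_pending=%d\n",
           pending, pending_sel, upcall_pending);   /* prints 1 0 0 */
    return 0;
}

With the locking in the patch below, the sample-and-clear sequence and
the set-and-notify chain can no longer interleave this way.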
 
Reproduction of problem:
Attached are a couple of test patches.  One, for hvmloader, performs
continuous I/O (out to 0x3fb) to help reproduce the problem
(hvmloader_io_test.patch).  Putting udelay(10000) in hvm/io.c between
the accesses to evtchn_upcall_pending/evtchn_pending_sel greatly
facilitates reproduction (attached patch hvm_io_udelay_test.patch).
With the delay patch and the hvmloader I/O test patch, the problem
occurs within the first few I/O test instructions in hvmloader.
Pinning the guest to a core other than 0 (e.g. cpus='1' in the guest
config) is necessary to reproduce the hang; pinning the guest onto
core 0 makes the hang disappear, confirming that the guest must be
running on a core > 0 to hang.
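
The hvmloader test patch is attached rather than inlined.  Purely as a
hypothetical stand-in, a loop of the following shape exercises the same
kind of continuous port I/O; here it is written as a userspace program
to be run as root inside the guest, using port 0x3fb so that the
hypervisor forwards each access to qemu's device model:

/* Hypothetical stand-in for hvmloader_io_test.patch: hammer port 0x3fb
 * with OUTs so that every iteration traps to the hypervisor and
 * round-trips through qemu.  The real test patch does this from
 * hvmloader instead of userspace. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/io.h>

int main(void)
{
    if (ioperm(0x3fb, 1, 1) != 0) {       /* grant access to the port */
        perror("ioperm");
        return EXIT_FAILURE;
    }
    for (unsigned long i = 0; ; i++) {
        outb(i & 0xff, 0x3fb);            /* forwarded to the device model */
        if ((i & 0xfffff) == 0)
            printf("%lu I/O ops issued\n", i);
    }
}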

Patch:
The patch is an attempt to combine the local domain spinlock and the
remote domain spinlock, similar to the evtchn_bind_interdomain() logic.
The patch removes the initial ld spin_lock() that protected the
port_is_valid() and evtchn_from_port() calls, which allows using the
"ld < rd" spinlock ordering technique.  I am not sure whether this is
really OK, or even necessary; opinions are welcome here.  Another
solution might be to take the remote domain (rd) spinlock exclusively
around the ECS_INTERDOMAIN case and the ld spinlock exclusively around
the ECS_IPI case, without any nested spinlocks.
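
For reviewers looking at the ordering logic on its own, the rule is the
usual "take both locks in a fixed order" one.  A generic pthread sketch
of the same pattern, for comparison with the hunk below (not Xen code;
structure and function names are illustrative only):

#include <pthread.h>

struct dom {
    pthread_mutex_t evtchn_lock;
    /* ... */
};

/* Acquire both domains' locks; compare the addresses, as the patch
 * does, so every path that needs both locks nests them the same way
 * and two CPUs sending in opposite directions cannot deadlock. */
void lock_both(struct dom *ld, struct dom *rd)
{
    if (ld < rd) {
        pthread_mutex_lock(&ld->evtchn_lock);
        pthread_mutex_lock(&rd->evtchn_lock);
    } else {
        if (ld != rd)                 /* local-only cases: rd may equal ld */
            pthread_mutex_lock(&rd->evtchn_lock);
        pthread_mutex_lock(&ld->evtchn_lock);
    }
}

/* Release order does not matter for deadlock avoidance. */
void unlock_both(struct dom *ld, struct dom *rd)
{
    pthread_mutex_unlock(&ld->evtchn_lock);
    if (ld != rd)
        pthread_mutex_unlock(&rd->evtchn_lock);
}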

I did notice that the send_guest_virq() and VLAPIC code also call
evtchn_set_pending() without spinlock protection.  I am not sure
whether that is a problem, but I wanted to bring it to the list's
attention for someone more familiar with the virq/vlapic and evtchn
code; it seems there would be the potential for losing delivery of a
virtual IRQ.

Also note that my testing to date has not covered the ECS_IPI code
path with the patch below.

regards,
Tom Woller
AMD


# HG changeset patch
# User twoller@xxxxxxxxxxxxxxxx
# Node ID d8bb56042ef14283d0f3342a114162329985c983
# Parent  41d3adec102cfc24687c171e1c893740e3d596f6
add evtchn remote domain spinlock protection in evtchn_send().

Signed-off-by: Tom Woller <thomas.woller@xxxxxxx>

diff -r 41d3adec102c -r d8bb56042ef1 xen/common/event_channel.c
--- a/xen/common/event_channel.c        Tue Jan 24 20:26:57 2006
+++ b/xen/common/event_channel.c        Tue Jan 24 20:26:57 2006
@@ -405,19 +405,34 @@
     struct domain *ld = current->domain, *rd;
     int            rport, ret = 0;
 
-    spin_lock(&ld->evtchn_lock);
-
     if ( unlikely(!port_is_valid(ld, lport)) )
     {
-        spin_unlock(&ld->evtchn_lock);
         return -EINVAL;
     }
 
     lchn = evtchn_from_port(ld, lport);
+    
+    if ( lchn->state != ECS_INTERDOMAIN )
+        rd = ld;
+    else 
+        rd = lchn->u.interdomain.remote_dom;
+
+    /* Avoid deadlock by first acquiring lock of domain with smaller id. */
+    if ( ld < rd )
+    {
+        spin_lock(&ld->evtchn_lock);
+        spin_lock(&rd->evtchn_lock);
+    }
+    else
+    {
+        if ( ld != rd )
+            spin_lock(&rd->evtchn_lock);
+        spin_lock(&ld->evtchn_lock);
+    }
+    
     switch ( lchn->state )
     {
     case ECS_INTERDOMAIN:
-        rd    = lchn->u.interdomain.remote_dom;
         rport = lchn->u.interdomain.remote_port;
         rchn  = evtchn_from_port(rd, rport);
         evtchn_set_pending(rd->vcpu[rchn->notify_vcpu_id], rport);
@@ -433,6 +448,8 @@
     }
 
     spin_unlock(&ld->evtchn_lock);
+    if ( ld != rd )
+        spin_unlock(&rd->evtchn_lock);
 
     return ret;
 }
diff -r 44b96aafa499 -r 41d3adec102c xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c     Tue Jan 24 18:36:58 2006
+++ b/xen/arch/x86/hvm/io.c     Tue Jan 24 20:26:57 2006
@@ -693,6 +693,7 @@
     struct domain *d = v->domain;
     int port = iopacket_port(d);
 
+    spin_lock(&d->evtchn_lock);
     /* evtchn_pending_sel bit is shared by other event channels. */
     if (!d->shared_info->evtchn_pending[port/BITS_PER_LONG])
         clear_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
@@ -700,6 +701,7 @@
     /* Note: HVM domains may need upcalls as well. */
     if (!v->vcpu_info->evtchn_pending_sel)
         clear_bit(0, &v->vcpu_info->evtchn_upcall_pending);
+    spin_unlock(&d->evtchn_lock);
 
     /* Clear the pending bit for port. */
     return test_and_clear_bit(port, &d->shared_info->evtchn_pending[0]);
@@ -729,6 +731,7 @@
 void hvm_wait_io(void)
 {
     extern void do_block();
+    struct domain *d = current->domain;
     int port = iopacket_port(current->domain);
 
     do {
@@ -742,8 +745,10 @@
         /*
          * Events other than IOPACKET_PORT might have woken us up.
         * In that case, safely go back to sleep.
         */
+        spin_lock(&d->evtchn_lock);
         clear_bit(port/BITS_PER_LONG, &current->vcpu_info->evtchn_pending_sel);
         clear_bit(0, &current->vcpu_info->evtchn_upcall_pending);
+        spin_unlock(&d->evtchn_lock);
     } while(1);
 }
 

