
Re: [PATCH v7 3/3] xen/events: rework fifo queue locking



On 24.11.20 15:02, Jan Beulich wrote:
On 24.11.2020 08:01, Juergen Gross wrote:
Two cpus entering evtchn_fifo_set_pending() for the same event channel
can race in case the first one gets interrupted after setting
EVTCHN_FIFO_PENDING and the other one manages to set
EVTCHN_FIFO_LINKED before the first one tests that bit. This can
lead to evtchn_check_pollers() being called before the event is
properly put into the queue, eventually resulting in the guest not
seeing the event pending and thus blocking forever afterwards.
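
[For illustration, a minimal standalone sketch of that interleaving,
using C11 atomics plus pthreads; the flag names mirror the Xen bits,
everything else is hypothetical and heavily simplified - this is not
Xen code:]

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int pending, linked, queued;

    static void *cpu_a(void *arg)       /* send path */
    {
        (void)arg;
        atomic_store(&pending, 1);      /* EVTCHN_FIFO_PENDING */
        /* "interrupt window": cpu_b may run to completion here */
        if ( !atomic_exchange(&linked, 1) )
            atomic_store(&queued, 1);   /* actually link into the queue */
        /* evtchn_check_pollers() analogue - can observe queued == 0
         * when cpu_b won the exchange but has not queued yet */
        if ( !atomic_load(&queued) )
            puts("poller woken before event was queued");
        return NULL;
    }

    static void *cpu_b(void *arg)       /* racing unmask/send path */
    {
        (void)arg;
        if ( atomic_load(&pending) && !atomic_exchange(&linked, 1) )
            atomic_store(&queued, 1);   /* the actual queueing happens here */
        return NULL;
    }

    int main(void)                      /* single shot; actually hitting
                                         * the window needs many runs */
    {
        pthread_t a, b;

        pthread_create(&a, NULL, cpu_a, NULL);
        pthread_create(&b, NULL, cpu_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }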

Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel
lock") merely made the race more obvious; the fifo event channel
implementation has had this race from the beginning, whenever an unmask
operation was running in parallel with an event channel send operation.

Ah yes, but then also only for inter-domain channels, as it was
only in that case that the "wrong" domain's event lock was held.
IOW there was a much earlier change already where this issue
got widened (when the per-channel locking got introduced). This
then got reduced to the original scope by XSA-343's adding of
locking to evtchn_unmask(). (Not sure how much of this history
wants actually adding here. I'm writing it down not the least to
make sure I have a complete enough picture.)

I think we both agree that this race was possible for quite some time.
And I even think one customer bug I've been looking into recently
might be exactly this problem (a dom0 was occasionally hanging in
cross-cpu function calls, but switching to 2-level events made the
problem disappear).


To avoid this race, the queue locking in evtchn_fifo_set_pending()
needs to be reworked to also cover the tests of EVTCHN_FIFO_PENDING,
EVTCHN_FIFO_MASKED and EVTCHN_FIFO_LINKED.

Perhaps mention that the prior possible (and imo more natural)
alternative of taking consistent per-channel locks would have
been an option, until they got converted to rw ones?

Okay (with reasoning why this is no simple option, because the lock
would need to be taken both with interrupts on and off).
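
[A userspace analogue of that constraint, for readers following along
(pthreads/signals, all names hypothetical): when a lock is taken from
contexts running with interrupts both enabled and disabled, every
acquisition has to use the irqsave form, i.e. mask the "interrupt"
for as long as the lock is held on this CPU:]

    #include <pthread.h>
    #include <signal.h>

    static pthread_mutex_t chn_lock = PTHREAD_MUTEX_INITIALIZER;

    static void lock_irqsave(sigset_t *flags)
    {
        sigset_t all;

        sigfillset(&all);
        /* local_irq_save() analogue: block signals first, so the
         * "interrupt handler" can't try to take chn_lock on this
         * thread while it is already held (self-deadlock) */
        pthread_sigmask(SIG_BLOCK, &all, flags);
        pthread_mutex_lock(&chn_lock);
    }

    static void unlock_irqrestore(const sigset_t *flags)
    {
        pthread_mutex_unlock(&chn_lock);
        pthread_sigmask(SIG_SETMASK, flags, NULL); /* local_irq_restore() */
    }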


Additionally, when an
event channel needs to change queues, both queues need to be locked
initially.

Since this was (afaict) intentionally not the case before, I
think I would want to see a word spent on the "why", perhaps
better in a code comment than here. Even more so since you
delete the respective comment justifying the possible race as
permissible. And I have to admit right now I'm still uncertain
both ways, i.e. I neither have a clear understanding of why it
would have been considered fine the other way around before,
nor why the double locking is strictly needed.

I need the double locking to prevent someone from entering the locked
region between dropping the lock for the old queue and taking the one
for the new queue, as that would open the same race window again.
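
[A sketch of why both locks are held across the move - plain pthreads,
hypothetical queue type; the lock-ordering question is answered by the
q <= old_q test in the hunk further down:]

    #include <pthread.h>

    struct queue { pthread_mutex_t lock; /* ... */ };

    static void change_queue(struct queue *old_q, struct queue *new_q)
    {
        /* Racy alternative: unlock old_q, then lock new_q - between
         * the two operations no queue lock covers the channel, so a
         * parallel evtchn_fifo_set_pending() can slip in: the very
         * window this patch closes. Instead, hold both: */
        pthread_mutex_lock(&old_q->lock);
        pthread_mutex_lock(&new_q->lock);   /* needs a global lock order
                                             * to avoid ABBA deadlock -
                                             * see the q <= old_q test
                                             * in the hunk below */
        /* ... unlink from old_q, link into new_q, update lastq ... */
        pthread_mutex_unlock(&new_q->lock);
        pthread_mutex_unlock(&old_q->lock);
    }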


Fixes: 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock")
Fixes: 88910061ec615b2d ("evtchn: add FIFO-based event channel hypercalls and port ops")
Signed-off-by: Juergen Gross <jgross@xxxxxxxx>

I guess at least this one wants a Reported-by.

Oh, right. Sorry for missing that.


@@ -204,6 +175,48 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
          return;
      }
+    for ( try = 0; ; try++ )
+    {
+        union evtchn_fifo_lastq lastq;
+        struct vcpu *old_v;

I think this one can have const added.

Yes.


+        lastq.raw = read_atomic(&evtchn->fifo_lastq);
+        old_v = d->vcpu[lastq.last_vcpu_id];
+
+        q = &v->evtchn_fifo->queue[evtchn->priority];
+        old_q = &old_v->evtchn_fifo->queue[lastq.last_priority];
+
+        if ( q <= old_q )
+        {
+            spin_lock_irqsave(&q->lock, flags);
+            if ( q != old_q )
+                spin_lock(&old_q->lock);
+        }
+        else
+        {
+            spin_lock_irqsave(&old_q->lock, flags);
+            spin_lock(&q->lock);
+        }

Since the vast majority of cases is going to have q == old_q, would
it be worth structuring this like

         if ( q == old_q )
             spin_lock_irqsave(&q->lock, flags);
         else if ( q < old_q )
         {
             spin_lock_irqsave(&q->lock, flags);
             spin_lock(&old_q->lock);
         }
         else
         {
             spin_lock_irqsave(&old_q->lock, flags);
             spin_lock(&q->lock);
         }

saving (on average) half a conditional in this most common
case? (This is specifically different from the double locking in
event_channel.c, where the common case is to have different
entities. In fact double_evtchn_{,un}lock() look to pointlessly
check for chn1 == chn2 - I guess I'll make a patch.)

Fine with me.
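
[The suggested structure as a standalone pthread sketch, names
hypothetical: taking the locks in ascending address order means two
CPUs locking the same pair can never deadlock, and the q == old_q
fast path pays for just one comparison:]

    #include <pthread.h>

    struct queue { pthread_mutex_t lock; };

    static void lock_both(struct queue *q, struct queue *old_q)
    {
        if ( q == old_q )                    /* common case: one lock */
            pthread_mutex_lock(&q->lock);
        else if ( q < old_q )
        {
            pthread_mutex_lock(&q->lock);
            pthread_mutex_lock(&old_q->lock);
        }
        else
        {
            pthread_mutex_lock(&old_q->lock);
            pthread_mutex_lock(&q->lock);
        }
    }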

+        lastq.raw = read_atomic(&evtchn->fifo_lastq);
+        old_v = d->vcpu[lastq.last_vcpu_id];
+        if ( q == &v->evtchn_fifo->queue[evtchn->priority] &&
+             old_q == &old_v->evtchn_fifo->queue[lastq.last_priority] )
+            break;
+
+        if ( q != old_q )
+            spin_unlock(&old_q->lock);
+        spin_unlock_irqrestore(&q->lock, flags);
+
+        if ( try == 3 )
+        {
+            gprintk(XENLOG_WARNING,
+                    "dom%d port %d lost event (too many queue changes)\n",
+                    d->domain_id, evtchn->port);
+            return;

Originally evtchn_check_pollers() would still have been called
in this case. Wouldn't you better retain this, or else justify
the possibly observable change in behavior?

I could retain it, but without the event having been set pending
I don't see the value in doing so.
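
[The loop in the hunk above follows a generic "read, lock,
re-validate, retry" shape; a standalone sketch using C11 atomics plus
pthreads, types hypothetical - the real code additionally revalidates
the priority and takes two locks:]

    #include <pthread.h>
    #include <stdatomic.h>

    struct queue { pthread_mutex_t lock; };

    extern struct queue queues[];
    extern _Atomic unsigned int cur_queue; /* queue the event is on */

    static struct queue *lock_current_queue(void)
    {
        for ( unsigned int try = 0; ; try++ )
        {
            unsigned int idx = atomic_load(&cur_queue);
            struct queue *q = &queues[idx];

            pthread_mutex_lock(&q->lock);
            if ( idx == atomic_load(&cur_queue) )
                return q;                   /* guess still valid: q locked */
            pthread_mutex_unlock(&q->lock); /* queue changed: retry */

            if ( try == 3 )
                return NULL;                /* give up, as the patch does */
        }
    }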


@@ -228,22 +239,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
              goto done;
          }

This if() right above here can, aiui, in principle be moved out
of the surrounding if(), at which point ...

It can even be moved out of the locked region.


-        /*
-         * No locking around getting the queue. This may race with
-         * changing the priority but we are allowed to signal the
-         * event once on the old priority.
-         */
-        q = &v->evtchn_fifo->queue[evtchn->priority];
-
-        old_q = lock_old_queue(d, evtchn, &flags);
-        if ( !old_q )
-            goto done;

... with all of this gone ...

          if ( guest_test_and_set_bit(d, EVTCHN_FIFO_LINKED, word) )
-        {
-            spin_unlock_irqrestore(&old_q->lock, flags);
              goto done;
-        }

... this could become part of the outer if(), replacing the 2nd
guest_test_bit() there. (Possibly, if deemed worthwhile at all,
to be carried out in a separate follow-on patch, to keep focus
here on the actual goal.)

Will add a patch doing that.


Juergen
