
Re: [Xen-devel] [PATCH v3 2/2] xen: merge temporary vcpu pinning scenarios



On 26.07.19 12:13, Andrew Cooper wrote:
On 26/07/2019 10:53, Juergen Gross wrote:
On 26.07.19 11:46, Andrew Cooper wrote:
On 24/07/2019 12:26, Juergen Gross wrote:
diff --git a/xen/common/wait.c b/xen/common/wait.c
index 4f830a14e8..3fc5f68611 100644
--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -34,8 +34,6 @@ struct waitqueue_vcpu {
      */
     void *esp;
     char *stack;
-    cpumask_t saved_affinity;
-    unsigned int wakeup_cpu;
 #endif
 };
 
@@ -131,12 +129,10 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
     ASSERT(wqv->esp == 0);
 
     /* Save current VCPU affinity; force wakeup on *this* CPU only. */
-    wqv->wakeup_cpu = smp_processor_id();
-    cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
-    if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
+    if ( vcpu_temporary_affinity(curr, smp_processor_id(), VCPU_AFFINITY_WAIT) )
     {
         gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
-        domain_crash(current->domain);
+        domain_crash(curr->domain);
 
         for ( ; ; )
             do_softirq();
@@ -170,7 +166,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
     if ( unlikely(wqv->esp == 0) )
     {
         gdprintk(XENLOG_ERR, "Stack too large in %s\n", __func__);
-        domain_crash(current->domain);
+        domain_crash(curr->domain);
 
         for ( ; ; )
             do_softirq();
@@ -182,30 +178,24 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
 static void __finish_wait(struct waitqueue_vcpu *wqv)
 {
     wqv->esp = NULL;
-    (void)vcpu_set_hard_affinity(current, &wqv->saved_affinity);
+    vcpu_temporary_affinity(current, NR_CPUS, VCPU_AFFINITY_WAIT);
 }
 
 void check_wakeup_from_wait(void)
 {
-    struct waitqueue_vcpu *wqv = current->waitqueue_vcpu;
+    struct vcpu *curr = current;
+    struct waitqueue_vcpu *wqv = curr->waitqueue_vcpu;
 
     ASSERT(list_empty(&wqv->list));
 
     if ( likely(wqv->esp == NULL) )
         return;
 
-    /* Check if we woke up on the wrong CPU. */
-    if ( unlikely(smp_processor_id() != wqv->wakeup_cpu) )
+    /* Check if we are still pinned. */
+    if ( unlikely(!(curr->affinity_broken & VCPU_AFFINITY_WAIT)) )
     {
-        /* Re-set VCPU affinity and re-enter the scheduler. */
-        struct vcpu *curr = current;
-        cpumask_copy(&wqv->saved_affinity, curr->cpu_hard_affinity);
-        if ( vcpu_set_hard_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
-        {
-            gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
-            domain_crash(current->domain);
-        }
-        wait(); /* takes us back into the scheduler */
+        gdprintk(XENLOG_ERR, "vcpu affinity lost\n");
+        domain_crash(curr->domain);
     }
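
For reference, the contract of the new helper, as far as it can be inferred
from the hunks above (assumed semantics only, not quoted from the patch that
introduces vcpu_temporary_affinity()): pin the vCPU to one CPU for a given
reason bit, record that reason in affinity_broken, and drop the pinning
again by passing NR_CPUS.  A rough usage sketch under those assumptions,
with made-up wrapper names:

    /* Pin v to the CPU we are currently running on; non-zero means failure. */
    static int pin_for_wait(struct vcpu *v)
    {
        return vcpu_temporary_affinity(v, smp_processor_id(), VCPU_AFFINITY_WAIT);
    }

    /* Each temporary-pinning user appears to own one reason bit in affinity_broken. */
    static bool pinned_for_wait(const struct vcpu *v)
    {
        return v->affinity_broken & VCPU_AFFINITY_WAIT;
    }

    /* Passing NR_CPUS appears to release the pinning for this reason again. */
    static void unpin_for_wait(struct vcpu *v)
    {
        vcpu_temporary_affinity(v, NR_CPUS, VCPU_AFFINITY_WAIT);
    }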

I'm sorry to retract my R-by after the fact, but I've only just noticed
(while rebasing some of my pending work over this) that it is buggy.

The reason wait() was called is that it is not safe to leave that
if() clause.

With this change in place, we'll arrange for the VM to be crashed, then
longjmp back into the stack frame of the waiting vCPU, on the wrong
CPU.  Any caller with smp_processor_id() or thread-local variables cached
by pointer on the stack will then cause memory corruption.
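
To illustrate that hazard, a hypothetical caller (not real Xen code; the
names example_state and example_wq are made up, and example_wq is assumed
to have been set up with init_waitqueue_head() elsewhere):

    static DEFINE_PER_CPU(unsigned long, example_state);
    static struct waitqueue_head example_wq;

    static void example_waiter(bool *done)
    {
        /* Values derived from the current CPU, cached on the stack... */
        unsigned int cpu = smp_processor_id();
        unsigned long *state = &per_cpu(example_state, cpu);

        /* ...stay valid across the wait only if we resume on the same CPU. */
        wait_event(example_wq, *done);

        /*
         * If the waitqueue code longjmps back here on a different CPU after
         * merely crashing the domain, 'cpu' is stale and 'state' points at
         * another CPU's data: silent corruption instead of a clean stop.
         */
        (*state)++;
    }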

It's not immediately obvious how to fix this, but bear in mind that as
soon as the vm-event interface is done, I plan to delete this whole
waitqueue infrastructure anyway.

Shouldn't just calling wait() after domain_crash() be fine then?

That's what would have happened in the original error case, too.

No - I don't think so.  That was to try and get back into a position
where the scheduler rescheduled this vcpu on the correct cpu, so it
could safely longjmp back into context.

But there was a domain_crash() in the code I removed.

If this was already a problem, then I guess the domain_crash()
might need to be replaced by panic(). The only case I'm aware of where
this situation could arise would be a suspend/resume cycle where
wait_event() was active and not all CPUs came up again on resume.
That seems quite improbable.
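
Concretely, that suggestion could look roughly like this in
check_wakeup_from_wait() (a sketch of the idea being discussed, not a
committed fix):

    /* Check if we are still pinned. */
    if ( unlikely(!(curr->affinity_broken & VCPU_AFFINITY_WAIT)) )
    {
        /*
         * The wakeup CPU did not come back (e.g. after suspend/resume), so
         * there is no CPU on which the saved stack can be resumed safely.
         * Give up on the host rather than crash the domain and then
         * continue on the wrong CPU.
         */
        panic("vcpu affinity lost while in waitqueue\n");
    }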


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
