[Xen-changelog] [xen master] sched: fix (ACPI S3) resume with cpupools with different schedulers



commit dc018634b0814399880ccfe827711583d19108ca
Author:     Dario Faggioli <dario.faggioli@xxxxxxxxxx>
AuthorDate: Thu Dec 10 17:24:51 2015 +0100
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Thu Dec 10 17:24:51 2015 +0100

    sched: fix (ACPI S3) resume with cpupools with different schedulers
    
    In fact, with 2 cpupools, one (the default) Credit and
    one Credit2 (with at least 1 pCPU in the latter), trying
    a (e.g., ACPI S3) suspend/resume crashes like this:
    
    (XEN) [  150.587779] ----[ Xen-4.7-unstable  x86_64  debug=y  Not tainted ]----
    (XEN) [  150.587783] CPU:    6
    (XEN) [  150.587786] RIP:    e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
    (XEN) [  150.587796] RFLAGS: 0000000000010086   CONTEXT: hypervisor
    (XEN) [  150.587801] rax: ffff83031fa3c020   rbx: ffff830322c1b4b0   rcx: 0000000000000000
    (XEN) [  150.587806] rdx: ffff83031fa78000   rsi: 000000000000000a   rdi: ffff82d0802a9788
    (XEN) [  150.587811] rbp: ffff83031fa7fe20   rsp: ffff83031fa7fd30   r8:  ffff83031fa80000
    (XEN) [  150.587815] r9:  0000000000000006   r10: 000000000008f7f2   r11: 0000000000000006
    (XEN) [  150.587819] r12: ffff8300dbdf3000   r13: ffff830322c1b4b0   r14: 0000000000000006
    (XEN) [  150.587823] r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
    (XEN) [  150.587827] cr3: 00000000dbaa8000   cr2: 0000000000000000
    (XEN) [  150.587830] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
    (XEN) [  150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
    ... ... ...
    (XEN) [  150.587962] Xen call trace:
    (XEN) [  150.587966]    [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
    (XEN) [  150.587974]    [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
    (XEN) [  150.587979]    [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
    (XEN) [  150.587983]    [<ffff82d08012dc6e>] do_softirq+0x13/0x15
    (XEN) [  150.587988]    [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
    (XEN) [  151.272182]
    (XEN) [  151.274174] ****************************************
    (XEN) [  151.279624] Panic on CPU 6:
    (XEN) [  151.282915] Xen BUG at sched_credit.c:655
    (XEN) [  151.287415] ****************************************
    
    During suspend, the pCPUs are not removed from their
    pools with the standard procedure (which would involve
    schedule_cpu_switch()). During resume, they:
     1) are assigned to the default cpupool (CPU_UP_PREPARE
        phase);
     2) are moved to the pool they were in before suspend,
        via schedule_cpu_switch() (CPU_ONLINE phase).
    
    During resume, scheduling (even if just the idle loop)
    can happen right after the CPU_STARTING phase (before
    CPU_ONLINE), i.e., before the pCPU is put back in its
    pool. In this case, it is the default pool's scheduler
    that is invoked (Credit1, in the example above). But
    the Credit2 specific vCPU data is not freed during
    suspend, and Credit1 specific vCPU data is not allocated
    during resume.
    
    Therefore, Credit1 schedules on pCPUs whose idle vCPU's
    sched_priv points to Credit2 vCPU data, and we crash.
    
    Fix things by properly deallocating scheduler specific
    data of the pCPU's pool scheduler during pCPU teardown,
    and re-allocating them --always for &ops-- during pCPU
    bringup.
    
    This also fixes another (latent) bug. In fact, it avoids,
    still in schedule_cpu_switch(), that Credit1's free_vdata()
    is used to deallocate data allocated with Credit2's
    alloc_vdata(). This is not easy to trigger, but only
    because the other bug shown above manifests first and
    crashes the host.
    
    The downside of this patch is that it adds one more
    allocation on the resume path, which is not ideal. Still,
    there is no better way of fixing the described bugs at
    the moment. Removing (ideally all) allocations happening
    during resume should continue to be pursued in the long
    run.
    
    Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
    Reviewed-by: Juergen Gross <jgross@xxxxxxxx>
    Reviewed-by: George Dunlap <george.dunlap@xxxxxxxxxx>
---
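The ownership rule described in the commit message can be illustrated with
a small standalone model. This is only a sketch under stated assumptions,
not Xen code: every name below (toy_vcpu, toy_alloc_vdata, and so on) is
hypothetical, and each private data blob is tagged with the scheduler that
allocated it so that a mismatch is detectable. Teardown frees the data with
the scheduler of the pool the pCPU was in and clears the pointer; bringup
always allocates for the default scheduler.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two schedulers in the example. */
enum sched_id { SCHED_CREDIT1 = 1, SCHED_CREDIT2 = 2 };

/* Toy per-vCPU private data, tagged with the scheduler that allocated it. */
struct vdata {
    enum sched_id owner;
};

/* Toy idle vCPU: only the field relevant to this bug is modelled. */
struct toy_vcpu {
    struct vdata *sched_priv;
};

static struct vdata *toy_alloc_vdata(enum sched_id s)
{
    struct vdata *v = malloc(sizeof(*v));

    if ( v != NULL )
        v->owner = s;
    return v;
}

static void toy_free_vdata(enum sched_id s, struct vdata *v)
{
    /* Freeing with a scheduler other than the allocator is the latent bug. */
    assert(v == NULL || v->owner == s);
    free(v);
}

/* Teardown: free with the pool's scheduler, then clear the pointer. */
static void toy_cpu_schedule_down(struct toy_vcpu *idle,
                                  enum sched_id pool_sched)
{
    toy_free_vdata(pool_sched, idle->sched_priv);
    idle->sched_priv = NULL;
}

/* Bringup: always allocate for the default scheduler. */
static int toy_cpu_schedule_up(struct toy_vcpu *idle,
                               enum sched_id default_sched)
{
    assert(idle->sched_priv == NULL);
    idle->sched_priv = toy_alloc_vdata(default_sched);
    return idle->sched_priv == NULL ? -1 : 0;
}

/* A scheduler may only interpret private data it allocated itself. */
static int toy_can_schedule(const struct toy_vcpu *idle, enum sched_id s)
{
    return idle->sched_priv != NULL && idle->sched_priv->owner == s;
}
```

In this model, an idle vCPU that still carries SCHED_CREDIT2 data fails
toy_can_schedule() for SCHED_CREDIT1, which corresponds to the crash shown
above. After toy_cpu_schedule_down() with SCHED_CREDIT2 followed by
toy_cpu_schedule_up() with SCHED_CREDIT1, the check passes.
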
 xen/common/schedule.c |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index c195129..d121896 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1380,6 +1380,27 @@ static int cpu_schedule_up(unsigned int cpu)
 
     if ( idle_vcpu[cpu] == NULL )
         alloc_vcpu(idle_vcpu[0]->domain, cpu, cpu);
+    else
+    {
+        struct vcpu *idle = idle_vcpu[cpu];
+
+        /*
+         * During (ACPI?) suspend the idle vCPU for this pCPU is not freed,
+         * while its scheduler specific data (what is pointed by sched_priv)
+         * is. Also, at this stage of the resume path, we attach the pCPU
+         * to the default scheduler, no matter in what cpupool it was before
+         * suspend. To avoid inconsistency, let's allocate default scheduler
+         * data for the idle vCPU here. If the pCPU was in a different pool
+         * with a different scheduler, it is schedule_cpu_switch(), invoked
+         * later, that will set things up as appropriate.
+         */
+        ASSERT(idle->sched_priv == NULL);
+
+        idle->sched_priv = SCHED_OP(&ops, alloc_vdata, idle,
+                                    idle->domain->sched_priv);
+        if ( idle->sched_priv == NULL )
+            return -ENOMEM;
+    }
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
@@ -1397,6 +1418,10 @@ static void cpu_schedule_down(unsigned int cpu)
 
     if ( sd->sched_priv != NULL )
         SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
+    SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
+
+    idle_vcpu[cpu]->sched_priv = NULL;
+    sd->sched_priv = NULL;
 
     kill_timer(&sd->s_timer);
 }
--
generated by git-patchbot for /home/xen/git/xen.git#master
