
Re: [Xen-devel] [xen-unstable test] 6374: regressions - FAIL



At 17:51 +0000 on 11 Mar (1299865912), Ian Jackson wrote:
> Mar 11 13:46:58.154777 (XEN) Xen call trace:
> Mar 11 13:46:58.154798 (XEN)    [<ffff82c480100140>] __bitmap_empty+0x0/0x7f
> Mar 11 13:46:58.163767 (XEN)    [<ffff82c480119582>] csched_cpu_pick+0xe/0x10
> Mar 11 13:46:58.163802 (XEN)    [<ffff82c480122c8d>] vcpu_migrate+0xfb/0x230
> Mar 11 13:46:58.178768 (XEN)    [<ffff82c480122e24>] context_saved+0x62/0x7b
> Mar 11 13:46:58.178799 (XEN)    [<ffff82c480157f17>] context_switch+0xd98/0xdca
> Mar 11 13:46:58.183766 (XEN)    [<ffff82c4801226b4>] schedule+0x5fc/0x624
> Mar 11 13:46:58.183795 (XEN)    [<ffff82c480123837>] __do_softirq+0x88/0x99
> Mar 11 13:46:58.198784 (XEN)    [<ffff82c4801238b2>] do_softirq+0x6a/0x7a

I think this hang happens because, although this code:

            cpu = cycle_cpu(CSCHED_PCPU(nxt)->idle_bias, nxt_idlers);
            if ( commit )
               CSCHED_PCPU(nxt)->idle_bias = cpu;
            cpus_andnot(cpus, cpus, per_cpu(cpu_sibling_map, cpu));

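removes the new cpu and its siblings from cpus, cpu isn't guaranteed to
have been in cpus in the first place, and neither are any of its
siblings, since nxt might not be one of them.  In that case cpus doesn't
shrink, which would explain why we end up spinning around
__bitmap_empty().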

Possible fix:

diff -r b9a5d116102d xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Thu Mar 10 13:06:52 2011 +0000
+++ b/xen/common/sched_credit.c Mon Mar 14 09:25:07 2011 +0000
@@ -533,7 +533,7 @@ _csched_cpu_pick(const struct scheduler 
             cpu = cycle_cpu(CSCHED_PCPU(nxt)->idle_bias, nxt_idlers);
             if ( commit )
                CSCHED_PCPU(nxt)->idle_bias = cpu;
-            cpus_andnot(cpus, cpus, per_cpu(cpu_sibling_map, cpu));
+            cpus_andnot(cpus, cpus, nxt_idlers);
         }
         else
         {

which guarantees that nxt will be removed from cpus, though I suspect
this means that we might not pick the best HT pair in a particular core.
Scheduler code is twisty and hurts my brain so I'd like George's
opinion before checking anything in.
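
To make the failure mode concrete, here is a toy model of one pass through
that branch.  It is a sketch, not the Xen code: CPU sets are plain bits in a
uint32_t, the four-CPU topology (two cores, two threads each) is made up,
and cycle(), step_old() and step_new() are hypothetical stand-ins for
cycle_cpu() and the two variants of the cpus_andnot() line.  It also
assumes nxt is itself idle, so that nxt is a member of nxt_idlers.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, NOT Xen code: a CPU set is a bitmask, bit n = CPU n. */
typedef uint32_t mask_t;

#define NR_CPUS 4

/* Hypothetical topology: core A = CPUs {0,1}, core B = CPUs {2,3}. */
static const mask_t sibling_map[NR_CPUS] = { 0x3, 0x3, 0xC, 0xC };

/* Stand-in for cycle_cpu(): first CPU in m strictly after 'start',
 * wrapping around; returns -1 if m is empty. */
static int cycle(int start, mask_t m)
{
    for (int i = 1; i <= NR_CPUS; i++) {
        int c = (start + i) % NR_CPUS;
        if (m & ((mask_t)1 << c))
            return c;
    }
    return -1;
}

/* Old behaviour: drop the siblings of the cpu chosen from nxt_idlers.
 * Nothing forces that cpu (or its siblings) to include nxt. */
static mask_t step_old(mask_t cpus, mask_t nxt_idlers, int idle_bias)
{
    int cpu = cycle(idle_bias, nxt_idlers);
    assert(cpu >= 0);            /* nxt_idlers must not be empty */
    return cpus & ~sibling_map[cpu];
}

/* Proposed behaviour: drop everything in nxt_idlers.  Under the stated
 * assumption that nxt is in nxt_idlers, nxt always leaves cpus. */
static mask_t step_new(mask_t cpus, mask_t nxt_idlers)
{
    return cpus & ~nxt_idlers;
}
```

With cpus = {2} (so nxt = 2), nxt_idlers = {0,1,2,3} and an idle_bias of 3,
the old step picks cpu 0, removes {0,1}, and hands back cpus = {2}
unchanged, so a loop waiting for cpus to become empty makes no progress.
The new step leaves cpus empty in one pass.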

Cheers,

Tim.

P.S. The patch above is a one-liner for clarity; a better fix would be:

diff -r b9a5d116102d xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Thu Mar 10 13:06:52 2011 +0000
+++ b/xen/common/sched_credit.c Mon Mar 14 09:26:11 2011 +0000
@@ -533,12 +533,8 @@ _csched_cpu_pick(const struct scheduler 
             cpu = cycle_cpu(CSCHED_PCPU(nxt)->idle_bias, nxt_idlers);
             if ( commit )
                CSCHED_PCPU(nxt)->idle_bias = cpu;
-            cpus_andnot(cpus, cpus, per_cpu(cpu_sibling_map, cpu));
         }
-        else
-        {
-            cpus_andnot(cpus, cpus, nxt_idlers);
-        }
+        cpus_andnot(cpus, cpus, nxt_idlers);
     }
 
     return cpu;



-- 
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
