[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code



On Wed, 2015-03-18 at 17:05 +0000, George Dunlap wrote:
> On 03/18/2015 04:49 PM, Dario Faggioli wrote:

> >> If this would work, I think it would be a lot better.  It would remove
> >> the credit2-specific callback, and it would mean we wouldn't need an
> >> additional test to filter out
> >>
> > I see. Ok, I guess I can give this a try. If it does not explode,
> > because some weird dependency we can't see right now, we can either try
> > harder to track it, or use the other solution outlined above as a
> > fallback.
> 
> If you could look into this, that would be awesome. :-)
> 
So, I did look into this. In summary, I think it could work, but some
rework is required. Here's my findings in some more details.

So, with the quick-&-dirty(TM) patch attached to this email, Credit2
works-a-charm:

root@Zhaman:~# xl  dmesg |grep runqueue
(XEN) Adding cpu 0 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 1 to runqueue 1
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 2 to runqueue 1
(XEN) Adding cpu 3 to runqueue 1
(XEN) Adding cpu 4 to runqueue 1
(XEN) Adding cpu 5 to runqueue 1
(XEN) Adding cpu 6 to runqueue 1
(XEN) Adding cpu 7 to runqueue 1
(XEN) Adding cpu 8 to runqueue 0
(XEN) Adding cpu 9 to runqueue 0
(XEN) Adding cpu 10 to runqueue 0
(XEN) Adding cpu 11 to runqueue 0
(XEN) Adding cpu 12 to runqueue 0
(XEN) Adding cpu 13 to runqueue 0
(XEN) Adding cpu 14 to runqueue 0
(XEN) Adding cpu 15 to runqueue 0

(Yes, cpu0 is assigned to the wrong runqueue, but that is because I'm
ignoring the system_state <==> boot_cpu_data thing for now, so let's
also ignore this, it'll be fixed).

So, things are working, less complexity (i.e., no ad-hoc callback for
credit2), less code (i.e., lots of '-' in the patch), outside is warm
and sunny, spring is coming, birds are singing, etc... :-D :-D

Unfortunately, the patch also makes Credit1 go splat like this:

(XEN) Xen call trace:
(XEN)    [<ffff82d08012d14c>] check_lock+0x37/0x3b
(XEN)    [<ffff82d08012d17b>] _spin_lock+0x11/0x54
(XEN)    [<ffff82d08013498f>] xmem_pool_alloc+0x11c/0x46c
(XEN)    [<ffff82d0801350a6>] _xmalloc+0x106/0x316
(XEN)    [<ffff82d0801352c7>] _xzalloc+0x11/0x32
(XEN)    [<ffff82d08011f5ae>] csched_alloc_pdata+0x47/0x1c6
(XEN)    [<ffff82d0801293d6>] cpu_schedule_callback+0x5a/0xcc
(XEN)    [<ffff82d08011ab8a>] notifier_call_chain+0x67/0x87
(XEN)    [<ffff82d08010169a>] notify_cpu_starting+0x1c/0x24
(XEN)    [<ffff82d080189f5b>] start_secondary+0x203/0x256
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Xen BUG at spinlock.c:48
(XEN) ****************************************

(...and, suddenly, birds shut up :-( )

That is because pool->lock in xmem_pool_alloc() wants to find IRQs
enabled when acquired, as it's been acquired with IRQs enabled before,
and we don't mix IRQ enabled and disabled spinlocks (that's what the BUG
at spinlock:48 is for).

The difference between before and after the patch is that, bebore,
csched_alloc_pdata() is called during CPU_UP_PREPARE, after, during
CPU_STARTING. More specifically, CPU_UP_PREPARE callbacks are invoked as
a consequence of cpu_up() being called in a loop in __start_xen(), and
it itself calls notifier_call_chain(..CPU_UP_PREPARE..). This means they
run:
 - on the boot cpu;
 - with interrupt enabled (see how the call to local_irq_enable() in
   __start_xen() is *before* the for_each_present_cpu() { cpu_up(i); }
   loop).

OTOH, CPU_STARTING callbacks run:
 - on the cpu being brought up;
 - with interrupt disabled (see how the call to local_irq_enable(), in
   start_secondary(), is *after* the invocation of
   notify_cpu_starting()).

Here we are. And the reason why things works ok in Credit2, is that
csched2_alloc_pdata() doesn't really allocate anything! In fact, in
general, handling alloc_pdata() during CPU_STARTING would mean that we
can't allocate any memory which, given the name of the function, would
look rather odd. :-)

Nevertheless I see the value of doing so, and hence I think what we
could do would be to introduce a new hook in the scheduler interface,
called .init_pdata or .init_pcpu, and, in sched_*.c, split the
allocation and the initialization parts. The former will be handled
during CPU_UP_PREPARE, when allocation is possible, the latter during
CPU_STARTING, when we have more info available to perform actual
initializations.

For Credit2, alloc_pdata() wouldn't even need to exist, and all the work
will be done in init_pcpu(), so less complexity than now, and no need
for the ad-hoc starting callback inside sched_credit2.c. For Credit1,
both will be required, but the added complexity in sched_credit.c
doesn't look too bad at all to me.

Of couse, I still have to code it up, and see whether this plays well
with cpupools, cpu on/offlining, etc., but I'd like to hear whether you
like the idea, before getting to do that. :-)

Let me know what you think.

Regards,
Dario
---
 sched_credit2.c |   63 ++------------------------------------------------------
 schedule.c      |   10 +++++---
 2 files changed, 9 insertions(+), 64 deletions(-)
---
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ad0a5d4..ac7a7f2 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1918,7 +1918,8 @@ static void deactivate_runqueue(struct csched2_private 
*prv, int rqi)
     cpumask_clear_cpu(rqi, &prv->active_queues);
 }
 
-static void init_pcpu(const struct scheduler *ops, int cpu)
+static void *
+csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 {
     int rqi;
     unsigned long flags;
@@ -1932,7 +1933,7 @@ static void init_pcpu(const struct scheduler *ops, int 
cpu)
     {
         printk("%s: Strange, cpu %d already initialized!\n", __func__, cpu);
         spin_unlock_irqrestore(&prv->lock, flags);
-        return;
+        return NULL;
     }
 
     /* Figure out which runqueue to put it in */
@@ -1980,20 +1981,6 @@ static void init_pcpu(const struct scheduler *ops, int 
cpu)
 
     spin_unlock_irqrestore(&prv->lock, flags);
 
-    return;
-}
-
-static void *
-csched2_alloc_pdata(const struct scheduler *ops, int cpu)
-{
-    /* Check to see if the cpu is online yet */
-    /* Note: cpu 0 doesn't get a STARTING callback */
-    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
-        init_pcpu(ops, cpu);
-    else
-        printk("%s: cpu %d not online yet, deferring initializatgion\n",
-               __func__, cpu);
-
     return (void *)1;
 }
 
@@ -2046,49 +2033,6 @@ csched2_free_pdata(const struct scheduler *ops, void 
*pcpu, int cpu)
 }
 
 static int
-csched2_cpu_starting(int cpu)
-{
-    struct scheduler *ops;
-
-    /* Hope this is safe from cpupools switching things around. :-) */
-    ops = per_cpu(scheduler, cpu);
-
-    if ( ops->alloc_pdata == csched2_alloc_pdata )
-        init_pcpu(ops, cpu);
-
-    return NOTIFY_DONE;
-}
-
-static int cpu_credit2_callback(
-    struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-    unsigned int cpu = (unsigned long)hcpu;
-    int rc = 0;
-
-    switch ( action )
-    {
-    case CPU_STARTING:
-        csched2_cpu_starting(cpu);
-        break;
-    default:
-        break;
-    }
-
-    return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
-}
-
-static struct notifier_block cpu_credit2_nfb = {
-    .notifier_call = cpu_credit2_callback
-};
-
-static int
-csched2_global_init(void)
-{
-    register_cpu_notifier(&cpu_credit2_nfb);
-    return 0;
-}
-
-static int
 csched2_init(struct scheduler *ops)
 {
     int i;
@@ -2168,7 +2112,6 @@ const struct scheduler sched_credit2_def = {
 
     .dump_cpu_state = csched2_dump_pcpu,
     .dump_settings  = csched2_dump,
-    .global_init    = csched2_global_init,
     .init           = csched2_init,
     .deinit         = csched2_deinit,
     .alloc_vdata    = csched2_alloc_vdata,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ef79847..5e7cdc9 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1327,10 +1327,6 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
-    if ( (ops.alloc_pdata != NULL) &&
-         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
-        return -ENOMEM;
-
     return 0;
 }
 
@@ -1348,10 +1344,16 @@ static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     int rc = 0;
 
     switch ( action )
     {
+    case CPU_STARTING:
+        if ( (ops.alloc_pdata != NULL) &&
+             ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
+            return -ENOMEM;
+        break;
     case CPU_UP_PREPARE:
         rc = cpu_schedule_up(cpu);
         break;

Attachment: xen-sched-cpustarting.patch
Description: Text Data

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.