
Re: [Xen-devel] [PATCH v2] Modified RTDS scheduler to use an event-driven model instead of polling.



On Sun, 2015-06-28 at 00:06 -0700, Meng Xu wrote:
> [Adding Dario into the cc list...]
> 
> Dario, this patch has some coding issues and some other minor issues.
> I tried to point out some of them in this email. IMHO, the main purpose
> of this patch is to confirm that the implementation fully reflects what
> you suggested on the previous version. Once we make sure the direction
> we are heading in is correct, we can fix the minor issues.
>
So, all in all, this looks a lot better than the previous one. It's easier
to understand, review, etc., as far as the patch itself is concerned,
and I believe it will make the code look (and, in the long run, perform)
better, once this (or something similar to it) is applied.

> (Of course, these minor issues are not actually minor things and
> should be fixed before sending out the patch. Since it has been sent
> out, let's first make sure the top issue is solved. :-( )
> 
That is fine. The way one usually marks such a patch or series is to
put RFC as a tag. Then, provided you describe, in the cover letter and/or
in the single patches, what the issues are and how the RFC should be
interpreted, you're fine sending (almost) anything to the list. :-)

> 2015-06-27 12:46 GMT-07:00 Dagaen Golomb <dgolomb@xxxxxxxxxxxxxx>:
> > To do this, we create a new list that holds, for each
> > vcpu, the time least into the future that it may need to be
> > rescheduled. The scheduler chooses the lowest time off of this
> > list and waits until the specified time instead of running every
> > 1 ms as it did before.
> 
> Can you summarize the design discussion with Dario on the first
> version of the patch? The above information is ok, but it does not say
> much about the design. As you can see in our discussion with Dario,
> there exist several design choices to achieve this. Explicitly stating
> the chosen one here can help people understand what this patch is doing.
> 
Yes, that would be helpful. It's not required to summarize all the
various ML threads in the actual changelog(s), but, really, a summary and
maybe a few links and pointers in the cover letter would be useful, to
give interested people who may join late the chance to get some
context.

Note that, even for a single patch and not only for a series, it is ok to
include a cover letter (i.e., a [PATCH 0/x]), e.g., right for this
purpose.

So, as I was saying, I like the architecture now. There are issues, or
at least I have some questions about specific decisions or code chunks,
but I think this is on the right track now.

> > Signed-off-by: Dagaen Golomb <dgolomb@xxxxxxxxxxxxxx>
> > Signed-off-by: Meng Xu <mengxu@xxxxxxxxxxxxx>
> > ---

> > --- a/xen/common/sched_rt.c
> > +++ b/xen/common/sched_rt.c
> > @@ -4,6 +4,7 @@
> >   *
> >   * by Sisu Xi, 2013, Washington University in Saint Louis
> >   * and Meng Xu, 2014, University of Pennsylvania
> > + * and Dagaen Golomb, 2015, University of Pennsylvania
> >   *
> >   * based on the code of credit Scheduler
> >   */
> > @@ -152,6 +153,8 @@ struct rt_private {
> >      struct list_head sdom;      /* list of availalbe domains, used for 
> > dump */
> >      struct list_head runq;      /* ordered list of runnable vcpus */
> >      struct list_head depletedq; /* unordered list of depleted vcpus */
> > +    struct timer replenishment_timer;
> > +    unsigned NUM_CPUS;
> 
> First, it does not follow the naming convention for variables. UPPER CASE
> should be reserved for macro definitions and constants, which is not the
> case here. You can use a more appropriate name, such as ncpus, and add a
> comment saying what it is used for.
> 
Agreed.

> >      cpumask_t tickled;          /* cpus been tickled */
> >  };
> >
> > @@ -178,6 +181,8 @@ struct rt_vcpu {
> >      unsigned flags;             /* mark __RTDS_scheduled, etc.. */
> >  };
> >
> > +static void runq_tickle(const struct scheduler *, struct rt_vcpu *);
> 
> Not sure if declaring the function here is the best approach, or if
> another patch should move the function forward. But anyway, we don't
> have to worry about this right now, considering there are more
> important issues to tackle in this patch.
>
Sure.

Personally, I would prefer moving the function (in a separate
patch). Forward declarations make it (slightly, yes, but still...) more
difficult to find, and to switch (like in editors), between call sites
and where the actual code is.

> > @@ -411,15 +422,77 @@ __runq_insert(const struct scheduler *ops, struct 
> > rt_vcpu *svc)
> >              struct rt_vcpu * iter_svc = __q_elem(iter);
> >              if ( svc->cur_deadline <= iter_svc->cur_deadline )
> >                      break;
> > +            else
> > +                ++inserted_index;
> >           }
> >          list_add_tail(&svc->q_elem, iter);
> > +        return inserted_index;
> >      }
> >      else
> >      {
> > -        list_add(&svc->q_elem, &prv->depletedq);
> > +        // If we are inserting into previously empty depletedq, rearm
> > +        //   rearm replenish timer. It will handle further scheduling until
> > +        //   the depletedq is empty again (if ever)
> > +        if( list_empty(depletedq) )
> > +            set_timer( &prv->replenishment_timer, svc->cur_deadline );
> 
> Hmm, can you handle the set_timer outside of this function? It should
> be possible, and it would make this runq_insert() function serve only
> one purpose.
> 
TBH, I think I like it being here. I mean, __runq_insert is called in
many places (at least right now), and I wouldn't like to have the check
and the arming repeated everywhere.

Then, that being said, I'm still hoping that some of these calls could
go away, and maybe there are call sites where we know that the timer
won't be armed in any case... So, we'll have to see about this.

So, just take this comment as "I don't dislike the timer machinery to be
in here". :-)

> > +static void replenishment_handler(void* data)
> > +{
> > +    const struct scheduler *ops = data;
> > +    struct rt_private *prv = rt_priv(ops);
> > +    struct list_head *depletedq = rt_depletedq(ops);
> > +    struct list_head *iter;
> > +    struct list_head *tmp;
> > +    struct rt_vcpu *svc = NULL;
> > +    s_time_t now = NOW();
> > +    int new_position = 0;
> > +    unsigned long flags;
> > +
> > +    spin_lock_irqsave(&prv->lock, flags);
> > +
> > +    // Replenish the vCPU that triggered this, and resort it into runq
> > +    list_for_each_safe(iter, tmp, depletedq)
> > +    {
> > +        svc = __q_elem(iter);
> > +        if ( now >= svc->cur_deadline )
> > +        {
> > +            rt_update_deadline(now, svc);
> > +            __q_remove(svc); /* remove from depleted queue */
> > +            new_position = __runq_insert(ops, svc); /* add to runq */
> > +        }
> > +        else break; // leave list_for_each_safe
> > +    }
> > +
> > +    // If we become one of top [# CPUs] in the runq, tickle it
> > +    // TODO: make this work when multiple tickles are required
> 
> Why do you need multiple tickles?
> 
Because the loop above may have done more than one replenishment,
which makes sense.

While we are here: I've said that I don't like the fact that we have one
big spinlock for the whole scheduler. However, we do have it right now,
so let's make good use of it.

So (but please double check what I'm about to say, and provide your
view), it should be safe to access the various svc->vc->processor fields
from inside here, since we're holding the big lock. If that's the case,
then it should be fairly easy to figure out which pCPUs need to
reschedule (playing with the index, etc.), and then use their
vc->processor to decide which pCPU to actually tickle, shouldn't it?

(let me know if I've explained myself...)
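Something along these lines, I mean (very rough and untested sketch, just
to show the idea; I'm reusing the names from the patch hunk above):

    /* Rough sketch (untested): do the replenishments and, for each vcpu
     * that, after being re-inserted, sits within the first prv->NUM_CPUS
     * positions of the runq (i.e., that may have to preempt someone),
     * issue a tickle. We hold prv->lock, so peeking at the queues (and
     * at vc->processor, if needed) should be safe. */
    list_for_each_safe( iter, tmp, depletedq )
    {
        svc = __q_elem(iter);
        if ( now < svc->cur_deadline )
            break;

        rt_update_deadline(now, svc);
        __q_remove(svc);                        /* off the depletedq */
        new_position = __runq_insert(ops, svc); /* back on the runq  */

        /* One tickle per replenished vcpu that made it into the "top
         * NUM_CPUS" of the runq, instead of a single tickle at the end. */
        if ( new_position < prv->NUM_CPUS )
            runq_tickle(ops, svc);
    }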

> > @@ -427,6 +500,7 @@ static int
> >  rt_init(struct scheduler *ops)
> >  {
> >      struct rt_private *prv = xzalloc(struct rt_private);
> > +    int cpu = 0;
> >
> >      printk("Initializing RTDS scheduler\n"
> >             "WARNING: This is experimental software in development.\n"
> > @@ -450,6 +524,15 @@ rt_init(struct scheduler *ops)
> >      INIT_LIST_HEAD(&prv->runq);
> >      INIT_LIST_HEAD(&prv->depletedq);
> >
> > +    // Replenishment timer, for now always on CPU 0
> > +    init_timer(&prv->replenishment_timer, replenishment_handler, ops, 0);
>
Oh, yes, I wanted to discuss this! It is true that in Xen timers
are bound to cpus (as they are, e.g., in Linux, although both the
interface and the implementation are different).

So, for now, you're doing the right thing in just putting the
replenishment timer on one pCPU. We'll see how to improve this later on.

> > +    // Calculate number of pCPUs present.
> > +    // Note: assumes does not change online.
> 
> Dario, is this a valid assumption? I'm not sure if Xen supports cpu
> hot plug and if cpu-hotplug (in case xen supports it) will invalidate
> this assumption.
> 
It's not, but...

> > +    for_each_present_cpu(cpu)
> > +    {
> > +        ++(prv->NUM_CPUS);
> > +    }
> > +
> 
> This is incorrect. for_each_present_cpu() lists all cpus in the
> system: it enumerates the cpu_present_map.
> 
> What you want here is the number of cpus the scheduler has in the
> current cpupool. You can increase this number when rt_alloc_pdata()
> is called.
> 
... yes, exactly, that's something rather easy to fix. Just use the pCPU
allocation and initialization routines (and the matching de-allocation
ones) to keep the count updated.
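I.e., something like this (only a sketch of the counting part, assuming
the field gets renamed to ncpus; whatever else the hooks already do is
just elided):

    /* Sketch: keep the pCPU count up to date from the per-pCPU
     * init/deinit hooks, instead of walking cpu_present_map in rt_init(). */
    static void *
    rt_alloc_pdata(const struct scheduler *ops, int cpu)
    {
        struct rt_private *prv = rt_priv(ops);
        unsigned long flags;

        spin_lock_irqsave(&prv->lock, flags);
        prv->ncpus++;               /* this pCPU now belongs to us */
        spin_unlock_irqrestore(&prv->lock, flags);

        /* ... existing per-pCPU setup ... */
        return (void *)1;           /* non-NULL == success, as it is now */
    }

    static void
    rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
    {
        struct rt_private *prv = rt_priv(ops);
        unsigned long flags;

        spin_lock_irqsave(&prv->lock, flags);
        prv->ncpus--;               /* pCPU is leaving the scheduler */
        spin_unlock_irqrestore(&prv->lock, flags);

        /* ... existing per-pCPU teardown, if any ... */
    }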

> > @@ -838,6 +922,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, 
> > bool_t tasklet_work_sched
> >  {
> >      const int cpu = smp_processor_id();
> >      struct rt_private *prv = rt_priv(ops);
> > +    struct list_head *runq = rt_runq(ops);
> >      struct rt_vcpu *const scurr = rt_vcpu(current);
> >      struct rt_vcpu *snext = NULL;
> >      struct task_slice ret = { .migrated = 0 };
> > @@ -848,8 +933,6 @@ rt_schedule(const struct scheduler *ops, s_time_t now, 
> > bool_t tasklet_work_sched
> >      /* burn_budget would return for IDLE VCPU */
> >      burn_budget(ops, scurr, now);
> >
> > -    __repl_update(ops, now);
> > -
> >      if ( tasklet_work_scheduled )
> >      {
> >          snext = rt_vcpu(idle_vcpu[cpu]);
> > @@ -889,7 +972,10 @@ rt_schedule(const struct scheduler *ops, s_time_t now, 
> > bool_t tasklet_work_sched
> >          }
> >      }
> >
> > -    ret.time = MIN(snext->budget, MAX_SCHEDULE); /* sched quantum */
> > +    /* Use minimum from runq */
> > +    ret.time = MAX_SCHEDULE;
> 
> Why is it not snext->budget? Budget enforcement only needs to be
> invoked when the VCPU scheduled to run runs out of budget.
> 
Yep, I was about to make the very same comment. :-)
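
I.e., just a sketch of what I'd expect to see here:

    /* Budget enforcement only: the replenishment timer, not
     * rt_schedule(), now takes care of replenishments. */
    ret.time = snext->budget;
    /* or, if we still want an upper bound on the time slice:
     * ret.time = MIN(snext->budget, MAX_SCHEDULE); */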

Allow me to say this: thanks (to both Dagaen and Meng) for doing this
work. It's been a pleasure to have such an interesting architectural
discussion, and it's great to see results already. :-D

Regards,
Dario

PS. I've got a few other comments... I'll gather them and send a new
email...
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


