[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] credi2-ratelimit: Implement rate limit for credit2 scheduler



On Wed, 2016-07-06 at 18:33 +0100, anshul.makkar@xxxxxxxxxx wrote:
> From: Anshul Makkar <anshul.makkar@xxxxxxxxxx>
> 
Hey, Anshul,

Thanks for doing this!

> Rate limit assures that a vcpu will execute for a minimum amount of
> time before
> being put at the back of a queue or being preempted by higher
> priority thread.
> 
I'd say "before a context switch occurs", or "before a context switch
is allowed".

> It introduces a minimum amount of latency to enable a VM to batch its
> work and
> it also ensures that system is not spending most of its time in
> VMEXIT/VMENTRY because of VM that is waking/sleeping at high rate.
> 
Again, something like, "ensures that context switches does not happen
too frequently" seems more accurate and useful as a description.

I also wouldn't use the word "latency", above, to describe the time a
VM has to carry on some work uninterrupted.

> ratelimit can be disabled by setting it to 0.
> 
Good.

> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -171,6 +171,11 @@ integer_param("sched_credit2_migrate_resist",
> opt_migrate_resist);
>  #define c2r(_ops, _cpu)     (CSCHED2_PRIV(_ops)->runq_map[(_cpu)])
>  /* CPU to runqueue struct macro */
>  #define RQD(_ops, _cpu)     (&CSCHED2_PRIV(_ops)->rqd[c2r(_ops,
> _cpu)])
> +/* Find the max of time slice */
> +#define MAX_TSLICE(t1, t2)  \
> +                   ({ typeof (t1) _t1 = (t1); \
> +                      typeof (t1) _t2 = (t2); \
> +                      _t1 > _t2 ? _t1 < 0 ? 0 : _t1 : _t2 < 0 ? 0 :
> _t2; })
> 
I'm bad at this things, but isn't subtracting the two values a more
efficient, effective and easier to read way to do this?

Also, I think this macro can be called something like MAX_STIME(), or
STIME_BEFORE (or something like that) and be put in time.h.

> @@ -280,6 +285,7 @@ struct csched2_private {
>      struct csched2_runqueue_data rqd[NR_CPUS];
>  
>      unsigned int load_window_shift;
> +    unsigned ratelimit_us; /* each cpupool can have its onw
> ratelimit */
>  };
>  
>  /*
> @@ -1588,6 +1594,34 @@ csched2_dom_cntl(
>      return rc;
>  }
>  
> +static int csched2_sys_cntl(const struct scheduler *ops,
> +                            struct xen_sysctl_scheduler_op *sc)
> +{
> +    int rc = -EINVAL;
> +    xen_sysctl_credit_schedule_t *params = &sc->u.sched_credit;
> +    struct csched2_private *prv = CSCHED2_PRIV(ops);
> +    unsigned long flags;
> +
> +    switch (sc->cmd )
> +    {
> +        case XEN_SYSCTL_SCHEDOP_putinfo:
> +            if ( params->ratelimit_us &&
> +                ( params->ratelimit_us < CSCHED2_MIN_TIMER ||
> +                  params->ratelimit_us >
> MICROSECS(CSCHED2_MAX_TIMER) ))
>
I see the link with CSCHED2_MIN_TIMER. Still, I think we can just
reuse XEN_SYSCTL_SCHED_RATELIMIT_MAX and _MIN from sysctl.h.

In fact, the minimum interval of time for programming a timer and the
minimum (theoretical) value usable for rate-limiting are, logically,
two different things. Not to mention, that MIN_TIMER is an internal,
implementation detail, which we may want to change at some point, for
various reasons, and it's not good to have one user facing parameter
bound to something like that, IMO.

This also will make things more consistent (between Credit1 and
Credit2).

> +                return rc;
> +            spin_lock_irqsave(&prv->lock, flags);
> +            prv->ratelimit_us = params->ratelimit_us;
> +            spin_unlock_irqrestore(&prv->lock, flags);
> +            break;
> +
In Credit1, this break is not there, and putinfo also returns back the
value that it has just written. I don't feel like this is terribly
important, but, as said, there's value in keeping the two schedulers in
sync, with respect to the interface to this.

> @@ -1680,6 +1717,14 @@ csched2_runtime(const struct scheduler *ops,
> int cpu, struct csched2_vcpu *snext
>  
>      /* 1) Basic time: Run until credit is 0. */
>      rt_credit = snext->credit;
> +    if (snext->vcpu->is_running)
> +        runtime = now - snext->vcpu->runstate.state_entry_time;
> +    if ( runtime < 0 )
> +    {
> +        runtime = 0;
> +        d2printk("%s: Time went backwards? now %"PRI_stime"
> state_entry_time %"PRI_stime"\n",
> +                  _func__, now, snext->runstate.state_entry_time);
> +    }
>  
>      /* 2) If there's someone waiting whose credit is positive,
>       * run until your credit ~= his */
> @@ -1695,11 +1740,24 @@ csched2_runtime(const struct scheduler *ops,
> int cpu, struct csched2_vcpu *snext
>      }
>  
>      /* The next guy may actually have a higher credit, if we've
> tried to
> -     * avoid migrating him from a different cpu.  DTRT.  */
> +     * avoid migrating him from a different cpu.  DTRT.
> +     * Even if the next guy has higher credit and current vcpu has
> executed
> +     * for less amount of time than rate limit, allow it to run for
> minimum
> +     * amount of time.
> +     */
>      if ( rt_credit <= 0 )
>      {
> -        time = CSCHED2_MIN_TIMER;
> -        SCHED_STAT_CRANK(runtime_min_timer);
> +        if ( snext->vcpu->is_running && prv->ratelimit_us)
> +           /* implies the current one has executed for time <
> ratelimit and thus
> +            * it has neen selcted int runq_candidate to run next.
> +            * No need to check for this condition again.
> +            */
> +            time = MAX_TSLICE(CSCHED2_MIN_TIMER,
> +                               MICROSECS(prv->ratelimit_us) -
> runtime);
> +        else
> +            time = MAX_TSLICE(CSCHED2_MIN_TIMER, MICROSECS(prv-
> >ratelimit_us));
> +
> +        SCHED_STAT_CRANK(time);
>
If you want to define a new performance counter, you need more than
just this. But I think you should just not do that for now...

>      }
>      else
>      {
> @@ -1709,17 +1767,22 @@ csched2_runtime(const struct scheduler *ops,
> int cpu, struct csched2_vcpu *snext
>           * at a different rate. */
>          time = c2t(rqd, rt_credit, snext);
>  
> -        /* Check limits */
> -        if ( time < CSCHED2_MIN_TIMER )
> -        {
> -            time = CSCHED2_MIN_TIMER;
> -            SCHED_STAT_CRANK(runtime_min_timer);
> -        }
> -        else if ( time > CSCHED2_MAX_TIMER )
> +        /* Check limits.
> +         * time > ratelimit --> time > MIN.
> +         */
> +        if ( time > CSCHED2_MAX_TIMER )
>          {
> +
>              time = CSCHED2_MAX_TIMER;
>              SCHED_STAT_CRANK(runtime_max_timer);
>          }
> +        else
> +        {
> +            time = MAX_TSLICE(MAX_TSLICE(CSCHED2_MIN_TIMER,
> +                                          MICROSECS(prv-
> >ratelimit_us)), time);
> +            SCHED_STAT_CRANK(time);
> +        }
> +
>      }
>  
So, I've been thinking... Why don't we "just" fixup, right here, the
value that we are about to return?

I.e., basically we return the max between time and
prv->runtime_us - runtime (with runtime being =0 in case snext hasn't
run yet). Or am I missing something?

>      return time;

> @@ -1762,9 +1835,13 @@ runq_candidate(struct csched2_runqueue_data
> *rqd,
>          }
>  
>          /* If the next one on the list has more credit than current
> -         * (or idle, if current is not runnable), choose it. */
> +         * (or idle, if current is not runnable) and current one has
> already
> +         * executed for more than ratelimit. choose it.
> +         * Control has reached here means that current vcpu has
> executed >
> +         * ratelimit_us or ratelimit is off, so chose the next one.
> +         */
>          if ( svc->credit > snext->credit )
> -            snext = svc;
> +                snext = svc;
>  
I don't think I see the need for these changes (with this line here
above being more wrong than pointless, actually).

> @@ -2353,6 +2431,8 @@ csched2_init(struct scheduler *ops)
>          prv->runq_map[i] = -1;
>          prv->rqd[i].id = -1;
>      }
> +    /* initialize ratelimit */
> +    prv->ratelimit_us = 1000 * CSCHED2_MIN_TIMER;
>  
>      prv->load_window_shift = opt_load_window_shift;
>
Mmm... I'd rather set this to either 0 (so, disabled)
or XEN_SYSCTL_CSCHED_TSLICE_MIN (but I indeed prefer 0) by default.
Then do some experiments, see if it helps and try to come up with a
reasonable value.

Thanks and regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.