
Re: [Xen-devel] [PATCH v3 4/4] sched: credit2: consider per-vcpu soft affinity



On Thu, Mar 26, 2015 at 9:48 AM, Justin T. Weaver <jtweaver@xxxxxxxxxx> wrote:
>  * choose_cpu
>
> choose_cpu now tries to find the run queue with the most cpus in the given
> vcpu's soft affinity. It uses minimum run queue load as a tie breaker.
[snip]
>  * choose_cpu: added balance loop to find cpu for given vcpu that has most
>    soft cpus (with run queue load being a tie breaker), or if none were found,
>    or not considering soft affinity, pick cpu from runq with least load
[snip]
> @@ -1086,7 +1130,7 @@ static int
>  choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>  {
>      struct csched2_private *prv = CSCHED2_PRIV(ops);
> -    int i, min_rqi = -1, new_cpu;
> +    int i, rqi = -1, new_cpu, max_soft_cpus = 0, balance_step;
>      struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
>      s_time_t min_avgload;
>

Hey Justin -- sorry for taking so long to get back to this one.

Before getting into the changes to choose_cpu(): it looks like on the
__CSFLAG_runq_migrate_request path (starting with "First check to see
if we're here because someone else suggested a place for us to move"),
we only consider the hard affinity, not the soft affinity.  Is that
intentional?
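
If it's not intentional: not insisting on it for this series, but what I'd
have expected on that path is something roughly like the fragment below.
Completely untested, and "trqd" is just a stand-in for whatever runqueue
the migrate request pointed us at:

    /* Prefer a pcpu in the soft affinity on the suggested runqueue... */
    sched_balance_cpumask(vc, SCHED_BALANCE_SOFT_AFFINITY, csched2_cpumask);
    cpumask_and(csched2_cpumask, csched2_cpumask, &trqd->active);
    if ( cpumask_empty(csched2_cpumask) )
        /* ...and fall back to hard affinity if there isn't one. */
        cpumask_and(csched2_cpumask, vc->cpu_hard_affinity, &trqd->active);
    new_cpu = cpumask_any(csched2_cpumask);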

> @@ -1143,9 +1187,28 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>
>      min_avgload = MAX_LOAD;
>
> -    /* Find the runqueue with the lowest instantaneous load */
> +    /*
> +     * Find the run queue with the most cpus in vc's soft affinity. If there
> +     * is more than one queue with the highest soft affinity cpu count, then
> +     * pick the one with the lowest instantaneous run queue load. If the
> +     * vcpu does not have soft affinity, then only try to find the run queue
> +     * with the lowest instantaneous load.
> +     */
> +    for_each_sched_balance_step( balance_step )
> +    {
> +        if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY
> +            && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) )
> +            continue;
> +
> +        if ( balance_step == SCHED_BALANCE_HARD_AFFINITY && rqi > -1 )
> +        {
> +            balance_step = SCHED_BALANCE_SOFT_AFFINITY;
> +            break;
> +        }
> +
>          for_each_cpu(i, &prv->active_queues)
>          {
> +            int rqd_soft_cpus = 0;
>              struct csched2_runqueue_data *rqd;
>              s_time_t rqd_avgload = MAX_LOAD;
>
> @@ -1163,35 +1226,61 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
>               * so it is possible here that svc does not have hard affinity
>               * with any of the pcpus of svc's currently assigned run queue.
>               */
> +            sched_balance_cpumask(vc, balance_step, csched2_cpumask);
>              if ( rqd == svc->rqd )
>              {
> -                if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +                if ( cpumask_intersects(csched2_cpumask, &rqd->active) )
>                      rqd_avgload = rqd->b_avgload - svc->avgload;
> +                if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY )
> +                {
> +                    cpumask_and(csched2_cpumask, csched2_cpumask,
> +                        &rqd->active);
> +                    rqd_soft_cpus = cpumask_weight(csched2_cpumask);
> +                }
>              }
>              else if ( spin_trylock(&rqd->lock) )
>              {
> -                if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +                if ( cpumask_intersects(csched2_cpumask, &rqd->active) )
>                      rqd_avgload = rqd->b_avgload;
> +                if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY )
> +                {
> +                    cpumask_and(csched2_cpumask, csched2_cpumask,
> +                        &rqd->active);
> +                    rqd_soft_cpus = cpumask_weight(csched2_cpumask);
> +                }
>
>                  spin_unlock(&rqd->lock);
>              }
>              else
>                  continue;
>
> -            if ( rqd_avgload < min_avgload )
> +            if ( balance_step == SCHED_BALANCE_SOFT_AFFINITY
> +                && rqd_soft_cpus > 0
> +                && ( rqd_soft_cpus > max_soft_cpus
> +                    ||
> +                   ( rqd_soft_cpus == max_soft_cpus
> +                    && rqd_avgload < min_avgload )) )
> +            {
> +                max_soft_cpus = rqd_soft_cpus;
> +                rqi = i;
> +                min_avgload = rqd_avgload;
> +            }
> +            else if ( balance_step == SCHED_BALANCE_HARD_AFFINITY
> +                     && rqd_avgload < min_avgload )
>              {
> +                rqi = i;
>                  min_avgload = rqd_avgload;
> -                min_rqi=i;
>              }
> +        }
>      }
>
>      /* We didn't find anyone (most likely because of spinlock contention). */
> -    if ( min_rqi == -1 )
> +    if ( rqi == -1 )
>          new_cpu = get_fallback_cpu(svc);
>      else
>      {
> -        cpumask_and(csched2_cpumask, vc->cpu_hard_affinity,
> -            &prv->rqd[min_rqi].active);
> +        sched_balance_cpumask(vc, balance_step, csched2_cpumask);
> +        cpumask_and(csched2_cpumask, csched2_cpumask, &prv->rqd[rqi].active);
>          new_cpu = cpumask_any(csched2_cpumask);
>          BUG_ON(new_cpu >= nr_cpu_ids);
>      }

So the general plan here looks right; but is there really a need to go
through the whole thing twice?  Couldn't we keep track of "rqi with
highest # cpus in soft affinity / lowest avgload" and "rqi with lowest
global avgload" in one pass, and then choose whichever one looks the
best at the end?
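
Something like the below is what I have in mind -- completely untested,
and just re-using the helpers and fields your patch already introduces,
so please treat it as a sketch of the idea rather than a drop-in
replacement:

    int i, rqi, soft_rqi = -1, hard_rqi = -1, max_soft_cpus = 0;
    s_time_t soft_avgload = MAX_LOAD, hard_avgload = MAX_LOAD;
    bool_t has_soft = __vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity);

    for_each_cpu(i, &prv->active_queues)
    {
        struct csched2_runqueue_data *rqd = prv->rqd + i;
        s_time_t rqd_avgload;
        int rqd_soft_cpus = 0;

        /* Queues outside the hard affinity are of no use at all. */
        if ( !cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
            continue;

        if ( rqd == svc->rqd )
            rqd_avgload = rqd->b_avgload - svc->avgload;
        else if ( spin_trylock(&rqd->lock) )
        {
            rqd_avgload = rqd->b_avgload;
            spin_unlock(&rqd->lock);
        }
        else
            continue;

        if ( has_soft )
        {
            /* How many of this queue's pcpus are in the soft affinity? */
            sched_balance_cpumask(vc, SCHED_BALANCE_SOFT_AFFINITY,
                                  csched2_cpumask);
            cpumask_and(csched2_cpumask, csched2_cpumask, &rqd->active);
            rqd_soft_cpus = cpumask_weight(csched2_cpumask);
        }

        /* Candidate 1: most soft-affinity cpus, lowest load as tie breaker. */
        if ( rqd_soft_cpus > max_soft_cpus
             || ( rqd_soft_cpus > 0 && rqd_soft_cpus == max_soft_cpus
                  && rqd_avgload < soft_avgload ) )
        {
            max_soft_cpus = rqd_soft_cpus;
            soft_rqi = i;
            soft_avgload = rqd_avgload;
        }

        /* Candidate 2: lowest load among all hard-affinity queues. */
        if ( rqd_avgload < hard_avgload )
        {
            hard_rqi = i;
            hard_avgload = rqd_avgload;
        }
    }

    /* Prefer the soft-affinity candidate if we found one. */
    rqi = ( soft_rqi != -1 ) ? soft_rqi : hard_rqi;

At the end you'd of course still have to restrict csched2_cpumask to the
right affinity (soft if soft_rqi won, hard otherwise) for the chosen
runqueue before calling cpumask_any(), as your patch already does.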

I think, for closure's sake, I'm going to send this e-mail and review the
load balancing step in another mail (which will come later this
evening).

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

