
Re: [PATCH 2/2] xen: credit2: limit the max number of CPUs in a runqueue



On 29.04.2020 19:36, Dario Faggioli wrote:
> @@ -852,14 +862,61 @@ cpu_runqueue_match(const struct csched2_runqueue_data *rqd, unsigned int cpu)
>             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu));
>  }
>  
> +/* Additional checks, to avoid separating siblings in different runqueues. */
> +static bool
> +cpu_runqueue_smt_match(const struct csched2_runqueue_data *rqd, unsigned int cpu)
> +{
> +    unsigned int nr_sibl = cpumask_weight(per_cpu(cpu_sibling_mask, cpu));
> +    unsigned int rcpu, nr_smts = 0;
> +
> +    /*
> +     * If we put the CPU in this runqueue, we must be sure that there will
> +     * be enough room for accepting its hyperthread sibling(s) here as well.
> +     */
> +    cpumask_clear(cpumask_scratch_cpu(cpu));
> +    for_each_cpu ( rcpu, &rqd->active )
> +    {
> +        ASSERT(rcpu != cpu);
> +        if ( !cpumask_test_cpu(rcpu, cpumask_scratch_cpu(cpu)) )
> +        {
> +            /*
> +             * For each CPU already in the runqueue, account for it and for
> +             * its sibling(s), independently from whether such sibling(s) are
> +             * in the runqueue already or not.
> +             *
> +             * Of course, if there are sibling CPUs in the runqueue already,
> +             * only count them once.
> +             */
> +            cpumask_or(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                       per_cpu(cpu_sibling_mask, rcpu));
> +            nr_smts += nr_sibl;

This being common code, is it appropriate to assume all CPUs have
the same number of siblings? Even beyond that, iirc the sibling mask
represents the online or parked siblings, but not offline ones. For
the purpose here, don't you rather care about the full set?

What about HT vs AMD Fam15's CUs? Do you want both to be treated
the same here?

Also, could you outline the intention behind this logic in the patch
description, so the goal can be matched against what actually gets done?
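To make the concern concrete, here is a purely illustrative, standalone sketch (plain 64-bit masks standing in for Xen's cpumask_t; all names are made up) of accounting that uses each runqueue member's *own* sibling count instead of assuming every CPU has as many siblings as the incoming one:

```c
#include <stdint.h>

/* Popcount of a toy 64-bit "cpumask". */
static unsigned int weight(uint64_t mask)
{
    unsigned int n = 0;

    for ( ; mask; mask &= mask - 1 )
        n++;
    return n;
}

/*
 * Hypothetical variant of the patch's accounting: compute the size the
 * runqueue would reach once it is "SMT-complete", i.e. once every
 * member's siblings have joined too.  Each sibling group is counted
 * once, using that group's own size, so heterogeneous topologies
 * (2-way HT cores next to non-SMT cores) are handled correctly.
 */
static unsigned int smt_complete_size(uint64_t rq_active,
                                      const uint64_t sibling_mask[],
                                      unsigned int cpu)
{
    uint64_t seen = 0;
    unsigned int nr_smts = 0;

    for ( unsigned int rcpu = 0; rcpu < 64; rcpu++ )
    {
        if ( !(rq_active & (1ULL << rcpu)) )
            continue;
        if ( seen & (1ULL << rcpu) )
            continue;                       /* group already counted */
        seen |= sibling_mask[rcpu];
        nr_smts += weight(sibling_mask[rcpu]);  /* per-CPU count */
    }

    /* Finally account for the incoming CPU and its own sibling(s). */
    return nr_smts + weight(sibling_mask[cpu]);
}
```

With CPUs 0/1 as HT siblings, CPU 2 single-threaded and CPUs 3/4 as another sibling pair, a runqueue holding {0, 2} would grow to 5 CPUs once CPU 3 and everyone's siblings arrive, which a uniform `nr_sibl` based on the incoming CPU would miscount.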

> +        }
> +    }
> +    /*
> +     * We know that neither the CPU, nor any of its sibling are here,
> +     * or we wouldn't even have entered the function.
> +     */
> +    ASSERT(!cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                               per_cpu(cpu_sibling_mask, cpu)));
> +
> +    /* Try adding CPU and its sibling(s) to the count and check... */
> +    nr_smts += nr_sibl;
> +
> +    if ( nr_smts <= opt_max_cpus_runqueue )
> +        return true;
> +
> +    return false;

Fold these into

    return nr_smts <= opt_max_cpus_runqueue;

?

> @@ -873,11 +930,44 @@ cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>          if ( !rqi_unused && rqd->id > rqi )
>              rqi_unused = true;
>  
> -        if ( cpu_runqueue_match(rqd, cpu) )
> +        /*
> +         * Check whether the CPU should (according to the topology) and also
> +         * can (if we there aren't too many already) go in this runqueue.

Nit: Stray "we"?

> +         */
> +        if ( rqd->refcnt < opt_max_cpus_runqueue &&
> +             cpu_runqueue_match(rqd, cpu) )
>          {
> -            rqd_valid = true;
> -            break;
> +            cpumask_t *siblings = per_cpu(cpu_sibling_mask, cpu);
> +
> +            dprintk(XENLOG_DEBUG, "CPU %d matches runq %d, cpus={%*pbl} (max %d)\n",
> +                    cpu, rqd->id, CPUMASK_PR(&rqd->active),
> +                    opt_max_cpus_runqueue);
> +
> +            /*
> +             * If we're using core (or socket!) scheduling, or we don't have
> +             * hyperthreading, no need to do any further checking.
> +             *
> +             * If no (to both), but our sibling is already in this runqueue,
> +             * then it's also ok for the CPU to stay in this runqueue..

Nit: Stray 2nd full stop?

> +             * Otherwise, do some more checks, to better account for SMT.
> +             */
> +            if ( opt_sched_granularity != SCHED_GRAN_cpu ||
> +                 cpumask_weight(siblings) <= 1 ||
> +                 cpumask_intersects(&rqd->active, siblings) )
> +            {
> +                dprintk(XENLOG_DEBUG, "runq %d selected\n", rqd->id);
> +                rqd_valid = rqd;
> +                break;
> +            }
> +            else if ( cpu_runqueue_smt_match(rqd, cpu) )
> +            {
> +                dprintk(XENLOG_DEBUG, "considering runq %d...\n", rqd->id);
> +                rqd_valid = rqd;
> +            }
>          }
> +     else

Hard tab slipped in.

> @@ -900,6 +990,12 @@ cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>          rqd->pick_bias = cpu;
>          rqd->id = rqi;
>      }
> +    else
> +        rqd = rqd_valid;
> +
> +    printk(XENLOG_INFO "CPU %d (sibling={%*pbl}) will go to runqueue %d with {%*pbl}\n",
> +           cpu, CPUMASK_PR(per_cpu(cpu_sibling_mask, cpu)), rqd->id,
> +           CPUMASK_PR(&rqd->active));

Iirc there's one per-CPU printk() already. On large systems this isn't
very nice, so I'd like to ask that their total number at least not
grow any further. Ideally there would be a less verbose summary after
all CPUs have been brought up at boot, with per-CPU info logged only
during CPU hot-online.
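As a rough standalone illustration of that shape (made-up names, ordinary printf standing in for printk), one summary line per runqueue after bring-up instead of one line per CPU:

```c
#include <stdio.h>

#define NR_CPUS 8

/* Hypothetical bookkeeping filled in silently while CPUs come up. */
static int cpu_to_rq[NR_CPUS];

/*
 * Emit one line per non-empty runqueue once boot-time bring-up is
 * done; returns the number of lines printed, which is bounded by the
 * number of runqueues rather than by the number of CPUs.
 */
static unsigned int summarize_runqueues(int nr_rqs, unsigned int sizes[])
{
    unsigned int lines = 0;

    for ( int rq = 0; rq < nr_rqs; rq++ )
    {
        sizes[rq] = 0;
        for ( int cpu = 0; cpu < NR_CPUS; cpu++ )
            if ( cpu_to_rq[cpu] == rq )
                sizes[rq]++;
        if ( sizes[rq] )
        {
            printf("runqueue %d: %u CPUs\n", rq, sizes[rq]);
            lines++;
        }
    }
    return lines;
}
```

Per-CPU detail would then only be printed on the (rare) hot-online path, where a single extra line is harmless.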

Jan


