
Re: [Xen-devel] [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()



On 07/25/2017 05:00 PM, Dario Faggioli wrote:
> On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
>> On 06/16/2017 03:13 PM, Dario Faggioli wrote:
>>>
>>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>>> index c749d4e..54f6e21 100644
>>> --- a/xen/common/sched_credit2.c
>>> +++ b/xen/common/sched_credit2.c
>>> @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
>>>  }
>>>  
>>>  /*
>>> - * When a hard affinity change occurs, we may not be able to check some
>>> - * (any!) of the other runqueues, when looking for the best new processor
>>> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
>>> - * pick, in order of decreasing preference:
>>> - *  - svc's current pcpu;
>>> - *  - another pcpu from svc's current runq;
>>> - *  - any cpu.
>>> + * In csched2_cpu_pick(), it may not be possible to actually look at remote
>>> + * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
>>> + * we pick, in order of decreasing preference:
>>> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
>>> + *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
>>> + *  3) just one valid pcpu from svc's soft affinity;
>>> + *  4) svc's current pcpu, if it is part of svc's hard affinity;
>>> + *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
>>> + *  6) just one valid pcpu from svc's hard affinity.
>>> + *
>>> + * Of course, 1, 2 and 3 make sense only if svc has a soft affinity. Also
>>> + * note that at least 6 is guaranteed to _always_ return at least one pcpu.
>>>   */
>>>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>>>  {
>>>      struct vcpu *v = svc->vcpu;
>>> -    int cpu = v->processor;
>>> +    unsigned int bs;
>>>  
>>> -    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
>>> -                cpupool_domain_cpumask(v->domain));
>>> +    for_each_affinity_balance_step( bs )
>>> +    {
>>> +        int cpu = v->processor;
>>> +
>>> +        if ( bs == BALANCE_SOFT_AFFINITY &&
>>> +             !has_soft_affinity(v, v->cpu_hard_affinity) )
>>> +            continue;
>>>  
>>> -    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
>>> -        return cpu;
>>> +        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
>>> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>>> +                    cpupool_domain_cpumask(v->domain));
>>>  
>>> -    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>>> -                                   &svc->rqd->active)) )
>>> -    {
>>> -        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
>>> -                    cpumask_scratch_cpu(cpu));
>>> -        return cpumask_first(cpumask_scratch_cpu(cpu));
>>> -    }
>>> +        /*
>>> +         * This is cases 1 or 4 (depending on bs): if v->processor is (still)
>>> +         * in our affinity, go for it, for better cache locality.
>>> +         */
>>> +        if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
>>> +            return cpu;
>>>  
>>> -    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
>>> +        /*
>>> +         * This is cases 2 or 5 (depending on bs): v->processor isn't there
>>> +         * any longer, check if we at least can stay in our current runq.
>>> +         */
>>> +        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>>> +                                       &svc->rqd->active)) )
>>> +        {
>>> +            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>>> +                        &svc->rqd->active);
>>> +            return cpumask_first(cpumask_scratch_cpu(cpu));
>>> +        }
>>>  
>>> -    return cpumask_first(cpumask_scratch_cpu(cpu));
>>> +        /*
>>> +         * This is cases 3 or 6 (depending on bs): last stand, just one valid
>>> +         * pcpu from our soft affinity, if we have one and if there's any. In
>>> +         * fact, if we are doing soft-affinity, it is possible that we fail,
>>> +         * which means we stay in the loop and look for hard affinity. OTOH,
>>> +         * if we are at the hard-affinity balancing step, it's guaranteed that
>>> +         * there is at least one valid cpu, and therefore we are sure that we
>>> +         * return it, and never really exit the loop.
>>> +         */
>>> +        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
>>> +               bs == BALANCE_SOFT_AFFINITY);
>>> +        cpu = cpumask_first(cpumask_scratch_cpu(cpu));
>>
>> So just checking my understanding here... at this point we're not
>> taking into consideration load or idleness or anything else -- we're
>> just asking, "Is there a cpu in my soft affinity that it is
>> *possible* to run on?"
>>
> Exactly. If we are in this function, it means we failed to take the
> locks we needed to make a choice based on load, idleness, etc., but
> we need a CPU, so we pick whatever is valid.
> 
> For choosing among all the valid ones, we proceed as explained in the
> comment.
> 
>>  So on a properly configured system, we should never take the second
>> iteration of the loop?
>>
> Mmm.. I think you're right. In fact, in a properly configured system,
> we'll never go past step 3 (from the comment at the top).
> 
> Which is not ideal, or at least not what I had in mind. In fact, I
> think it's better to check step 4 (svc->vcpu->processor in hard-
> affinity) and step 5 (a CPU from svc's runqueue in hard affinity), as
> that would mean avoiding a runqueue migration.
> 
> What if I basically kill step 3? I.e., if we reach this point during
> the soft-affinity step, I just continue to the hard-affinity one.

Hmm, well *normally* we would rather have a vcpu running within its soft
affinity, even if that means moving it to another runqueue.  Is your
idea that, since the only reason we're in this particular code is that
we couldn't grab the lock we needed to make a more informed decision, we
should defer where possible to previous decisions, which (we might
presume) were made with better information?
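To make sure I'm reading the proposal right, here's a toy model of it in
plain C (cpumasks shrunk to 64-bit masks; fallback_cpu(), first_cpu()
and all the mask values are made up for the sketch -- none of this is
Xen code).  It drops the "any valid pcpu from soft affinity" step, so
the soft-affinity iteration can only return v->processor or a cpu from
the current runqueue, and otherwise falls through to hard affinity:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t mask_t; /* bit i set <=> cpu i is in the mask */

static int first_cpu(mask_t m)
{
    for ( int i = 0; i < 64; i++ )
        if ( m & (1ULL << i) )
            return i;
    return -1; /* models "cpu >= nr_cpu_ids" */
}

/* Toy model: soft step without the "any soft-affinity cpu" last stand. */
static int fallback_cpu(int cur, mask_t runq, mask_t soft, mask_t hard)
{
    mask_t steps[2] = { soft & hard, hard }; /* effective soft, then hard */

    for ( int bs = 0; bs < 2; bs++ )
    {
        mask_t m = steps[bs];

        if ( bs == 0 && m == 0 )      /* no (effective) soft affinity */
            continue;

        if ( m & (1ULL << cur) )      /* cases 1/4: stay on current pcpu */
            return cur;

        if ( m & runq )               /* cases 2/5: stay in current runq */
            return first_cpu(m & runq);

        if ( bs == 1 )                /* case 6: any valid hard-aff. cpu */
            return first_cpu(m);
        /* soft step failed: fall through to the hard-affinity step */
    }
    return -1; /* unreachable if hard affinity is non-empty */
}
```

E.g. with cur=5, runq={5,6}, soft={0}, hard={0,5,6} this returns 5 (your
step 4) instead of 0 (old step 3), i.e. the runqueue migration is
avoided.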

>>> +        if ( likely(cpu < nr_cpu_ids) )
>>> +            return cpu;
>>> +    }
>>> +    BUG_ON(1);
>>
>> Do we want to BUG() here?  I don't think this constitutes an
>> unrecoverable error; an ASSERT_UNREACHABLE() plus something random
>> would
>> be better, wouldn't it?
>>
> ASSERT_UNREACHABLE() is indeed much better. What do you mean with
> "something random"? The value to be assigned to cpu?

Er, yes, I meant the return value.  Returning 0 or v->processor would be
a simple option.  *Really* defensive programming would attempt to choose
something somewhat sensible with the minimal risk of triggering some
other hidden assumption (say, any cpu on our current runqueue).  But
part of me says even thinking too long about it is a waste of time for
something we're 99.99% sure can never happen. :-)
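For reference, the shape being suggested might look something like the
standalone sketch below (the macro only approximates Xen's
ASSERT_UNREACHABLE(), and pick_with_fallback() is a made-up stand-in for
get_fallback_cpu()):

```c
#include <assert.h>

/* Debug builds trip the assertion; release builds compile it away. */
#ifndef NDEBUG
#define ASSERT_UNREACHABLE() assert(!"unreachable")
#else
#define ASSERT_UNREACHABLE() ((void)0)
#endif

static int pick_with_fallback(int nr_steps, int processor)
{
    for ( int bs = 0; bs < nr_steps; bs++ )
    {
        /* In the real code: try soft affinity, then hard affinity.
         * The last (hard-affinity) step always finds a cpu. */
        if ( bs == nr_steps - 1 )
            return bs;
    }

    /* Not supposed to get here: complain loudly in debug builds, but
     * limp on with a defensively chosen value instead of BUG()-ing
     * the host in release builds. */
    ASSERT_UNREACHABLE();
    return processor;
}
```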

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
