[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity



Thanks to all for the comments! I've implemented most of the changes
recommended here in the v2 review. I should have a new patch set ready
this week (with an updated soft affinity patch as well). I'll just
address the comments where you asked for my feedback...

>>              rqd_avgload = rqd->b_avgload - svc->avgload;
>>          }
>>          else if ( spin_trylock(&rqd->lock) )
>>          {
>> +            if ( !cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
>> +            {
>> +                spin_unlock(&rqd->lock);
>> +                continue;
>> +            }
>>              rqd_avgload = rqd->b_avgload;
>>              spin_unlock(&rqd->lock);
>>          }
>>
> Something like this would be easier to read, IMO:
>
>            if ( rqd == svc->rqd )
>            {
>                if ( cpumask_intersects(vc->cpu_hard_affinity,
>                                        &rqd->active) )
>                    rqd_avgload = rqd->b_avgload - svc->avgload;
>            }
>            else if ( spin_trylock(&rqd->lock) )
>            {
>                if ( cpumask_intersects(vc->cpu_hard_affinity,
>                                        &rqd->active) )
>                    rqd_avgload = rqd->b_avgload;
>
>                spin_unlock(rqd->lock);
>            }
>            else
>                continue;
>
> Semantic should be the same, provided we initialize rqd_avgload to
> MAX_LOAD, I would say.
>
> In fact, without the two 'continue;' statements you were introducing,
> we'll execute the if() that follows this block even if there was no
> intersection with the hard affinity mask but, in that case, no chance we
> have updated rqd_avgload, so it really should behave the same, what do
> you think?

I like it. I implemented your changes along with initializing
rqd_avgload to MAX_LOAD. I think it's definitely easier to read
without the continue statements and without multiple spin_unlock
statements.

>> @@ -1396,6 +1459,15 @@ csched2_vcpu_migrate(
>>
>>      /* Check if new_cpu is valid */
>>      BUG_ON(!cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
>> +    BUG_ON(!cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
>>
> Not sure.. An ASSERT() seems more suitable than a BUG_ON here... What's
> the reasoning behind this?

I was just trying to identify a state that this function should never
be in. After reading through the discussion in the review I understand
that ASSERT is more appropriate. It's changed for v3.

> Oh, and this is what was causing you troubles, in case source and
> destination runqueue were the same... Help me understand, which call to
> sched_move_irqs() in schedule.c were we missing? I'd say it is the one
> in vcpu_migrate(), but that does not seem to care about vc->processor
> (much rater about new_cpu)... what am I missing?
>
> However, if they are not the same, the call to migrate() will override
> this right away, won't it? What I mean to say is, if this is required
> only in case trqd and svc->rqd are equal, can we tweak this part of
> csched2_vcpu_migrate() as follows?
>
>     if ( trqd != svc->rqd )
>         migrate(ops, svc, trqd, NOW());
>     else
>         vc->processor = new_cpu;

You are right; it does not have anything to do with sched_move_irqs()
not being called (like you said it doesn't care about vc->processor).
You are never going to believe any of my explanations now! :) Well
third time's a charm hopefully... Without the processor assignment
here the vcpu might go on being assigned to a processor it no longer
is allowed to run on. In that case, function runq_candidate may only
get called for the vcpu's old processor, and runq_candidate will no
longer let a vcpu run on a processor that it's not allowed to run on
(because of the hard affinity check first introduced in v1 of this
patch). So in that condition the vcpu never gets to run. That's still
somewhat of a vague explanation, but I have observed that that is what
happens. Anyway I think everyone agrees that the processor assignment
needs to be here, and I did move it to an else block for v3 like you
recommended above.

>> >> +static int get_safe_pcpu(struct csched2_vcpu *svc)
>> >> +{
>> >>
>> > I also don't like the name... __choose_cpu() maybe ?
>>
>> I'm not 100% happy with the name, but I think "get_safe_pcpu" is more
>> descriptive than just "__choose_cpu".
>>
> I don't like the "_safe_" part, but that is not a big deal, I certainly
> can live with it.

I changed it to _choose_valid_pcpu ... discuss! (Also, I can send out
a pre-patch to change the double underscores in the whole file)

>> >> +    cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity, 
>> >> &svc->rqd->active);
>> >> +    if ( unlikely(cpumask_empty(csched2_cpumask)) )
>> >> +        cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity,
>> >> +            cpupool_online_cpumask(svc->vcpu->domain->cpupool));
>> >>
>> > VCPU2ONLINE(svc->vcpu) would make the line shorter.

I agree. VCPU2ONLINE is defined in schedule.c ... do you want me to
move it to a common header along with the other parts we discussed
(__vcpu_has_soft_affinity, etc.)? Okay to move them to sched-if.h, or
should I put them in a new header file?

>> > Also, I know I'm the one that suggested this form for the code, but
>> > thinking again about it, I'm not sure the first if is worth:
>> >
>> >     cpumask_and(csched2_cpumask, &svc->rqd->active, 
>> > VCPU2ONLINE(svc->vcpu));
>> >     cpumask_and(csched2_cpumask, csched2_cpumask, 
>> > svc->vcpu->cpu_hard_affinity);
>>
>> Actually, I was going to say is there any reason not to start with:
>>
>> if ( likely(cpumask_test_cpu(svc->vcpu->processor,
>> svc->vcpu->cpu_hard_affinity))
>>  return svc->vcpu->processor;
>>
>> And then go through doing the unions and what-not once we've determined
>> that the current processor isn't suitable?
>>
> I like the idea, and I think it actually makes sense. I'm trying to
> think to a scenario where this can be bugous, and where we actually need
> to do the filtering against rqd->active and online up front, but I can't
> imagine one.
>
> I think the possible source of trouble is affinity being changed, but
> even then, if we were on vcpu->processor, and that still is in our hard
> affinity mask, it ought to be right to stay there (and hence George's
> suggestion should be fine)... Justin, what do you think?

I think doing "if ( likely(cpumask_test_cpu(svc->vcpu->processor,
svc->vcpu->cpu_hard_affinity) )" first is the way to go.

>> > Oh, and, with either yours or my variant... can csched2_cpumask end up
>> > empty after the two &&-s? That's important or, below, cpumask_any will
>> > return garbage! :-/
>> >
>> > From a quick inspection, it does not seem it can, in which case, I'd put
>> > down an ASSERT() about this somewhere. OTOH, if it can, I'd look for
>> > ways of preventing that to happen before getting here... It seems easier
>> > and more correct to do that, rather than trying to figure out what to do
>> > here if the mask is empty. :-O
>>
>> Yes, we need to go through the code and make sure that we understand
>> what our assumptions are, so that people can't crash Xen by doing
>> irrational things like making the hard affinity not intersect with the
>> cpupool cpus.
>>
> True.
>
> Something like that can be figured out and either forbidden or, in
> general, addressed in other places, rather than requiring us to care
> here. In fact, this seems to me to be what's happening already (see
> below).

I don't think there's any way the mask can be empty after the
cpumask_and with the cpu_hard_affinity and the VCPU2ONLINE. In
schedule.c:vcpu_set_hard_affinity, if the intersection of the new hard
mask and VCPU2ONLINE is empty, the function returns -EINVAL. I took
into account all the discussion here and modified the function for v3.

Thank you!
Justin Weaver
University of Hawaii at Manoa

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.