
Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit



On 02.07.2019 10:44, Juergen Gross wrote:
> On 02.07.19 10:27, Jan Beulich wrote:
>> On 02.07.2019 10:14, Juergen Gross wrote:
>>> On 02.07.19 09:54, Jan Beulich wrote:
>>>> And again - if someone pins every vCPU to a single pCPU, that last
>>>> such pinning operation will be what takes long term effect. Aiui all
>>>> vCPU-s in the unit will then be pinned to that one pCPU, i.e.
>>>> they'll either all compete for the one pCPU's time, or only one of
>>>> them will ever get scheduled.
>>>
>>> No, that's not how it works. Let's say we have a system with the
>>> following topology and core scheduling active:
>>>
>>> cpu0: core 0, thread 0
>>> cpu1: core 0, thread 1
>>> cpu2: core 1, thread 0
>>> cpu3: core 1, thread 1
>>>
>>> Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2,
>>> while any odd numbered vcpu will only run on cpu1 or cpu3.
>>>
>>> So virtual cores get scheduled on physical cores. Virtual thread 0 will
>>> only run on physical thread 0 and the associated virtual thread 1 will
>>> run on the associated physical thread 1 of the same physical core.
>>>
>>> Pinning a virtual thread 1 to a physical thread 0 is not possible (in
>>> reality only the virtual core is being pinned to the physical core).
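
As a purely illustrative aside, the mapping described above can be sketched
in a few lines of standalone C (the helper names and the THREADS_PER_CORE
constant are made up for this example; this is not Xen code): with two
hyperthreads per core, a vcpu's thread slot is fixed by its ID, so "pinning"
can only ever select the physical core, never the sibling thread.

/* Illustrative only -- not Xen code. Models the vcpu->pcpu mapping
 * described above for a host with two threads per core
 * (cpu = core * 2 + thread). */
#include <stdio.h>

#define THREADS_PER_CORE 2

/* Thread slot a vcpu occupies inside its virtual core. */
static unsigned int vcpu_thread(unsigned int vcpu_id)
{
    return vcpu_id % THREADS_PER_CORE;
}

/* With core scheduling, "pinning" a vcpu to a pcpu really pins its
 * virtual core to the physical core; the vcpu then always runs on the
 * thread of that core matching its own thread slot. */
static unsigned int effective_pcpu(unsigned int vcpu_id, unsigned int pinned_pcpu)
{
    unsigned int core = pinned_pcpu / THREADS_PER_CORE;

    return core * THREADS_PER_CORE + vcpu_thread(vcpu_id);
}

int main(void)
{
    /* Asking for vcpu1 on cpu2 (core 1, thread 0) ends up on cpu3 (thread 1). */
    printf("vcpu1 pinned to cpu2 -> runs on cpu%u\n", effective_pcpu(1, 2));
    return 0;
}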
>>
>> But that's what existing guests may be doing. You may want to
>> take a look at our old, non-pvops kernels, in particular the
>> functionality provided by their drivers/xen/core/domctl.c. Yes,
>> {sys,dom}ctl-s aren't supposed to be used by the kernel, but to
>> achieve the intended effects I saw no way around (ab)using them.
>> (I mean this to be taken as an example only - I realize that the
>> code there wouldn't work on modern Xen without updating, due to
>> the sysctl/domctl interface version that needs setting.)
> 
> First - speaking of "guests" is a little bit misleading here. This is
> restricted to dom0.
> 
> So when you want to use such a dom0 kernel with Xen 4.13 or later you'd
> need to stay with cpu scheduling. Any recent kernel will run just fine
> as dom0 with core scheduling active.

Right, but such recent kernels have (afaict) no solution to some of
the problems that the (ab)use of the {sys,dom}ctl-s was meant to
address.

>>>>>>>>> --- a/xen/include/xen/sched-if.h
>>>>>>>>> +++ b/xen/include/xen/sched-if.h
>>>>>>>>> @@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
>>>>>>>>>       * * The hard affinity is not a subset of soft affinity
>>>>>>>>>       * * There is an overlap between the soft and hard affinity masks
>>>>>>>>>       */
>>>>>>>>> -static inline int has_soft_affinity(const struct vcpu *v)
>>>>>>>>> +static inline int has_soft_affinity(const struct sched_unit *unit)
>>>>>>>>>      {
>>>>>>>>> -    return v->soft_aff_effective &&
>>>>>>>>> -           !cpumask_subset(cpupool_domain_cpumask(v->domain),
>>>>>>>>> -                           v->cpu_soft_affinity);
>>>>>>>>> +    return unit->soft_aff_effective &&
>>>>>>>>> +           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
>>>>>>>>> +                           unit->cpu_soft_affinity);
>>>>>>>>>      }
>>>>>>>>
>>>>>>>> Okay, at the moment there looks to be a 1:1 relationship between sched
>>>>>>>> units and vCPU-s. This would - at this point of the series - invalidate
>>>>>>>> most of my earlier comments. However, in patch 57 I don't see how this
>>>>>>>> unit->vcpu mapping would get broken, and I can't seem to identify any
>>>>>>>> other patch where this might be happening. Looking at the github branch
>>>>>>>> I also get the impression that the struct vcpu * pointer out of struct
>>>>>>>> sched_unit survives until the end of the series, which doesn't seem
>>>>>>>> right to me.
>>>>>>>
>>>>>>> It is right. The vcpu pointer in the sched_unit is pointing to the first
>>>>>>> vcpu of the unit at the end of the series. Further vcpus are found via
>>>>>>> v->next_in_list.
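
For concreteness, the arrangement being described can be sketched with
simplified stand-ins (only the vcpu pointer and the next_in_list name come
from the discussion; the struct layouts and the unit_size bound are made up
for this sketch, not taken from the series):

/* Illustrative only -- simplified stand-ins, not the Xen definitions. */
#include <stddef.h>
#include <stdio.h>

struct vcpu {
    unsigned int vcpu_id;
    struct vcpu *next_in_list;   /* existing per-domain vcpu chaining */
};

struct sched_unit {
    struct vcpu *vcpu;           /* first vcpu of the unit */
    unsigned int unit_size;      /* made up here, just to bound the walk */
};

/* Visiting all vcpus of a unit means starting at unit->vcpu and following
 * next_in_list -- nothing at the type level hints that there may be more
 * than one, which is the concern raised below. */
static void walk_unit(const struct sched_unit *unit)
{
    const struct vcpu *v = unit->vcpu;
    unsigned int i;

    for ( i = 0; v && i < unit->unit_size; i++, v = v->next_in_list )
        printf("vcpu%u\n", v->vcpu_id);
}

int main(void)
{
    struct vcpu v1 = { 1, NULL }, v0 = { 0, &v1 };
    struct sched_unit unit = { &v0, 2 };

    walk_unit(&unit);
    return 0;
}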
>>>>>>
>>>>>> I'm afraid this sets us up for misunderstanding and misuse. I don't
>>>>>> think there should be a straight struct vcpu * out of struct sched_unit.
>>>>>
>>>>> That was the most effective way to do it. What are you suggesting?
>>>>
>>>> An actual list, i.e. with a struct list_head. That'll make it obvious that
>>>> more than one vCPU might be associated with a unit. That's all the more
>>>> important since the ability to associate more than one appears only quite
>>>> late in the series, i.e. there may be further instances like the code
>>>> above, and it would require a careful audit (rather than the compiler
>>>> finding such instances) to determine all places where using the first
>>>> vCPU in a unit isn't really what was meant.
>>>
>>> TBH I don't see how this would help at all.
>>>
>>> Instead of using the vcpu pointer I'd have to use the list head. Why is
>>> that any different from a plain pointer when it comes to finding the
>>> places where using the first vcpu was wrong?
>>
>> Take the example above: Is it correct to act on just the first vCPU?
>> I guess _here_ it is, but the same pattern could be found elsewhere.
>> If, from the beginning, you use a clearly identifiable list construct,
>> then it'll be obvious to you as the writer and to reviewers that by
>> the end of the series there may be multiple entities that need dealing
>> with - we'd see list_first*() or for_each*() constructs right away
>> (and you wouldn't be able to circumvent their use in a way that
>> wouldn't trigger "don't open-code" comments).
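
To make the suggestion concrete, here is a minimal self-contained sketch of
the list_head idiom meant here (the struct layouts, the for_each_unit_vcpu()
name, and the helper code are illustrative only, not a proposal for the
actual patch): with an embedded list, every use site has to go through an
explicit iterator or a "first entry" helper, so acting on only one vcpu
stands out during review.

/* Illustrative only -- a minimal imitation of the list_head idiom,
 * not the Xen list implementation. */
#include <stddef.h>
#include <stdio.h>

struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

static void list_add_tail(struct list_head *new, struct list_head *head)
{
    new->prev = head->prev;
    new->next = head;
    head->prev->next = new;
    head->prev = new;
}

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct vcpu {
    unsigned int vcpu_id;
    struct list_head unit_elem;      /* link on the unit's vcpu list */
};

struct sched_unit {
    struct list_head vcpu_list;      /* all vcpus in this unit */
};

/* Use sites have to spell out the iteration (or use a "first entry"
 * helper), making "only the first vcpu is handled" easy to spot. */
#define for_each_unit_vcpu(v, unit) \
    for ( (v) = container_of((unit)->vcpu_list.next, struct vcpu, unit_elem); \
          &(v)->unit_elem != &(unit)->vcpu_list; \
          (v) = container_of((v)->unit_elem.next, struct vcpu, unit_elem) )

int main(void)
{
    struct sched_unit unit = { .vcpu_list = LIST_HEAD_INIT(unit.vcpu_list) };
    struct vcpu v0 = { .vcpu_id = 0 }, v1 = { .vcpu_id = 1 }, *v;

    list_add_tail(&v0.unit_elem, &unit.vcpu_list);
    list_add_tail(&v1.unit_elem, &unit.vcpu_list);

    for_each_unit_vcpu(v, &unit)
        printf("vcpu%u\n", v->vcpu_id);

    return 0;
}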
> 
> Would you be fine with just renaming the pointer to "vcpu_list"?
> This would avoid the need to introduce another vcpu list in struct vcpu
> and I could re-use the already existing list as today.

Well, yes, this would at least make it obvious at use sites that there
may be more than one of them.

> It should be noted that I named the pointer just "vcpu" in order to
> avoid lots of reformatting due to longer lines, though.

I can appreciate this aspect, but as I said I'm afraid it would be
misleading (and potentially invite mis-use).

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel