
Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit



On 02.07.19 09:54, Jan Beulich wrote:
On 02.07.2019 08:30, Juergen Gross wrote:
On 01.07.19 17:46, Jan Beulich wrote:
On 01.07.2019 17:10, Juergen Gross wrote:
On 01.07.19 16:08, Jan Beulich wrote:
On 28.05.19 at 12:32, <jgross@xxxxxxxx> wrote:
@@ -155,8 +156,8 @@ static void nmi_mce_softirq(void)
         * Set the tmp value unconditionally, so that the check in the iret
         * hypercall works.
         */
-    cpumask_copy(st->vcpu->cpu_hard_affinity_tmp,
-                 st->vcpu->cpu_hard_affinity);
+    cpumask_copy(st->vcpu->sched_unit->cpu_hard_affinity_tmp,
+                 st->vcpu->sched_unit->cpu_hard_affinity);

Aiui this affects all vCPU-s in the unit, which is unlikely to be what we
want here: There's now only one cpu_hard_affinity_tmp for all vCPU-s
in the unit, yet every vCPU in there may want to make use of the
field in parallel.

Hmm, yes, we'll need a usage bitmask.

Please note that affecting all vcpus in the unit is per design. With
multiple vcpus of a unit needing this feature in parallel there is no
way they can have different needs regarding temporary affinity.
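
For illustration, a minimal sketch of such a usage bitmask (field and
helper names are hypothetical, not taken from the series; the caller is
assumed to hold the relevant scheduler lock):

struct sched_unit {
    cpumask_var_t cpu_hard_affinity;
    cpumask_var_t cpu_hard_affinity_tmp;
    unsigned long hard_aff_tmp_users;    /* one bit per vCPU of the unit */
    /* ... */
};

/* Returns false if another vCPU already installed a conflicting mask. */
static bool unit_claim_tmp_affinity(struct sched_unit *unit,
                                    unsigned int idx,
                                    const cpumask_t *mask)
{
    bool first = !unit->hard_aff_tmp_users;

    if ( !first && !cpumask_equal(unit->cpu_hard_affinity_tmp, mask) )
        return false;

    __set_bit(idx, &unit->hard_aff_tmp_users);
    if ( first )
        cpumask_copy(unit->cpu_hard_affinity_tmp, mask);

    return true;
}

static void unit_release_tmp_affinity(struct sched_unit *unit,
                                      unsigned int idx)
{
    __clear_bit(idx, &unit->hard_aff_tmp_users);
}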

But how will this work? I.e. how will all vCPU-s in a unit get
their temporary affinity pointing to the one specific pCPU in question?

The _unit_ is pinned, so all the vcpus in that unit are pinned, too.

Yes, but ...

It's not just the "all at the same time" that I don't see working here,
I'm also having trouble seeing how the potential cross-core or cross-
node movement that's apparently needed here would end up working.

The unit is moved to another core via the normal scheduling mechanisms. As
context switching is synchronized (see patch 35), all vcpus of a unit are
moved together.

... they may get pinned to different pCPU-s or all the same pCPU here.
Both cases need to work, and I'm currently not seeing how that would
be achieved.

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -125,11 +125,6 @@ static void vcpu_info_reset(struct vcpu *v)
    static void vcpu_destroy(struct vcpu *v)
    {
-    free_cpumask_var(v->cpu_hard_affinity);
-    free_cpumask_var(v->cpu_hard_affinity_tmp);
-    free_cpumask_var(v->cpu_hard_affinity_saved);
-    free_cpumask_var(v->cpu_soft_affinity);
-
        free_vcpu_struct(v);
    }
@@ -153,12 +148,6 @@ struct vcpu *vcpu_create(
        grant_table_init_vcpu(v);
-    if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
-         !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
-         !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
-         !zalloc_cpumask_var(&v->cpu_soft_affinity) )
-        goto fail;

Seeing these, I'm actually having trouble understanding how you mean
to retain the user-visible interface behavior here: If you only store an
affinity per sched unit, then how are you meaning to honor the
vCPU-granular requests coming in?

With core scheduling it is only possible to set (virtual) core
affinities. Whenever an affinity of a vcpu is being set it will set the
affinity of the whole unit.
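
As an illustration of that behavior (not the actual patch code; the helper
name is made up, and widening via cpu_sibling_mask is an assumption about
how "pinning the virtual core to the physical core" could be expressed):

/* A per-vCPU pinning request is applied to the whole unit; with core
 * scheduling the requested pCPUs are effectively widened to full cores. */
static void unit_set_hard_affinity(struct sched_unit *unit,
                                   const cpumask_t *requested)
{
    unsigned int cpu;
    cpumask_t cores;

    cpumask_clear(&cores);
    for_each_cpu ( cpu, requested )
        /* Pull in all hyperthread siblings of each requested pCPU. */
        cpumask_or(&cores, &cores, per_cpu(cpu_sibling_mask, cpu));

    /* Every vCPU of the unit now shares this single mask. */
    cpumask_copy(unit->cpu_hard_affinity, &cores);
}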

Hmm, that's indeed what I was deducing, but how will we sell this
to people actually fiddling with vCPU affinities? I foresee getting
bug reports that the respective xl command(s) no longer do what they
used to do.

The new behavior must be documented, sure.

Documentation is just one aspect. Often enough people only read the docs
when they want to start using new functionality, which I consider a fair
model. Such people will be caught by surprise that the pinning behavior
no longer works the way it used to.

And again - if someone pins every vCPU to a single pCPU, that last
such pinning operation will be what takes long-term effect. Aiui all
vCPU-s in the unit will then be pinned to that one pCPU, i.e.
they'll either all compete for the one pCPU's time, or only one of
them will ever get scheduled.

No, that's not how it works. Let's say we have a system with the
following topology and core scheduling active:

cpu0: core 0, thread 0
cpu1: core 0, thread 1
cpu2: core 1, thread 0
cpu3: core 1, thread 1

Then any even-numbered vcpu will only ever be scheduled on cpu0 or cpu2,
while any odd-numbered vcpu will only run on cpu1 or cpu3.

So virtual cores get scheduled on physical cores. Virtual thread 0 will
only run on physical thread 0 and the associated virtual thread 1 will
run on the associated physical thread 1 of the same physical core.

Pinning a virtual thread 1 to a physical thread 0 is not possible (in
reality only the virtual core is being pinned to the physical core).
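
As a plain illustration of that mapping for two threads per core (the
helper is hypothetical, not code from the series):

/* Virtual thread N of a unit only ever runs on physical thread N of
 * whichever core the unit is currently scheduled on. */
static unsigned int vthread_to_pcpu(unsigned int vcpu_id, unsigned int core)
{
    const unsigned int threads_per_core = 2;

    /* cpu0/cpu1 are the threads of core 0, cpu2/cpu3 of core 1, ... */
    return core * threads_per_core + vcpu_id % threads_per_core;
}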


Juergen


--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
     * * The hard affinity is not a subset of soft affinity
     * * There is an overlap between the soft and hard affinity masks
     */
-static inline int has_soft_affinity(const struct vcpu *v)
+static inline int has_soft_affinity(const struct sched_unit *unit)
    {
-    return v->soft_aff_effective &&
-           !cpumask_subset(cpupool_domain_cpumask(v->domain),
-                           v->cpu_soft_affinity);
+    return unit->soft_aff_effective &&
+           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
+                           unit->cpu_soft_affinity);
    }

Okay, at the moment there looks to be a 1:1 relationship between sched
units and vCPU-s. This would - at this point of the series - invalidate most
of my earlier comments. However, in patch 57 I don't see how this unit->vcpu
mapping would get broken, and I can't seem to identify any other patch
where this might be happening. Looking at the github branch I also get the
impression that the struct vcpu * pointer out of struct sched_unit survives
until the end of the series, which doesn't seem right to me.

It is right. The vcpu pointer in the sched_unit is pointing to the first
vcpu of the unit at the end of the series. Further vcpus are found via
v->next_in_list.
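
For illustration, iterating over the vCPUs of a unit at that point could
look roughly like this (the macro name is made up; it assumes the vCPUs of
a unit are adjacent in the domain's vcpu list):

/* unit->vcpu points at the first vCPU of the unit; the remaining ones are
 * reached via v->next_in_list until a vCPU of another unit (or the list
 * end) is encountered. */
#define for_each_unit_vcpu(unit, v)                        \
    for ( (v) = (unit)->vcpu;                              \
          (v) != NULL && (v)->sched_unit == (unit);        \
          (v) = (v)->next_in_list )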

I'm afraid this sets us up for misunderstanding and misuse. I don't
think there should be a straight struct vcpu * out of struct sched_unit.

That was the most effective way to do it. What are you suggesting?

An actual list, i.e. with a struct list_head. That'll make it obvious that
more than one vCPU might be associated with a unit. That matters all the
more because the ability to associate more than one appears only quite late
in the series, i.e. there may be further instances like the code above, and
it would require a careful audit (rather than the compiler finding such
instances) to determine all places where using the first vCPU in a unit
isn't really what was meant.
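
Sketched with made-up field names, that alternative might look like this,
with the iteration going through the generic list helpers instead of a
bare pointer:

struct sched_unit {
    struct list_head vcpu_list;     /* all vCPUs belonging to this unit */
    /* ... */
};

struct vcpu {
    struct list_head unit_elem;     /* entry in sched_unit->vcpu_list */
    /* ... */
};

#define for_each_unit_vcpu(unit, v) \
    list_for_each_entry(v, &(unit)->vcpu_list, unit_elem)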

TBH I don't see how this would help at all.

Instead of using the vcpu pointer I'd have to use the list head.
How would that be any different from a plain pointer when it comes to
finding the places where using the first vcpu is wrong?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

