
Re: [Xen-devel] [PATCH v2 10/16] xen: sched: use soft-affinity instead of domain's node-affinity



On 15/11/13 00:39, Dario Faggioli wrote:
On gio, 2013-11-14 at 15:30 +0000, George Dunlap wrote:
On 13/11/13 19:12, Dario Faggioli wrote:
[..]
The high level description of NUMA placement and scheduling in
docs/misc/xl-numa-placement.markdown is being updated too, to match
the new architecture.

Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
Reviewed-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>

Cool, thanks.

Just a few things to note below...

Ok.

diff --git a/xen/common/domain.c b/xen/common/domain.c
@@ -411,8 +411,6 @@ void domain_update_node_affinity(struct domain *d)
                   node_set(node, d->node_affinity);
       }
- sched_set_node_affinity(d, &d->node_affinity);
-
       spin_unlock(&d->node_affinity_lock);
At this point, the only thing inside the spinlock is contingent on
d->auto_node_affinity.

Mmm... Sorry, but I'm not getting what you mean here. :-(

I mean just what I said -- if d->auto_node_affinity is false, nothing inside the critical region here needs to be done. I'm just pointing it out. :-) (This is sort of related to my comment on the other patch, about not needing to do the work of calculating intersections.)
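
To make that concrete, something along these lines is what I'm hinting at (a rough sketch with simplified stand-in types, not the actual Xen code; whether the check could really live outside the lock depends on how auto_node_affinity itself is serialised):

#include <pthread.h>
#include <stdbool.h>

/* Heavily simplified stand-ins for Xen's types, for illustration only. */
typedef unsigned long nodemask_t;              /* one bit per NUMA node */

struct domain {
    bool auto_node_affinity;                   /* is the affinity auto-computed? */
    nodemask_t node_affinity;
    pthread_mutex_t node_affinity_lock;        /* a spinlock in real Xen */
};

/*
 * With sched_set_node_affinity() gone, the only work left in the
 * critical region is recomputing d->node_affinity, and that is only
 * needed when the affinity is auto-computed.
 */
static void update_node_affinity_sketch(struct domain *d,
                                        nodemask_t mask_from_vcpus)
{
    if ( !d->auto_node_affinity )
        return;                                /* nothing to do under the lock */

    pthread_mutex_lock(&d->node_affinity_lock);
    /* In the real code this is a loop over the online nodes, setting a
     * bit for each node whose cpumask intersects the vcpus' affinities. */
    d->node_affinity = mask_from_vcpus;
    pthread_mutex_unlock(&d->node_affinity_lock);
}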


diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
-static inline int __vcpu_has_node_affinity(const struct vcpu *vc,
+static inline int __vcpu_has_soft_affinity(const struct vcpu *vc,
                                              const cpumask_t *mask)
   {
-    const struct domain *d = vc->domain;
-    const struct csched_dom *sdom = CSCHED_DOM(d);
-
-    if ( d->auto_node_affinity
-         || cpumask_full(sdom->node_affinity_cpumask)
-         || !cpumask_intersects(sdom->node_affinity_cpumask, mask) )
+    if ( cpumask_full(vc->cpu_soft_affinity)
+         || !cpumask_intersects(vc->cpu_soft_affinity, mask) )
           return 0;
At this point we've lost a way to make this check potentially much
faster (being able to check auto_node_affinity).

Right.

This isn't a super-hot
path but it does happen fairly frequently --

Quite frequently indeed.

will the "cpumask_full()"
check take a significant amount of time on, say, a 4096-core system?  If
so, we might think about "caching" the results of cpumask_full() at some
point.

Yes, I think cpumask_* operations could be heavy when the number of pcpus
is high. However, this is not really a problem introduced by this
series. Consider that the default behavior (for libxl and xl) is to go
through initial domain placement, which would set a node-affinity for
the domain explicitly, which means d->auto_node_affinity is false.

In fact, every domain that does not manually pin its vcpus at creation
time --which is what we want, because that way NUMA placement can do its
magic-- will have to go through the (cpumask_full || !cpumask_intersects) check
anyway. Basically, I'm saying that having d->auto_node_affinity there
may look like a speedup, but it really is only for a minority of cases.

So, yes, I think we should aim at optimizing this, but that is something
completely orthogonal to this series. That is to say: (a) we should do
it anyway, whether or not this series goes in; (b) for that same reason,
that shouldn't prevent this series from going in.

If you think this can be an issue for 4.4, I'm fine with creating a bug
for it and putting it among the blockers. At that point, I'll start
looking for a solution and commit to posting a fix ASAP, but again,
that's pretty much independent of this very series, at least AFAICT.

Then, the fact that you provided your Reviewed-by above probably means
that you are aware of and ok with all this, but I felt like it was worth
pointing it out anyway. :-)

Yes, the "at some point" was intended to imply that I didn't think this had to be done right away, as was "things to note", which means, "I just want to point this out, they're not something which needs to be acted on right away."

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

