
Re: [Xen-devel] [PATCH 04 of 11 v3] xen: sched_credit: let the scheduler know about node-affinity



On Fri, Feb 1, 2013 at 11:01 AM, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
As vcpu-affinity tells where VCPUs must run, node-affinity tells
where they should, or rather prefer to, run. While respecting
vcpu-affinity remains mandatory, node-affinity is not that strict;
it only expresses a preference, although honouring it will almost
always bring a significant performance benefit (especially as
compared to not having any affinity at all).

This change modifies the VCPU load balancing algorithm (for the
credit scheduler only), introducing a two-step logic.
During the first step, we use the node-affinity mask. The aim is
to give precedence to the CPUs where it is known to be preferable
for the domain to run. If that fails to find a valid PCPU, the
node-affinity is just ignored and, in the second step, we fall
back to using cpu-affinity only.

Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

OK, sorry it took so long.  This looks a lot better; just a couple of comments below.
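
Just to restate the two-step logic in code, to make sure I've read it
right (this is only a sketch with invented names, not your actual
patch code):

/*
 * Sketch of the two-step pick: step 0 intersects vcpu-affinity with
 * the node-affinity mask; if that leaves no online pcpu, step 1
 * falls back to plain vcpu-affinity.
 */
static int pick_cpu_two_step(const cpumask_t *cpu_affinity,
                             const cpumask_t *node_affinity_cpus,
                             const cpumask_t *online)
{
    cpumask_t mask;
    int step;

    for ( step = 0; step < 2; step++ )
    {
        /* vcpu-affinity is always honoured... */
        cpumask_and(&mask, cpu_affinity, online);
        /* ...node-affinity only during the first step. */
        if ( step == 0 )
            cpumask_and(&mask, &mask, node_affinity_cpus);

        if ( !cpumask_empty(&mask) )
            return cpumask_first(&mask);
    }

    return -1; /* unreachable if cpu_affinity & online is non-empty */
}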
 
+
+/*
+ * vcpu-affinity balancing is always necessary and must never be skipped.
+ * OTOH, if a domain has affinity with all the nodes, we can tell the caller
+ * that he can safely skip the node-affinity balancing step.
+ */
+#define __vcpu_has_valuable_node_affinity(vc) \
+    ( !cpumask_full(CSCHED_DOM(vc->domain)->node_affinity_cpumask) )

Something about the name of this one just doesn't strike me right.  I might be tempted just to go with "__vcpu_has_node_affinity", and let the comment above it explain what it means for the curious.
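
I.e., something like this (same body, just the shorter name; the
comment above it can stay as it is):

#define __vcpu_has_node_affinity(vc) \
    ( !cpumask_full(CSCHED_DOM(vc->domain)->node_affinity_cpumask) )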
 
+
+static inline int csched_balance_step_skippable(int step, struct vcpu *vc)
+{
+    if ( step == CSCHED_BALANCE_NODE_AFFINITY
+         && !__vcpu_has_valuable_node_affinity(vc) )
+        return 1;
+    return 0;
+}

I'm not a fan of this kind of encapsulation, but I'll let it be, I guess.  You missed a place to use it, however -- in csched_runq_steal() you still have the open-coded if() statement.

(I prefer just having the duplicated if statements, as I think it's easier to read; but you should go all one way or the other.)
 

@@ -1277,11 +1391,21 @@ csched_runq_steal(int peer_cpu, int cpu,
             if ( speer->pri <= pri )
                 break;

-            /* Is this VCPU is runnable on our PCPU? */
+            /* Is this VCPU runnable on our PCPU? */
             vc = speer->vcpu;
             BUG_ON( is_idle_vcpu(vc) );

-            if (__csched_vcpu_is_migrateable(vc, cpu))
+            /*
+             * If the vcpu has no valuable node-affinity, skip this vcpu.
+             * In fact, what we want is to check if we have any node-affine
+             * work to steal, before starting to look at vcpu-affine work.
+             */
+            if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
+                 && !__vcpu_has_valuable_node_affinity(vc) )
+                continue;

This is where you missed the "csched_balance_step_skippable" replacement.
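
I.e., that hunk could become something like (sketch only, same
condition as your helper checks):

            /*
             * If the vcpu has no useful node-affinity, there is nothing
             * node-affine to steal here, so skip it during the
             * node-affinity balancing step.
             */
            if ( csched_balance_step_skippable(balance_step, vc) )
                continue;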

Just an observation -- I think this will penalize systems that do not have node affinity enabled.  If no one is using node affinities, the load balancing code will go through and check each vcpu on each processor, determine that there are no node affinities, and only then go through again looking at vcpu affinities.  Going through twice for a single vcpu when doing placement is probably OK; but going all the way through all vcpus an extra time could be pretty expensive.

I think we should take this patch now (with the other minor changes mentioned above) so we can get it tested properly.  But we should probably try to do something to address this issue before the release -- maybe something like keeping a bitmask for each runqueue, saying whether any vcpus running on them have node affinity?  That way the first round we'll only check runqueues where we might conceivably steal something.
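
Very rough sketch of the kind of thing I mean (untested; the field
name and exact placement are made up, and locking is glossed over):

    /* In struct csched_pcpu: number of vcpus currently assigned to
     * this pcpu that have a "valuable" node-affinity, updated wherever
     * vcpus are assigned to / removed from the pcpu. */
    unsigned int nr_node_affine;

    /* Then, in the node-affinity step of csched_load_balance(), before
     * bothering to peek into (and lock) a peer pcpu's runqueue: */
    if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
         && CSCHED_PCPU(peer_cpu)->nr_node_affine == 0 )
        continue;  /* nothing node-affine to steal from that pcpu */

A counter (rather than a single bit per runqueue) makes it easy to
know when to clear it again, i.e. when the last such vcpu goes away.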

Other than that, looks good -- thanks for the good work.

 -George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

