Xen project Mailing List

[Xen-devel] [PATCH] xen: sched_credit: filter node-affinity mask against online cpus

From: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

Date: Fri, 13 Sep 2013 18:09:07 +0200

Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>

Delivery-date: Fri, 13 Sep 2013 16:09:20 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

in _csched_cpu_pick(), as not doing so may result in the domain's node-affinity mask (as retrieved by csched_balance_cpumask() ) and online mask (as retrieved by cpupool_scheduler_cpumask() ) having an empty intersection. Therefore, when attempting a node-affinity load balancing step and running this: ... /* Pick an online CPU from the proper affinity mask */ csched_balance_cpumask(vc, balance_step, &cpus); cpumask_and(&cpus, &cpus, online); ... we end up with an empty cpumask (in cpus). At this point, in the following code: .... /* If present, prefer vc's current processor */ cpu = cpumask_test_cpu(vc->processor, &cpus) ? vc->processor : cpumask_cycle(vc->processor, &cpus); .... an ASSERT (from inside cpumask_cycle() ) triggers like this: (XEN) Xen call trace: (XEN) [<ffff82d08011b124>] _csched_cpu_pick+0x1d2/0x652 (XEN) [<ffff82d08011b5b2>] csched_cpu_pick+0xe/0x10 (XEN) [<ffff82d0801232de>] vcpu_migrate+0x167/0x31e (XEN) [<ffff82d0801238cc>] cpu_disable_scheduler+0x1c8/0x287 (XEN) [<ffff82d080101b3f>] cpupool_unassign_cpu_helper+0x20/0xb4 (XEN) [<ffff82d08010544f>] continue_hypercall_tasklet_handler+0x4a/0xb1 (XEN) [<ffff82d080127793>] do_tasklet_work+0x78/0xab (XEN) [<ffff82d080127a70>] do_tasklet+0x5f/0x8b (XEN) [<ffff82d080158985>] idle_loop+0x57/0x5e (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 1: (XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481 It is for example sufficient to have a domain with node-affinity to NUMA node 1 running, and issueing a `xl cpupool-numa-split' would make the above happen. That is because, by default, all the existing domains remain assigned to the first cpupool, and it now (after the cpupool-numa-split) only includes NUMA node 0. This change prevents that by generalizing the function used for figuring out whether a node-affinity load balancing step is legit or not. This way we can, in _csched_cpu_pick(), figure out early enough that the mask would end up empty, skip the step all together and avoid the splat. Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx> Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx> Cc: Jan Beulich <JBeulich@xxxxxxxx> Cc: Keir Fraser <keir@xxxxxxx> --- This was originally submitted (yesterday) as "xen: make sure the node-affinity is always updated". After discussing it with George, I got convinced that a fix like this is more appropriate, since it deals with the issue in the scheduler, without making a mess out of the user interface. This is therefore not being posted as a real v2 of that patch because the approach radically changed. --- xen/common/sched_credit.c | 41 ++++++++++++++++++++++++++++++----------- 1 file changed, 30 insertions(+), 11 deletions(-) diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index dbe6de6..d76cd88 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -296,15 +296,25 @@ static void csched_set_node_affinity( * vcpu-affinity balancing is always necessary and must never be skipped. * OTOH, if a domain's node-affinity is said to be automatically computed * (or if it just spans all the nodes), we can safely avoid dealing with - * node-affinity entirely. Ah, node-affinity is also deemed meaningless - * in case it has empty intersection with the vcpu's vcpu-affinity, as it - * would mean trying to schedule it on _no_ pcpu! + * node-affinity entirely. + * + * Node-affinity is also deemed meaningless in case it has empty + * intersection with mask, to cover the cases where using the node-affinity + * mask seems legit, but would instead led to trying to schedule the vcpu + * on _no_ pcpu! Typical use cases are for mask to be equal to the vcpu's + * vcpu-affinity, or to the && of vcpu-affinity and the set of online cpus + * in the domain's cpupool. */ -#define __vcpu_has_node_affinity(vc) \ - ( !(cpumask_full(CSCHED_DOM(vc->domain)->node_affinity_cpumask) \ - || !cpumask_intersects(vc->cpu_affinity, \ - CSCHED_DOM(vc->domain)->node_affinity_cpumask) \ - || vc->domain->auto_node_affinity == 1) ) +static inline int __vcpu_has_node_affinity(struct vcpu *vc, cpumask_t *mask) +{ + if ( vc->domain->auto_node_affinity == 1 + || cpumask_full(CSCHED_DOM(vc->domain)->node_affinity_cpumask) + || !cpumask_intersects(CSCHED_DOM(vc->domain)->node_affinity_cpumask, + mask) ) + return 0; + + return 1; +} /* * Each csched-balance step uses its own cpumask. This function determines @@ -393,7 +403,8 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) int new_idlers_empty; if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(new->vcpu) ) + && !__vcpu_has_node_affinity(new->vcpu, + new->vcpu->cpu_affinity) ) continue; /* Are there idlers suitable for new (for this balance step)? */ @@ -627,10 +638,18 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) int balance_step; online = cpupool_scheduler_cpumask(vc->domain->cpupool); + cpumask_and(&cpus, vc->cpu_affinity, online); + for_each_csched_balance_step( balance_step ) { + /* + * We filter the node-affinity mask against + * [vc->cpu_affinity && online] here to avoid problems if all + * the cpus in in the node-affinity mask are offline (e.g., + * because the domain moved to a different cpupool). + */ if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(vc) ) + && !__vcpu_has_node_affinity(vc, &cpus) ) continue; /* Pick an online CPU from the proper affinity mask */ @@ -1445,7 +1464,7 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step) * or counter. */ if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY - && !__vcpu_has_node_affinity(vc) ) + && !__vcpu_has_node_affinity(vc, vc->cpu_affinity) ) continue; csched_balance_cpumask(vc, balance_step, csched_balance_mask); _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.