
[Xen-devel] [PATCH] tools: avoid over-commitment if numa=on



Jan Beulich wrote:
> >>> Andre Przywara <andre.przywara@xxxxxxx> 09.11.09 16:02 >>>
> > BTW: Shouldn't we finally set numa=on as the default value?
>
> I'd say no, at least until the default confinement of a guest to a single
> node gets fixed to properly deal with guests having more vCPU-s than
> a node's worth of pCPU-s (i.e. I take it for granted that the benefits of
> not overcommitting CPUs outweigh the drawbacks of cross-node memory
> accesses at the very least for CPU-bound workloads).

That sounds reasonable.
Attached is a patch that lifts the restriction to a single node per guest when the number of VCPUs exceeds the number of cores per node. This isn't optimal (the best solution would be to inform the guest about the topology, but that is another patchset ;-), but it should address the above concerns.
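For illustration only (not part of the patch): a minimal sketch, in plain Python 3, of the calculation described above, a ceiling division of the guest's VCPU count by the cores available per node. The helper name and the example numbers are made up.

    # Minimal sketch, assuming cores are spread evenly across nodes.
    # nodes_required() is a hypothetical helper, not a function in xend.
    def nodes_required(vcpus, nr_cpus, nr_nodes):
        cores_per_node = nr_cpus // nr_nodes
        # Ceiling division: one node is enough only if it has a core per VCPU.
        return (vcpus + cores_per_node - 1) // cores_per_node

    # A 6-VCPU guest on a 2-node, 8-core host (4 cores per node) needs 2 nodes;
    # with 4 or fewer VCPUs the single-node placement is kept.
    print(nodes_required(6, 8, 2))   # -> 2
    print(nodes_required(4, 8, 2))   # -> 1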

Please apply,
Andre.

Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
# HG changeset patch
# User Andre Przywara <andre.przywara@xxxxxxx>
# Date 1259594006 -3600
# Node ID bdf4109edffbcc0cbac605a19d2fd7a7459f1117
# Parent  abc6183f486e66b5721dbf0313ee0d3460613a99
allocate enough NUMA nodes for all VCPUs

If numa=on, we constrain a guest to one node to keep its memory
accesses local. This hurts performance if the number of VCPUs is
greater than the number of cores per node. We now detect this case
and allocate further NUMA nodes so that all VCPUs can run
simultaneously.

Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>

diff -r abc6183f486e -r bdf4109edffb tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py   Mon Nov 30 10:58:23 2009 +0000
+++ b/tools/python/xen/xend/XendDomainInfo.py   Mon Nov 30 16:13:26 2009 +0100
@@ -2637,8 +2637,7 @@
                         nodeload[i] = int(nodeload[i] * 16 / len(info['node_to_cpu'][i]))
                     else:
                         nodeload[i] = sys.maxint
-                index = nodeload.index( min(nodeload) )    
-                return index
+                return map(lambda x: x[0], sorted(enumerate(nodeload), key=lambda x:x[1]))
 
             info = xc.physinfo()
             if info['nr_nodes'] > 1:
@@ -2648,8 +2647,15 @@
                 for i in range(0, info['nr_nodes']):
                     if node_memory_list[i] >= needmem and len(info['node_to_cpu'][i]) > 0:
                         candidate_node_list.append(i)
-                index = find_relaxed_node(candidate_node_list)
-                cpumask = info['node_to_cpu'][index]
+                best_node = find_relaxed_node(candidate_node_list)[0]
+                cpumask = info['node_to_cpu'][best_node]
+                cores_per_node = info['nr_cpus'] / info['nr_nodes']
+                nodes_required = (self.info['VCPUs_max'] + cores_per_node - 1) / cores_per_node
+                if nodes_required > 1:
+                    log.debug("allocating %d NUMA nodes", nodes_required)
+                    best_nodes = find_relaxed_node(filter(lambda x: x != best_node, range(0,info['nr_nodes'])))
+                    for i in best_nodes[:nodes_required - 1]:
+                        cpumask = cpumask + info['node_to_cpu'][i]
                 for v in range(0, self.info['VCPUs_max']):
                     xc.vcpu_setaffinity(self.domid, v, cpumask)
         return index
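For readers who want the gist without the diff context: below is a rough standalone sketch of the selection logic after this patch. pick_cpumask() and the sample topology are hypothetical and carry no xend/xc dependencies; the real code in XendDomainInfo.py takes the topology from xc.physinfo() and then applies the mask with xc.vcpu_setaffinity() for every VCPU of the domain.

    # Rough sketch only; pick_cpumask() is a made-up helper, not xend code.
    # nodes_by_load is the node list sorted from least to most loaded, which is
    # what the reworked find_relaxed_node() now returns.
    def pick_cpumask(vcpus_max, node_to_cpu, nodes_by_load):
        nr_nodes = len(node_to_cpu)
        nr_cpus = sum(len(cpus) for cpus in node_to_cpu)
        cores_per_node = nr_cpus // nr_nodes
        needed = (vcpus_max + cores_per_node - 1) // cores_per_node

        # Start from the least loaded node, then add further lightly loaded
        # nodes until every VCPU can have a core of its own.
        cpumask = list(node_to_cpu[nodes_by_load[0]])
        for node in nodes_by_load[1:needed]:
            cpumask += node_to_cpu[node]
        return cpumask

    # Hypothetical 2-node box with 4 cores per node; a 6-VCPU guest spans both.
    print(pick_cpumask(6, [[0, 1, 2, 3], [4, 5, 6, 7]], [0, 1]))
    # -> [0, 1, 2, 3, 4, 5, 6, 7]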
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

