
[Xen-changelog] [xen-3.4-testing] Fix hypervisor crash with unpopulated NUMA nodes



# HG changeset patch
# User Keir Fraser <keir.fraser@xxxxxxxxxx>
# Date 1255679534 -3600
# Node ID 9086d7e380e92c7af904e4bfafbb524c55b5ed09
# Parent  bd411fb0b54ae6c9347f90be00f93ecd78143bb0
Fix hypervisor crash with unpopulated NUMA nodes

On NUMA systems with memory-less nodes, Xen crashes quite early in the
hypervisor (while initializing the heaps). This is not an issue if the
memory-less node happens to be the last one, but "inner" memory-less
nodes trigger the crash reliably.  On multi-node processors it is much
more likely for a node to be left unpopulated.  This patch fixes the
crash by enumerating nodes via node_online_map instead of counting
from 0 to num_nodes.
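A minimal standalone sketch (not Xen code) of the pattern the patch
adopts: node_online_map is modelled here as a plain bitmask, and the
helpers below only mirror the semantics of Xen's next_node() and
first_node() so the example compiles on its own. The topology with a
memory-less node 1 is hypothetical.

    #include <stdio.h>

    #define MAX_NUMNODES 8

    /* Return the lowest node ID set in the map, or MAX_NUMNODES if none. */
    static unsigned int sketch_first_node(unsigned long map)
    {
        unsigned int n;
        for ( n = 0; n < MAX_NUMNODES; n++ )
            if ( map & (1UL << n) )
                return n;
        return MAX_NUMNODES;
    }

    /* Return the next node ID after n set in the map, or MAX_NUMNODES. */
    static unsigned int sketch_next_node(unsigned int n, unsigned long map)
    {
        for ( n = n + 1; n < MAX_NUMNODES; n++ )
            if ( map & (1UL << n) )
                return n;
        return MAX_NUMNODES;
    }

    int main(void)
    {
        /* Hypothetical system: nodes 0, 2 and 3 have memory; node 1 does not. */
        unsigned long node_online_map = (1UL << 0) | (1UL << 2) | (1UL << 3);
        unsigned int start = 2, node = start;

        /* Round-robin over online nodes only, wrapping as in alloc_heap_pages(). */
        do {
            printf("trying node %u\n", node);            /* prints 2, 3, 0 */
            node = sketch_next_node(node, node_online_map);
            if ( node == MAX_NUMNODES )
                node = sketch_first_node(node_online_map);
        } while ( node != start );

        return 0;
    }

With the old counting scheme the same loop would have visited 0, 1, 2
and touched the heap of the memory-less node 1.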

The resulting NUMA setup is still somewhat strange, but at least it
does not crash. The same enumeration bug exists in lowlevel/xc/xc.c,
but I suppose we cannot access the hypervisor's node_online_map from
that context, so the xm info output is still incorrect (although
xm debug-keys H is).  I plan to rework the handling of memory-less
nodes later.
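For comparison, a self-contained sketch of the enumeration bug referred
to above: with online node IDs {0, 2, 3}, the node count is 3, so a
loop counting from 0 to num_nodes visits the unpopulated node 1 and
never reaches node 3, whereas iterating the online map (as the
avail_heap_pages() hunk now does with for_each_online_node()) visits
exactly the populated nodes. The bitmask and counts are again
hypothetical.

    #include <stdio.h>

    #define MAX_NUMNODES 8

    int main(void)
    {
        /* Hypothetical topology: online node IDs {0, 2, 3}. */
        unsigned long node_online_map = (1UL << 0) | (1UL << 2) | (1UL << 3);
        unsigned int num_nodes = 3;        /* number of bits set above */
        unsigned int i;

        /* Buggy: assumes node IDs are dense, so it visits 0, 1, 2. */
        for ( i = 0; i < num_nodes; i++ )
            printf("counting loop visits node %u\n", i);

        /* Mask-based, as in the patch: visits 0, 2, 3. */
        for ( i = 0; i < MAX_NUMNODES; i++ )
            if ( node_online_map & (1UL << i) )
                printf("online-map loop visits node %u\n", i);

        return 0;
    }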

Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>
xen-unstable changeset:   20290:42a53969eb7e
xen-unstable date:        Wed Oct 07 15:58:26 2009 +0100
---
 xen/common/page_alloc.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff -r bd411fb0b54a -r 9086d7e380e9 xen/common/page_alloc.c
--- a/xen/common/page_alloc.c   Fri Oct 16 08:51:36 2009 +0100
+++ b/xen/common/page_alloc.c   Fri Oct 16 08:52:14 2009 +0100
@@ -347,7 +347,6 @@ static struct page_info *alloc_heap_page
         node = cpu_to_node(smp_processor_id());
 
     ASSERT(node >= 0);
-    ASSERT(node < num_nodes);
     ASSERT(zone_lo <= zone_hi);
     ASSERT(zone_hi < NR_ZONES);
 
@@ -376,8 +375,9 @@ static struct page_info *alloc_heap_page
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         /* Pick next node, wrapping around if needed. */
-        if ( ++node == num_nodes )
-            node = 0;
+        node = next_node(node, node_online_map);
+        if (node == MAX_NUMNODES)
+            node = first_node(node_online_map);
     }
 
     /* No suitable memory blocks. Fail the request. */
@@ -513,7 +513,6 @@ static void free_heap_pages(
 
     ASSERT(order <= MAX_ORDER);
     ASSERT(node >= 0);
-    ASSERT(node < num_online_nodes());
 
     for ( i = 0; i < (1 << order); i++ )
     {
@@ -866,13 +865,13 @@ static unsigned long avail_heap_pages(
 static unsigned long avail_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi, unsigned int node)
 {
-    unsigned int i, zone, num_nodes = num_online_nodes();
+    unsigned int i, zone;
     unsigned long free_pages = 0;
 
     if ( zone_hi >= NR_ZONES )
         zone_hi = NR_ZONES - 1;
 
-    for ( i = 0; i < num_nodes; i++ )
+    for_each_online_node(i)
     {
         if ( !avail[i] )
             continue;

_______________________________________________
Xen-changelog mailing list
Xen-changelog@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-changelog