
[Xen-devel] RFE: Detect NUMA misconfigurations and prevent machine freezes



When playing with NUMA support recently, I noticed a host would always hang 
when trying to create a cpupool for the second NUMA node in the system.

I was using the following commands:
# xl cpupool-create name="Pool-1" sched="credit2"
# xl cpupool-cpu-remove Pool-0 node:1
# xl cpupool-cpu-add Pool-1 node:1

After the last command, the system would hang - requiring a hard reset of the 
machine to fix.

I tried a different variation with the same result:
# xl cpupool-create name="Pool-1" sched="credit2"
# xl cpupool-cpu-remove Pool-0 node:1
# xl cpupool-cpu-add Pool-1 12

It turns out that the RAM was installed sub-optimally in this machine. A 
partial output from 'xl info -n' shows:
numa_info              :
node:    memsize    memfree    distances
  0:     67584      62608      10,21
  1:         0          0      21,10

A machine where we could get this working every time shows:
node:    memsize    memfree    distances
  0:     34816      30483      10,21
  1:     32768      32125      21,10

As the RAM misconfiguration can be deduced from this output, I believe we 
should check that the RAM configuration / layout is sane *before* attempting 
to split the system, and print a warning if it is not.

This would prevent a hard system freeze in this scenario.
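
As a rough illustration of the kind of check meant here, the sketch below
parses the numa_info rows shown above and reports any node whose memsize is 0.
It is Python rather than anything in xl itself. The function name
memoryless_nodes and the two sample strings are made up for this example, and
the parser assumes the row layout seen in the 'xl info -n' output quoted in
this mail.

```python
import re
import sys

# Matches numa_info rows such as "  0:     67584      62608      10,21".
# Assumption: rows look like the 'xl info -n' output quoted in this mail.
_NODE_ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\d+)\s+([\d,]+)\s*$")


def memoryless_nodes(xl_info_output):
    """Return the IDs of NUMA nodes whose reported memsize is 0."""
    empty = []
    in_numa_info = False
    for line in xl_info_output.splitlines():
        # Only look at rows after the "numa_info" header, so that rows in
        # other sections of the output are not mistaken for node rows.
        if line.startswith("numa_info"):
            in_numa_info = True
            continue
        if not in_numa_info:
            continue
        match = _NODE_ROW.match(line)
        if match and int(match.group(2)) == 0:
            empty.append(int(match.group(1)))
    return empty


# Sample data copied from the two machines described above.
BAD_SAMPLE = """numa_info              :
node:    memsize    memfree    distances
  0:     67584      62608      10,21
  1:         0          0      21,10
"""

GOOD_SAMPLE = """numa_info              :
node:    memsize    memfree    distances
  0:     34816      30483      10,21
  1:     32768      32125      21,10
"""

if __name__ == "__main__":
    # Hypothetical usage: xl info -n | python3 check_numa.py
    nodes = memoryless_nodes(sys.stdin.read())
    if nodes:
        print("WARNING: NUMA node(s) with no memory:", nodes)
        sys.exit(1)
```

With the samples above, the misconfigured machine yields node 1 and the
working machine yields no nodes. A real check would presumably live in xl or
the hypervisor and warn before cpupool-cpu-add is allowed to proceed. This
only shows that the condition is easy to detect from information already
exposed.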

-- 
Steven Haigh

📧 netwiz@xxxxxxxxx       💻 https://www.crc.id.au
📞 +61 (3) 9001 6090    📱 0412 935 897

