[Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On 27.09.2010 16:06, Bruce Edge wrote:
>>> I saw reproducible hangs in dom0 when the system is under heavy load.
>>> Four dom0s share an NFS server for domU images, hosting a total of 24 domUs.
>>> When the system is under heavy load, busy processing e-commerce requests,
>>> one or two of the dom0s hang. No input is accepted and a reboot is
>>> necessary.
>>> Has anyone had the same experience? The causes I can come up with are the following:
>> Please post your hardware (mainboard, chipset, CPU, RAID controller).
>> I have found a severe problem on Lynnfield systems.
> Does this affect all Nehalem chips or only the Lynnfields? The .21 kernel is
> causing grief for us too. I was wondering if this was related.

I am still researching this. For testing I bought a system with a Westmere-EP CPU (Xeon E5620), which has ARAT. This system ran stably, even though Intel still lists it as affected by the C6 erratum. This leads me to the conclusion that Xen's HPET timer migration code (called HPET broadcast) is the root cause. It affects all CPUs that use it, but mainly Nehalem because of turbo mode.
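
For anyone who wants to check whether their own CPU reports ARAT, here is a minimal sketch. It assumes a Linux system where /proc/cpuinfo lists the feature as the `arat` flag; the function names are just illustrative, and inside a Xen dom0 the hypervisor may mask flags, so the result only reflects what the kernel is shown.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough; all cores normally report the same set.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_arat(cpuinfo_text):
    """True if the 'arat' flag (assumed name of the ARAT feature) is listed."""
    return "arat" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or the file is not readable
    print("ARAT reported:", has_arat(text))
```

If the flag is missing, the local APIC timer may stop in deep C-states, which is the case where a broadcast timer such as the HPET is needed.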

Regards Andreas

