Hello,
I’m trying to port one of our existing appliances (running on 64-bit Red Hat Enterprise Linux 6.4) to a Xen HVM guest. We’re seeing some very odd behaviour that doesn’t manifest on other platforms.
The guest experiences intermittent lockups of a few seconds: shell sessions become unresponsive, and various software health checks in our application trigger, even when the application is basically idle. Our application is time-critical (it handles network packets), so these delays are a major problem. We can see no pattern to when the delays happen.
Some extensive probing of the guest kernel has revealed that during the delays, the guest sees no APIC timer interrupts. This then means that kernel timers are not scheduled, causing system havoc.
I think this means that either:
· Xen is not generating the emulated APIC timer interrupts for several seconds, or
· domU has disabled the timer interrupt (for whatever reason) for several seconds.
Neither seems very likely. We do not see this problem on a bare-metal system on the same hardware, and we don’t see it when running on other hypervisors (VMware or KVM). We have seen this on guests with single or multiple vCPUs.
Is it possible that Xen has erroneously disabled the local APIC timer? Are there any useful diagnostics we could get out of Xen?
We’re running Xen 4.2.2; dom0 is Fedora 18 (kernel 3.6.10-4.fc18.x86_64). The guest is RHEL 6.4, kernel 2.6.32-358.0.1.el6.x86_64. Hardware is a Dell PowerEdge R620 server with 2 Intel Xeon E5-2690 6-core CPUs.
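In case it helps anyone reproduce the observation, here is a minimal sketch of one way to watch for the missing timer interrupts from inside the guest. It is not the kernel probing described above, only an illustration. It assumes Python 3 is available in the guest and that the local APIC timer counts appear on the `LOC:` line of /proc/interrupts, as they do on x86 Linux. The function names, the 0.1 s polling interval and the 0.5 s tolerance are arbitrary choices made for the example.

```python
import time


def parse_loc_counts(text):
    """Return the per-CPU local timer interrupt counts from /proc/interrupts text."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "LOC:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # the trailing description starts here
                counts.append(int(field))
            return counts
    raise ValueError("no LOC: line found")


def deltas(before, after):
    """Return how far each CPU's count advanced between two samples."""
    return [a - b for b, a in zip(before, after)]


def read_loc_counts(path="/proc/interrupts"):
    with open(path) as f:
        return parse_loc_counts(f.read())


def watch(interval=0.1, tolerance=0.5):
    """Report every poll that woke up much later than requested."""
    prev_counts = read_loc_counts()
    prev_time = time.monotonic()
    while True:
        time.sleep(interval)
        now = time.monotonic()
        counts = read_loc_counts()
        elapsed = now - prev_time
        if elapsed > interval + tolerance:
            # The sleep itself depends on kernel timers, so a long
            # oversleep is the symptom; the deltas show how few timer
            # interrupts arrived on each CPU during that period.
            print("%s overslept %.2fs, LOC deltas per CPU: %s"
                  % (time.strftime("%H:%M:%S"), elapsed - interval,
                     deltas(prev_counts, counts)))
        prev_counts, prev_time = counts, now


if __name__ == "__main__":
    watch()
```

Because the script's own wake-up relies on a kernel timer, it can only report a stall after the stall has ended. On a tickless kernel an idle vCPU may also show small deltas legitimately, so the oversleep figure is probably the more telling of the two numbers.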
Thanks,
Mark
Mark Thebridge
Software Engineer
Metaswitch Networks
mark.thebridge@xxxxxxxxxxxxxx
+44 (0)2083661177
www.metaswitch.com