
[Xen-users] Clock problems running RHEL6.3 PV guest


  • To: "xen-users@xxxxxxxxxxxxx" <xen-users@xxxxxxxxxxxxx>
  • From: Mark Thebridge <Mark.Thebridge@xxxxxxxxxxxxxx>
  • Date: Mon, 11 Feb 2013 14:45:55 +0000
  • Accept-language: en-GB, en-US
  • Delivery-date: Mon, 11 Feb 2013 20:49:04 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>
  • Thread-index: Ac4IZhrIdBftndmYT2a1NeRCHqjy8A==
  • Thread-topic: Clock problems running RHEL6.3 PV guest

Hi,

I have a reasonably time-critical networking application that I'm trying to
get running in a Xen PV guest.  Unfortunately, I'm experiencing intermittent
lockups that seem to come down to poor timekeeping in the domU.

The application runs on Red Hat Enterprise Linux 6.3, so I'm using that as the
domU.  For dom0 I've tried both CentOS 5.9 with Xen 3.1 and Fedora 18 with
Xen 4.2.1; both show the same problem.

The problem manifests as what seem to be lockups - a single vCPU appears to
hang.  My application has internal monitoring threads that try to determine
whether any part of the application has hung, and these trigger constantly
even though nothing is genuinely stuck.  If I have a shell open to the guest,
it sometimes becomes unresponsive for a second or two.  And very occasionally
(maybe 3 or 4 times a day?) the kernel reports soft lockups of around 25
seconds, always with the following stack:

<IRQ>  [<ffffffff810d8392>] ? watchdog_timer_fn+0x1c2/0x1d0
[<ffffffff810951be>] ? __run_hrtimer+0x8e/0x1a0
[<ffffffff81007c09>] ? xen_clocksource_get_cycles+0x9/0x10
[<ffffffff81095566>] ? hrtimer_interrupt+0xe6/0x250
[<ffffffff8109570f>] ? __hrtimer_peek_ahead_timers+0x3f/0x50
[<ffffffff81095744>] ? hrtimer_peek_ahead_timers+0x24/0x40
[<ffffffff8109579b>] ? run_hrtimer_softirq+0x3b/0x40
[<ffffffff810729cb>] ? __do_softirq+0xbb/0x1f0
[<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
<EOI>  [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
[<ffffffff81072530>] ? ksoftirqd+0x80/0x110
[<ffffffff810724b0>] ? ksoftirqd+0x0/0x110
[<ffffffff810906d6>] ? kthread+0x96/0xa0
[<ffffffff8100c0ca>] ? child_rip+0xa/0x20
[<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
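
For context, the application's internal monitoring is essentially a heartbeat
watchdog along the lines of the sketch below (illustrative only - the real
code differs, and the 100 ms beat and 2 s threshold here are made-up numbers).
On the affected domU the monitor fires even though the worker is healthy,
because the vCPU itself stalls:

/* Illustrative heartbeat watchdog - NOT the real application code.
 * Build: gcc -pthread heartbeat_sketch.c
 * A worker thread bumps a timestamp every 100 ms; a monitor thread
 * complains if the timestamp stops advancing for more than 2 s. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static atomic_long last_beat;   /* seconds on CLOCK_MONOTONIC */

static long now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec;
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* ... real work would happen here ... */
        atomic_store(&last_beat, now_sec());
        usleep(100 * 1000);
    }
}

static void *monitor(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(1);
        long age = now_sec() - atomic_load(&last_beat);
        if (age > 2)
            fprintf(stderr, "watchdog: worker silent for %ld s\n", age);
    }
}

int main(void)
{
    pthread_t w, m;
    atomic_store(&last_beat, now_sec());
    pthread_create(&w, NULL, worker, NULL);
    pthread_create(&m, NULL, monitor, NULL);
    pthread_join(w, NULL);   /* threads run until the process is killed */
    return 0;
}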

I also get regular "clocksource tsc unstable" messages in domU.  If I turn on
ntpd in the domU, the clock runs fast enough that NTP cannot slew it back into
sync.  Note that time in dom0 seems fine; I've seen no issues there.
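
As an illustration of what "can't compensate" means: comparing CLOCK_MONOTONIC
(which NTP rate-corrects) against CLOCK_MONOTONIC_RAW (which it doesn't) shows
whether ntpd is pinned at the kernel's 500 ppm slew limit.  A rough sketch,
with an arbitrary 10-second window:

/* Rough in-guest check of how hard NTP is slewing the clock.
 * A measured slew stuck around +/-500 ppm means ntpd is already
 * compensating as fast as the kernel allows.
 * Build: gcc drift_sketch.c  (add -lrt on older glibc such as RHEL 6) */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double elapsed(clockid_t clk, struct timespec start)
{
    struct timespec now;
    clock_gettime(clk, &now);
    return (now.tv_sec - start.tv_sec) + (now.tv_nsec - start.tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec mono0, raw0;
    clock_gettime(CLOCK_MONOTONIC, &mono0);
    clock_gettime(CLOCK_MONOTONIC_RAW, &raw0);
    sleep(10);                      /* arbitrary measurement window */
    double mono = elapsed(CLOCK_MONOTONIC, mono0);
    double raw  = elapsed(CLOCK_MONOTONIC_RAW, raw0);
    printf("monotonic %.6f s, raw %.6f s, slew %+.1f ppm\n",
           mono, raw, (mono - raw) / raw * 1e6);
    return 0;
}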

Has anyone seen anything similar?  I know this application *can* run
virtualized - I have run it many times on VMware servers with no problems, and
it has been stable in Amazon EC2 as well - so I suspect something odd about my
hypervisor/dom0 setup.  But I can't work out what's wrong.

Other potentially useful information:
-- Underlying physical CPUs are fairly standard 64-bit Intels - Xeon E5645.
-- There are no other domUs running, and the dom0 is doing nothing unusual.
-- Other things I've tried, none of which seem to make any difference:
  -- Switching the guest clocksource from xen to tsc (the only two available)
     - see the sketch after this list
  -- Changing the hypervisor command line to set clocksource=pit rather than
     the default hpet
  -- Booting the domU with a single vCPU
  -- Pinning (and not pinning) the vCPUs to fixed physical CPUs, both in dom0
     and domU
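
For reference, switching the guest clocksource is just a write to sysfs.  A
minimal sketch of the equivalent in C (the sysfs paths are the standard ones;
error handling kept to a minimum):

/* Sketch: the C equivalent of
 *   echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
 * Needs root.  Prints the available clocksources first. */
#include <stdio.h>

#define CS_DIR "/sys/devices/system/clocksource/clocksource0/"

int main(int argc, char **argv)
{
    char buf[256];
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <clocksource>\n", argv[0]);
        return 1;
    }
    f = fopen(CS_DIR "available_clocksource", "r");
    if (f) {
        if (fgets(buf, sizeof buf, f))
            printf("available: %s", buf);  /* e.g. "xen tsc" in a PV guest */
        fclose(f);
    }
    f = fopen(CS_DIR "current_clocksource", "w");
    if (!f) {
        perror("current_clocksource");
        return 1;
    }
    fprintf(f, "%s\n", argv[1]);
    if (fclose(f) != 0) {               /* the write is committed on close */
        perror("switching clocksource");
        return 1;
    }
    return 0;
}

Run as root, e.g. "./setclock tsc"; an unknown clocksource name is rejected by
the kernel and shows up as an error on close.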

Thanks,
Mark

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

