
[Xen-users] Clock problems running RHEL6.3 PV guest


  • To: "xen-users@xxxxxxxxxxxxx" <xen-users@xxxxxxxxxxxxx>
  • From: Mark Thebridge <Mark.Thebridge@xxxxxxxxxxxxxx>
  • Date: Mon, 11 Feb 2013 14:45:55 +0000
  • Accept-language: en-GB, en-US
  • Delivery-date: Mon, 11 Feb 2013 20:49:04 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>
  • Thread-index: Ac4IZhrIdBftndmYT2a1NeRCHqjy8A==
  • Thread-topic: Clock problems running RHEL6.3 PV guest

Hi,

I have a reasonably time-critical networking application that I'm trying to
get running in a Xen PV guest.  Unfortunately, I'm experiencing intermittent
lockups that seem to come down to poor timekeeping in the domU.

The application runs on Red Hat Enterprise Linux 6.3, so I'm using that as the
domU.  For dom0 I've tried both CentOS 5.9 with Xen 3.1 and Fedora 18 with
Xen 4.2.1; both show the same problem.

The problem manifests as what seem to be lockups - a single vCPU appears to
hang.  My application has internal monitoring threads that try to determine
whether any part of the application has hung, and these trigger constantly
even though nothing is genuinely stuck.  If I have a shell open to the guest,
it sometimes becomes unresponsive for a second or two.  And very occasionally
(maybe 3 or 4 times a day?) the kernel reports soft lockups of around 25
seconds, always with the following stack:

<IRQ>  [<ffffffff810d8392>] ? watchdog_timer_fn+0x1c2/0x1d0
[<ffffffff810951be>] ? __run_hrtimer+0x8e/0x1a0
[<ffffffff81007c09>] ? xen_clocksource_get_cycles+0x9/0x10
[<ffffffff81095566>] ? hrtimer_interrupt+0xe6/0x250
[<ffffffff8109570f>] ? __hrtimer_peek_ahead_timers+0x3f/0x50
[<ffffffff81095744>] ? hrtimer_peek_ahead_timers+0x24/0x40
[<ffffffff8109579b>] ? run_hrtimer_softirq+0x3b/0x40
[<ffffffff810729cb>] ? __do_softirq+0xbb/0x1f0
[<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
<EOI>  [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
[<ffffffff81072530>] ? ksoftirqd+0x80/0x110
[<ffffffff810724b0>] ? ksoftirqd+0x0/0x110
[<ffffffff810906d6>] ? kthread+0x96/0xa0
[<ffffffff8100c0ca>] ? child_rip+0xa/0x20
[<ffffffff8100b294>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8100ba1d>] ? retint_restore_args+0x5/0x6
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
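
For context, the application's internal monitoring is essentially a heartbeat
watchdog along the lines of the sketch below (illustrative only - the real
code differs, and the 100 ms beat and 2 s threshold here are made-up numbers).
On the affected domU the monitor fires even though the worker is healthy,
because the vCPU itself stalls:

/* Illustrative heartbeat watchdog - NOT the real application code.
 * Build: gcc -pthread heartbeat_sketch.c
 * A worker thread bumps a timestamp every 100 ms; a monitor thread
 * complains if the timestamp stops advancing for more than 2 s. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static atomic_long last_beat;   /* seconds on CLOCK_MONOTONIC */

static long now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec;
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* ... real work would happen here ... */
        atomic_store(&last_beat, now_sec());
        usleep(100 * 1000);
    }
}

static void *monitor(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(1);
        long age = now_sec() - atomic_load(&last_beat);
        if (age > 2)
            fprintf(stderr, "watchdog: worker silent for %ld s\n", age);
    }
}

int main(void)
{
    pthread_t w, m;
    atomic_store(&last_beat, now_sec());
    pthread_create(&w, NULL, worker, NULL);
    pthread_create(&m, NULL, monitor, NULL);
    pthread_join(w, NULL);   /* threads run until the process is killed */
    return 0;
}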

I also get regular "clocksource tsc unstable" messages in domU.  If I turn on
ntpd in the domU, the clock runs fast enough that NTP cannot slew it back into
sync.  Note that time in dom0 seems fine; I've seen no issues there.
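
As an illustration of what "can't compensate" means: comparing CLOCK_MONOTONIC
(which NTP rate-corrects) against CLOCK_MONOTONIC_RAW (which it doesn't) shows
whether ntpd is pinned at the kernel's 500 ppm slew limit.  A rough sketch,
with an arbitrary 10-second window:

/* Rough in-guest check of how hard NTP is slewing the clock.
 * A measured slew stuck around +/-500 ppm means ntpd is already
 * compensating as fast as the kernel allows.
 * Build: gcc drift_sketch.c  (add -lrt on older glibc such as RHEL 6) */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double elapsed(clockid_t clk, struct timespec start)
{
    struct timespec now;
    clock_gettime(clk, &now);
    return (now.tv_sec - start.tv_sec) + (now.tv_nsec - start.tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec mono0, raw0;
    clock_gettime(CLOCK_MONOTONIC, &mono0);
    clock_gettime(CLOCK_MONOTONIC_RAW, &raw0);
    sleep(10);                      /* arbitrary measurement window */
    double mono = elapsed(CLOCK_MONOTONIC, mono0);
    double raw  = elapsed(CLOCK_MONOTONIC_RAW, raw0);
    printf("monotonic %.6f s, raw %.6f s, slew %+.1f ppm\n",
           mono, raw, (mono - raw) / raw * 1e6);
    return 0;
}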

Has anyone seen anything similar?  I know this application *can* run
virtualized - I have run it many times on VMware servers with no problems, and
it has been stable in Amazon EC2 as well - so I suspect something odd about my
hypervisor/dom0 setup.  But I can't work out what's wrong.

Other potentially useful information:
-- Underlying physical CPUs are fairly standard 64-bit Intels - Xeon E5645.
-- There are no other domUs running, and the dom0 is doing nothing unusual.
-- Other things I've tried, none of which seem to make any difference:
  -- Switching the guest clocksource from xen to tsc (the only two available)
     - see the sketch after this list
  -- Changing the hypervisor command line to set clocksource=pit rather than
     the default hpet
  -- Booting the domU with a single vCPU
  -- Pinning (and not pinning) the vCPUs to fixed physical CPUs, both in dom0
     and domU
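
For reference, switching the guest clocksource is just a write to sysfs.  A
minimal sketch of the equivalent in C (the sysfs paths are the standard ones;
error handling kept to a minimum):

/* Sketch: the C equivalent of
 *   echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
 * Needs root.  Prints the available clocksources first. */
#include <stdio.h>

#define CS_DIR "/sys/devices/system/clocksource/clocksource0/"

int main(int argc, char **argv)
{
    char buf[256];
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <clocksource>\n", argv[0]);
        return 1;
    }
    f = fopen(CS_DIR "available_clocksource", "r");
    if (f) {
        if (fgets(buf, sizeof buf, f))
            printf("available: %s", buf);  /* e.g. "xen tsc" in a PV guest */
        fclose(f);
    }
    f = fopen(CS_DIR "current_clocksource", "w");
    if (!f) {
        perror("current_clocksource");
        return 1;
    }
    fprintf(f, "%s\n", argv[1]);
    if (fclose(f) != 0) {               /* the write is committed on close */
        perror("switching clocksource");
        return 1;
    }
    return 0;
}

Run as root, e.g. "./setclock tsc"; an unknown clocksource name is rejected by
the kernel and shows up as an error on close.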

Thanks,
Mark

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

