
Re: [Xen-devel] Xen 4 TSC problems


  • To: Olivier Hanesse <olivier.hanesse@xxxxxxxxx>
  • From: Keir Fraser <keir.xen@xxxxxxxxx>
  • Date: Thu, 24 Feb 2011 07:16:05 +0000
  • Cc: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, Xen Users <xen-users@xxxxxxxxxxxxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>, Mark Adams <mark@xxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 23 Feb 2011 23:17:11 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcvT8rMRfrxK+HBQuEKAIwfnCYe6wg==
  • Thread-topic: [Xen-devel] Xen 4 TSC problems

Please send Xen boot output (xm dmesg). Getting it from Xen 3.2 as well
would be interesting, if you still have it installed on any of these
machines.

 -- Keir

On 23/02/2011 19:04, "Olivier Hanesse" <olivier.hanesse@xxxxxxxxx> wrote:

> I am sorry for the lack of information.
> All domUs on the dom0 are affected by this bug at exactly the same time.
> 
> I have seen this bug on a dozen servers (all running on the same hardware) since
> October (when I switched from Xen 3.2 to 4.0).
> 
> Regards
> 
> Olivier
> 
> On 23/02/2011 18:19, Keir Fraser wrote:
>> On 23/02/2011 16:16, "Dan Magenheimer"<dan.magenheimer@xxxxxxxxxx>  wrote:
>> 
>>> It's very unlikely this is a problem with TSC. It is most likely a Xen (or
>>> possibly a PV Linux) problem where a guest (or dom0) either "goes out to
>>> lunch" for a long period, or some other timer gets stuck.  The "clocksource
>>> tsc unstable" message is a side effect of this... it's very likely the TSC
>>> that IS stable and correct and the other clocksource (pvclock) has
>>> lost/gained 50 minutes!
>>> 
>>> Mark Adams is cc'ed, and his original xen-devel posting is referenced below.
>>> The fact that two different users (possibly on the same processor/system
>>> type?) have submitted the message with a delta so similar would lead me to
>>> believe there is some timer that is "wrapping".  And since pvclock is usually
>>> the clocksource for dom0, and pvclock is driven by Xen's "system time", a
>>> reasonable guess is that the timer that is wrapping is in Xen itself.
>>> 
>>> Mark's delta = -2999660303788 ns
>>> Your delta = -2999660334211 ns
>>> 
>>> Googling, I see the HPET wraparound is ~306 seconds and this delta is about
>>> 3000 seconds, so that may be a bad guess.
>>> 
>>> Keir, any thoughts on this?  Do you recall any post-4.0 patches that may have
>>> fixed this?
>> I've never seen a 3000s wrap, and I don't know of anything that would have
>> fixed a bug like this. If this is a Xen time wrap of some kind then it would
>> affect all running guests; it's not clear here whether only one, or all,
>> guests see the wrap.
>> 
>>   K.
>> 
>>> Thanks,
>>> Dan
>>> 
>>> References:
>>> http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00210.html
>>> https://lkml.org/lkml/2010/10/26/126
>>> 
>>> 
>>> From: Olivier Hanesse [mailto:olivier.hanesse@xxxxxxxxx]
>>> Sent: Wednesday, February 23, 2011 3:50 AM
>>> To: xen-devel@xxxxxxxxxxxxxxxxxxx; Xen Users
>>> Subject: [Xen-devel] Xen 4 TSC problems
>>> 
>>> 
>>> Hello
>>> 
>>> 
>>> 
>>> I have a timekeeping issue with Xen 4.0 (Debian squeeze release).
>>> 
>>> 
>>> 
>>> My problem is described here (hopefully I am not the only one, so there might
>>> be a bug somewhere): http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599161#50
>>> 
>>> After some time, I get this error: Clocksource tsc unstable (delta =
>>> -2999660334211 ns). It has happened on several servers.
>>> 
>>> 
>>> 
>>> Looking at the output of "xm debug-key s;"
>>> 
>>> 
>>> 
>>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2850 (count=3)
>>> 
>>> 
>>> 
>>> I am using an "Intel(R) Xeon(R) CPU L5420  @ 2.50GHz", which has the
>>> "constant_tsc" flag but not the "nonstop_tsc" flag.
>>> 
>>> On other systems with a newer CPU that has "nonstop_tsc", I don't have this
>>> issue (the systems run the same distros with the same config).
>>> 
>>> 
>>> 
>>> I tried booting with "max_cstate=0", but nothing changed: my TSC still isn't
>>> reliable, and after some time I get the "50min" issue again.
>>> 
>>> 
>>> 
>>> I don't understand how a system can jump "50min" into the future. Why
>>> 50min? It is not 40min, not 1 hour; it is always 50min.
>>> 
>>> I don't know how to make my TSC "reliable" (I have already disabled everything
>>> related to power states in the BIOS settings).
>>> 
>>> 
>>> 
>>> Any ideas?
>>> 
>>> 
>>> 
>>> Regards
>>> 
>>> 
>>> 
>>> Olivier
>>> 
>> 
> 
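
The arithmetic behind the "50min" figure and the HPET comparison quoted above
can be checked with a short sketch. The two deltas and the ~306 second HPET
wraparound are taken from the thread as quoted; the helper names are
illustrative only, and the ratio at the end is just a sanity check, not a
diagnosis of which timer (if any) is wrapping.

```python
NS_PER_SECOND = 1_000_000_000

# Deltas reported in the thread, in nanoseconds.
MARK_DELTA_NS = -2999660303788
OLIVIER_DELTA_NS = -2999660334211

# Approximate HPET wraparound period quoted in the thread, in seconds.
HPET_WRAP_SECONDS = 306


def ns_to_seconds(delta_ns):
    """Convert a nanosecond delta to seconds (sign preserved)."""
    return delta_ns / NS_PER_SECOND


def ns_to_minutes(delta_ns):
    """Convert a nanosecond delta to minutes (sign preserved)."""
    return ns_to_seconds(delta_ns) / 60


def wraps_needed(delta_ns, wrap_seconds):
    """How many wrap periods of the given length fit into |delta|."""
    return abs(ns_to_seconds(delta_ns)) / wrap_seconds


# Difference between the two independent reports, in nanoseconds.
report_gap_ns = abs(MARK_DELTA_NS - OLIVIER_DELTA_NS)

print("Mark:    %.3f s = %.3f min" % (ns_to_seconds(MARK_DELTA_NS),
                                      ns_to_minutes(MARK_DELTA_NS)))
print("Olivier: %.3f s = %.3f min" % (ns_to_seconds(OLIVIER_DELTA_NS),
                                      ns_to_minutes(OLIVIER_DELTA_NS)))
print("gap between reports: %d ns" % report_gap_ns)
print("HPET wraps in delta: %.2f" % wraps_needed(OLIVIER_DELTA_NS,
                                                 HPET_WRAP_SECONDS))
```

Both deltas come out just under 3000 seconds (about 50 minutes) and differ
from each other by only tens of microseconds, which is what makes a fixed
wrap value plausible. The delta is not a whole number of 306 second periods,
which is consistent with the doubt expressed above about the HPET guess.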

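The "constant_tsc" / "nonstop_tsc" comparison in the original report can be
reproduced on a Linux system with a small sketch like the one below. It
assumes the flags are read from /proc/cpuinfo, and that what dom0 sees there
reflects the host CPU (which may not hold under every Xen configuration). The
function name and the sample text are hypothetical, made up for illustration.

```python
from pathlib import Path

# TSC-related CPU flags discussed in the thread.
TSC_FLAGS = ("constant_tsc", "nonstop_tsc")


def tsc_flags(cpuinfo_text):
    """Report which TSC flags appear on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return {flag: flag in present for flag in TSC_FLAGS}
    # No 'flags' line found: report every flag as absent.
    return {flag: False for flag in TSC_FLAGS}


# Hypothetical excerpt shaped like /proc/cpuinfo, not real output.
SAMPLE = (
    "model name\t: Intel(R) Xeon(R) CPU L5420  @ 2.50GHz\n"
    "flags\t\t: fpu vme tsc constant_tsc pni\n"
)

if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else SAMPLE
    print(tsc_flags(text))
```

A CPU that reports "constant_tsc" without "nonstop_tsc" matches the situation
described in the report: the TSC rate is constant, but the counter may stop in
deep C-states.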

