
[PATCH] warn if time calibration goes wacko (was RE: [Xen-devel] Xen 3.2.2 - Timer ISR/0: Time went backwards)



See below.  Bad things will happen if either of these situations occurs,
so the attached patch at least makes them easier to diagnose.

> -----Original Message-----
> From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
> Sent: Wednesday, August 06, 2008 9:47 AM
> To: 'Jan Beulich'; 'Christopher S. Aker'; 'Keir Fraser'
> Cc: 'xen-devel@xxxxxxxxxxxxxxxxxxx'
> Subject: RE: [Xen-devel] Xen 3.2.2 - Timer ISR/0: Time went backwards
> 
> 
> > 20-50+% timer interrupts. The moment this rate exceeds about 50%,
> > platform time calibration breaks (as it sets the timer to 
> > half the overflow period). 
> 
> I've looked at that code in local_time_calibration() a few times
> and even added debug code once to see if it occurs.  It
> didn't on my machine, but I can see how it would cause problems
> if it did happen.
> 
> Keir, would you accept a patch (or just add the two lines yourself)
> to printk a warning if that "goto out" ever occurs and/or maybe
> if the "scale factor is clamped"?
> 
> (Chris, this might not be your problem so apologies for the topic
> drift, but if the printk had been there awhile ago, we'd at least
> know if it is or is not the problem.)
> 
> Dan
> 
> P.S. This is also what led to the separate thread about measuring
> interrupt latency.  If this problem is due to huge periods with
> interrupts off, it would be nice to know.
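For readers without the attachment, the two conditions mentioned above (the early "goto out" and the clamped scale factor) can be modelled roughly as follows. This is a simplified Python sketch, not Xen's C code and not the contents of timewarn.patch. The function name `calibrate`, the one-second epoch, the "less than half an epoch" test, and the clamp bound of 2.0 are all assumptions chosen for illustration.

```python
def calibrate(local_elapsed_ns, platform_elapsed_ns,
              epoch_ns=1_000_000_000, max_scale=2.0):
    """Model one calibration epoch and report suspicious conditions.

    Hypothetical model: returns (scale, warnings). scale is None when the
    epoch is skipped, which stands in for the "goto out" path.
    """
    warnings = []

    # Assumed skip condition: platform time advanced far less than expected.
    if platform_elapsed_ns < epoch_ns // 2:
        warnings.append("calibration skipped: platform time advanced "
                        f"only {platform_elapsed_ns} ns")
        return None, warnings

    # Ratio of local (TSC-derived) time to platform time over the epoch.
    scale = local_elapsed_ns / platform_elapsed_ns

    # Assumed clamp: keep the correction within [1/max_scale, max_scale].
    clamped = min(max(scale, 1.0 / max_scale), max_scale)
    if clamped != scale:
        warnings.append(f"scale factor clamped: {scale:.3f} -> {clamped:.3f}")
    return clamped, warnings


if __name__ == "__main__":
    print(calibrate(10**9, 10**9))      # healthy epoch, no warnings
    print(calibrate(10**9, 10**8))      # skipped epoch
    print(calibrate(5 * 10**9, 10**9))  # clamped scale factor
```

The point of the sketch is only that both paths are silent unless something records them. Returning the warnings alongside the result makes each anomalous epoch visible.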
> 
> > -----Original Message-----
> > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx]On Behalf Of Jan Beulich
> > Sent: Tuesday, August 05, 2008 1:04 AM
> > To: Christopher S. Aker
> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] Xen 3.2.2 - Timer ISR/0: Time went backwards
> > 
> > 
> > This looks very similar to a bug report we've got from IBM that I'm
> > currently trying to research (difficult, as I can't touch the hardware).
> > What I know so far is that we're losing, starting a few seconds after
> > dom0 boot began, 20-50+% timer interrupts. The moment this rate exceeds
> > about 50%, platform time calibration breaks (as it sets the timer to
> > half the overflow period). Since jiffies aren't used much elsewhere,
> > this loss of timer ticks doesn't seem to matter much elsewhere.
> > 
> > I've got no real clue so far *why* there's such a high rate of lost
> > interrupts, though. The only (albeit small, since appearing very
> > unlikely) possibility would be frequent and extensive SMM entries
> > after ACPI mode got enabled on the system.
> > 
> > Btw., does -unstable exhibit the same behavior?
> > 
> > Jan
> > 
> > >>> "Christopher S. Aker" <caker@xxxxxxxxxxxx> 04.08.08 20:51 >>>
> > Hardware:
> > Xen: 3.2.1-rc2 64bit
> > dom0: 2.6.18.8 at changeset 622, PAE
> > 
> > # xm dmesg | grep -e sync -e timer
> > (XEN) checking TSC synchronization across 8 CPUs: passed.
> > (XEN) Platform timer overflows in 234 jiffies.
> > (XEN) Platform timer is 3.579MHz ACPI PM Timer
> > (XEN) Machine check exception polling timer started.
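The "234 jiffies" figure in the log above is consistent with half the wrap period of the ACPI PM timer, which fits the earlier remark that the timer is set to half the overflow period. A quick check, assuming a 24-bit counter at the nominal 3.579545 MHz and 100 jiffies per second (neither the counter width nor the jiffy rate is stated in this thread):

```python
PM_TIMER_HZ = 3_579_545   # nominal ACPI PM timer frequency (log shows 3.579MHz)
COUNTER_BITS = 24         # assumption: 24-bit PM timer counter
JIFFIES_PER_SEC = 100     # assumption: 10 ms jiffy

# Time for the counter to wrap completely, and half of that.
full_wrap_s = (1 << COUNTER_BITS) / PM_TIMER_HZ
half_wrap_s = full_wrap_s / 2
half_wrap_jiffies = int(half_wrap_s * JIFFIES_PER_SEC)

if __name__ == "__main__":
    print(f"full wrap: {full_wrap_s:.3f} s")
    print(f"half wrap: {half_wrap_s:.3f} s = {half_wrap_jiffies} jiffies")
```

Under these assumptions the counter wraps roughly every 4.7 seconds, so the overflow handling has to run at least about once per 2.3 seconds to stay safely ahead of a wrap.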
> > 
> > Spools one of these to console every few seconds:
> > 
> > Timer ISR/0: Time went backwards: delta=-4270576170971 
> > delta_cpu=254829029 shadow=2037844042151244163 off=261710497 
> > processed=2037848312989081849 cpu_processed=2037844042158081849
> >   0: 2037844042158081849
> >   1: 2037828468354081849
> >   2: 2037848312989081849
> >   3: 2037837726866081849
> >   4: 2037842059197081849
> >   5: 2037840075526081849
> >   6: 2037845844663081849
> >   7: 2037841593777081849
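The numbers in this message can be cross-checked against each other. The sketch below assumes that delta and delta_cpu are both measured from the same "current time" reading, once against the global processed value and once against CPU 0's own value. That relationship is inferred from the figures, not taken from the kernel source. All values are nanoseconds copied from the log above.

```python
# Values copied from the "Time went backwards" message above.
delta = -4270576170971
delta_cpu = 254829029
shadow = 2037844042151244163
off_printed = 261710497
processed = 2037848312989081849
cpu_processed = 2037844042158081849

per_cpu = [
    2037844042158081849, 2037828468354081849, 2037848312989081849,
    2037837726866081849, 2037842059197081849, 2037840075526081849,
    2037845844663081849, 2037841593777081849,
]

# Assumed relationship: both deltas share one "now" reading.
now = cpu_processed + delta_cpu
assert now - processed == delta

# Offset from the shadow timestamp implied by that reading; the printed
# "off" is slightly larger, as if it were sampled a little later.
off_implied = now - shadow

# How far CPU 0 lags the global value, and the spread across all CPUs.
cpu0_lag_s = (processed - cpu_processed) / 1e9
spread_s = (max(per_cpu) - min(per_cpu)) / 1e9

if __name__ == "__main__":
    print(f"implied off: {off_implied} ns (printed: {off_printed} ns)")
    print(f"CPU 0 lags the global processed time by {cpu0_lag_s:.3f} s")
    print(f"per-CPU spread: {spread_s:.3f} s")
```

The global processed value equals CPU 2's entry, and CPU 0 trails it by about 4270.8 seconds, which accounts for almost all of the negative delta.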
> > 
> > A few t's into Xen's console:
> > 
> > (XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0)
> > (XEN) Min = 2037829427350793281 ; Max = 2037848310626701146 ; Diff = 18883275907865 (18883275907 microseconds)
> > (XEN) Min = 2037829428349256182 ; Max = 2037848311625163843 ; Diff = 18883275907661 (18883275907 microseconds)
> > (XEN) Min = 2037829428565188930 ; Max = 2037848311841096807 ; Diff = 18883275907877 (18883275907 microseconds)
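The three Min/Max samples above can be checked the same way. The sketch below only redoes the arithmetic on the logged values (nanoseconds); reading the result as a fixed offset between CPUs rather than a rate difference is an interpretation, not something the log states.

```python
# (min, max, diff) triples copied from the Xen console output above.
samples = [
    (2037829427350793281, 2037848310626701146, 18883275907865),
    (2037829428349256182, 2037848311625163843, 18883275907661),
    (2037829428565188930, 2037848311841096807, 18883275907877),
]

# Each logged Diff should equal Max - Min.
diffs = [hi - lo for lo, hi, _ in samples]
assert diffs == [logged for _, _, logged in samples]

# The microsecond figure in the log is the Diff truncated to microseconds.
diffs_us = [d // 1000 for d in diffs]

# How much the skew itself changes across the three samples.
skew_jitter_ns = max(diffs) - min(diffs)

# How far Min advanced between the first and last sample.
min_advance_ns = samples[-1][0] - samples[0][0]

if __name__ == "__main__":
    print(f"skew: about {diffs[0] / 1e9:.2f} s ({diffs[0] / 3.6e12:.2f} hours)")
    print(f"skew varies by {skew_jitter_ns} ns while Min advances "
          f"{min_advance_ns / 1e9:.3f} s")
```

The skew is roughly 18883 seconds (a little over five hours) and moves by only a couple of hundred nanoseconds while the clocks advance by more than a second, so the slowest and fastest CPUs appear to tick at the same rate with a large constant offset between them.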
> > 
> > This particular box does this with 3.2.0 - 3.2.2-rc2.  I have another
> > box doing the same thing, except the delta is more sane (0 - 2
> > microseconds), however eventually dom0 freezes.
> > 
> > -Chris
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx 
> > http://lists.xensource.com/xen-devel
> >

Attachment: timewarn.patch
Description: Binary data


 

