
Re: [Xen-devel] Xen 4 TSC problems



xm dmesg : 

(XEN) Xen version 4.0.1 (Debian 4.0.1-2) (waldi@xxxxxxxxxx) (gcc version 4.4.5 (Debian 4.4.5-10) ) Wed Jan 12 14:04:06 UTC 2011
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: dom0_mem=512M loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: none; EDID transfer time: 2 seconds
(XEN)  EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN)  Found 2 MBR signatures
(XEN)  Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009ac00 (usable)
(XEN)  000000000009ac00 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000bffc7980 (usable)
(XEN)  00000000bffc7980 - 00000000bffcee80 (ACPI data)
(XEN)  00000000bffcee80 - 00000000c0000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fec00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 00000002c0000000 (usable)
(XEN) ACPI: RSDP 000FDFD0, 0024 (r2 IBM   )
(XEN) ACPI: XSDT BFFCED40, 0054 (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: FACP BFFCEC80, 0084 (r2 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: DSDT BFFC7980, 2EDA (r2 IBM    SERDEFNT     1000 INTL 20041203)
(XEN) ACPI: FACS BFFCAB00, 0040
(XEN) ACPI: APIC BFFCEB80, 00BC (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: SRAT BFFCEA00, 0128 (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: HPET BFFCE9C0, 0038 (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: MCFG BFFCE980, 003C (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: ERST BFFCAB40, 0230 (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) System RAM: 10239MB (10485124kB)
(XEN) SRAT: PXM 0 -> APIC 0 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 1 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 2 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 3 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 4 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 5 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 6 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 7 -> Node 0
(XEN) SRAT: Node 0 PXM 0 0-c0000000
(XEN) SRAT: Node 0 PXM 0 100000000-2c0000000
(XEN) SRAT: hot plug zone found 2c0000000 - 1000000000 
(XEN) SRAT: Node 0 PXM 0 2c0000000-1000000000
(XEN) NUMA: Allocated memnodemap from 2bfdfe000 - 2bfdff000
(XEN) NUMA: Using 18 for the hash shift.
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 0009ad40
(XEN) DMI 2.4 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x588
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[584,0], pm1x_evt[580,0]
(XEN) ACPI:                  wakeup_vec[bffcab0c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
(XEN) Processor #1 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
(XEN) Processor #3 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
(XEN) Processor #4 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
(XEN) Processor #5 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
(XEN) Processor #6 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
(XEN) Processor #7 7:7 APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x05] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x06] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x07] dfl dfl lint[0x1])
(XEN) ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 14, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a201 base: 0xfed00000
(XEN) PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 20
(XEN) PCI: MCFG area at e0000000 reserved in E820
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2493.798 MHz processor.
(XEN) Initing memory sharing.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN) HVM: ASIDs disabled.
(XEN) HVM: VMX enabled
(XEN) Intel machine check reporting enabled
(XEN) I/O virtualisation disabled
(XEN) Total of 8 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) checking TSC synchronization across 8 CPUs: passed.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 64 KiB.
(XEN) microcode.c:73:d32767 microcode: CPU2 resumed
(XEN) microcode.c:73:d32767 microcode: CPU1 resumed
(XEN) microcode.c:73:d32767 microcode: CPU3 resumed
(XEN) Brought up 8 CPUs
(XEN) microcode.c:73:d32767 microcode: CPU4 resumed
(XEN) microcode.c:73:d32767 microcode: CPU5 resumed
(XEN) microcode.c:73:d32767 microcode: CPU6 resumed
(XEN) microcode.c:73:d32767 microcode: CPU7 resumed
(XEN) HPET: 3 timers in total, 0 timers will be used for broadcast
(XEN) ACPI sleep modes: S3
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x16b2000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   00000002b4000000->00000002b8000000 (114688 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff816b2000
(XEN)  Init. ramdisk: ffffffff816b2000->ffffffff82e05400
(XEN)  Phys-Mach map: ffffffff82e06000->ffffffff82f06000
(XEN)  Start info:    ffffffff82f06000->ffffffff82f064b4
(XEN)  Page tables:   ffffffff82f07000->ffffffff82f22000
(XEN)  Boot stack:    ffffffff82f22000->ffffffff82f23000
(XEN)  TOTAL:         ffffffff80000000->ffffffff83000000
(XEN)  ENTRY ADDRESS: ffffffff81502200
(XEN) Dom0 has maximum 1 VCPUs
(XEN) Scrubbing Free RAM: ................................................................................................done.
(XEN) trace.c:89:d32767 calc_tinfo_first_offset: NR_CPUs 128, offset_in_bytes 258, t_info_first_offset 65
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 176kB init memory.
(XEN) PCI add device 00:00.0
(XEN) PCI add device 00:02.0
(XEN) PCI add device 00:03.0
(XEN) PCI add device 00:04.0
(XEN) PCI add device 00:05.0
(XEN) PCI add device 00:06.0
(XEN) PCI add device 00:07.0
(XEN) PCI add device 00:08.0
(XEN) PCI add device 00:10.0
(XEN) PCI add device 00:10.1
(XEN) PCI add device 00:10.2
(XEN) PCI add device 00:11.0
(XEN) PCI add device 00:13.0
(XEN) PCI add device 00:15.0
(XEN) PCI add device 00:16.0
(XEN) PCI add device 00:1c.0
(XEN) PCI add device 00:1d.0
(XEN) PCI add device 00:1d.1
(XEN) PCI add device 00:1d.2
(XEN) PCI add device 00:1d.7
(XEN) PCI add device 00:1e.0
(XEN) PCI add device 00:1f.0
(XEN) PCI add device 00:1f.1
(XEN) PCI add device 00:1f.3
(XEN) PCI add device 10:00.0
(XEN) PCI add device 10:00.3
(XEN) PCI add device 11:00.0
(XEN) PCI add device 11:01.0
(XEN) PCI add device 07:00.0
(XEN) PCI add device 07:00.1
(XEN) PCI add device 03:00.0
(XEN) PCI add device 04:00.0
(XEN) PCI add device 02:00.0
(XEN) PCI add device 05:00.0
(XEN) PCI add device 06:00.0
(XEN) PCI add device 01:01.0

When the issue happens:

(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.

Output of xm debug-key s :

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2684 (count=4)
(XEN) dom1: mode=0,ofs=0xa8dcbfb9a,khz=2493798,inc=1,vtsc count: 1756100739 kernel, 20526533 user
(XEN) dom2: mode=0,ofs=0xc257d49df,khz=2493798,inc=1,vtsc count: 900668266 kernel, 30618121 user
(XEN) dom3: mode=0,ofs=0xdb1299744,khz=2493798,inc=1,vtsc count: 16656509047 kernel, 709406217 user
(XEN) dom4: mode=0,ofs=0xf8627e616,khz=2493798,inc=1,vtsc count: 1174828915 kernel, 194957775 user
(XEN) dom5: mode=0,ofs=0x115a0f2a67,khz=2493798,inc=1,vtsc count: 332007967 kernel, 5766769 user
(XEN) dom6: mode=0,ofs=0x13bf462f38,khz=2493798,inc=1,vtsc count: 3137076938 kernel, 1076320679 user
(XEN) dom10: mode=0,ofs=0x1b99e41f4b,khz=2493798,inc=1,vtsc count: 411433049 kernel, 19532319 user
(XEN) dom11: mode=0,ofs=0x1e4991cf40,khz=2493798,inc=1,vtsc count: 415406148 kernel, 19223482 user
(XEN) dom12: mode=0,ofs=0x1fe8c10600,khz=2493798,inc=1,vtsc count: 1012850399 kernel, 63603352 user
(XEN) dom13: mode=0,ofs=0x21ef9b9531,khz=2493798,inc=1,vtsc count: 813097186 kernel, 27536004 user
(XEN) dom14: mode=0,ofs=0x23f5b4e429,khz=2493798,inc=1,vtsc count: 2461059718 kernel, 48182776 user
(XEN) dom18: mode=0,ofs=0x2bdc302048,khz=2493798,inc=1,vtsc count: 624333824 kernel, 5166805 user
(XEN) dom19: mode=0,ofs=0x2e67227085,khz=2493798,inc=1,vtsc count: 1037952789 kernel, 5778635 user
(XEN) dom20: mode=0,ofs=0x562ce020eea4,khz=2493798,inc=1,vtsc count: 643491360 kernel, 31771029 user
(XEN) dom21: mode=0,ofs=0x563a017eea82,khz=2493798,inc=1,vtsc count: 715148727 kernel, 24430809 user
(XEN) dom25: mode=0,ofs=0x1d0c5230cdfad,khz=2493798,inc=1,vtsc count: 2103227324 kernel, 656635140 user
(XEN) dom27: mode=0,ofs=0x1d868b8c1fbbf,khz=2493798,inc=1,vtsc count: 476542178 kernel, 12976786 user
(XEN) dom31: mode=0,ofs=0x1dc08da161ebc,khz=2493798,inc=1,vtsc count: 2747233178 kernel, 466863700 user
(XEN) dom32: mode=0,ofs=0x1ecde6eb53d2c,khz=2493798,inc=1,vtsc count: 305360096 kernel, 11705823 user
(XEN) dom33: mode=0,ofs=0x1ece1bf734f61,khz=2493798,inc=1,vtsc count: 516548852 kernel, 18662125 user


Output of xm debug-key t :

(XEN) Synced stime skew: max=1405ns avg=1405ns samples=1 current=1405ns
(XEN) Synced cycles skew: max=2377 avg=2377 samples=1 current=2377

Output of /proc/cpuinfo : 

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz
stepping        : 6
cpu MHz         : 2493.798
cache size      : 6144 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc up rep_good aperfmperf pni est ssse3 cx16 sse4_1 hypervisor lahf_lm
bogomips        : 4987.59
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

Output of xm info : 

release                : 2.6.32-bpo.5-xen-amd64
version                : #1 SMP Mon Jan 17 22:05:11 UTC 2011
machine                : x86_64
nr_cpus                : 8
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2493
hw_caps                : bfebfbff:20000800:00000000:00000940:000ce3bd:00000000:00000001:00000000
virt_caps              : hvm
total_memory           : 10239
free_memory            : 910
node_to_cpu            : node0:0-7
node_to_memory         : node0:910
node_to_dma32_mem      : node0:910
max_node_id            : 0
xen_major              : 4
xen_minor              : 0
xen_extra              : .1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : dom0_mem=512M loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1
cc_compiler            : gcc version 4.4.5 (Debian 4.4.5-10) 
cc_compile_by          : waldi
cc_compile_domain      : debian.org
cc_compile_date        : Wed Jan 12 14:04:06 UTC 2011
xend_config_format     : 4

in dom0 /var/log/kern.log :

Feb 23 22:40:54 dom0 kernel: [995452.618519] Clocksource tsc unstable (delta = -2999660335950 ns)
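
A rough cross-check of that delta, offered as a sketch rather than a diagnosis. It assumes the HPET main counter is 32 bits wide and runs at the nominal 14.31818 MHz; the Xen boot log above only says "14.318MHz HPET", so both the width and the exact frequency are assumptions. Under those assumptions one counter wrap takes about 300 s, and the reported delta is very close to ten wraps, which lines up with the "wrapped 10 or more times" message.

```python
# Assumptions (not confirmed by the logs): a 32-bit HPET main counter
# running at the nominal 14.31818 MHz.
HPET_HZ = 14_318_180
COUNTER_BITS = 32


def hpet_wrap_seconds(hz=HPET_HZ, bits=COUNTER_BITS):
    """Seconds for a free-running counter of `bits` width to wrap once."""
    return (1 << bits) / hz


# Magnitude of the delta printed in dom0's kern.log.
delta_ns = 2_999_660_335_950
delta_s = delta_ns / 1e9
wraps = delta_s / hpet_wrap_seconds()

print(f"one wrap : {hpet_wrap_seconds():.3f} s")
print(f"delta    : {delta_s:.3f} s ({delta_s / 60:.1f} min)")
print(f"wraps    : {wraps:.4f}")
```

If the counter were 64 bits wide, or the real frequency differed noticeably from the nominal value, this match would not hold, so treat it only as a hint about where to look.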

In domU, I don't see any logs; the time just "jumps" 50 minutes into the future (see /var/log/daemon.log):

Feb 23 21:50:51 domU snmpd[1037]: Connection from UDP: [10.16.2.101]:58303
Feb 23 22:40:55 domU snmpd[1037]: Connection from UDP: [10.16.2.101]:45713
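
The size of the jump can be read off the two syslog timestamps above. The sketch below only subtracts them. The year 2011 is taken from the dates of this thread, and the helper name `gap_seconds` is made up for illustration. The result also includes whatever real time passed between the two SNMP connections, so it is an upper bound on the jump, not an exact measurement.

```python
from datetime import datetime

# Syslog lines carry no year; 2011 is assumed from the dates in this thread.
ASSUMED_YEAR = 2011


def gap_seconds(earlier, later, year=ASSUMED_YEAR):
    """Seconds between two syslog-style timestamps such as 'Feb 23 21:50:51'."""
    fmt = "%Y %b %d %H:%M:%S"
    start = datetime.strptime(f"{year} {earlier}", fmt)
    end = datetime.strptime(f"{year} {later}", fmt)
    return (end - start).total_seconds()


gap = gap_seconds("Feb 23 21:50:51", "Feb 23 22:40:55")
print(f"gap between log lines: {gap:.0f} s ({gap / 60:.1f} min)")
```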

Clocksource is set to "xen" in both dom0 and domU:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
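
The same check can be scripted when many hosts have to be compared. This is a minimal sketch: the function name `read_clocksources` is hypothetical, and it assumes the standard Linux sysfs files `current_clocksource` and `available_clocksource` in that directory. A missing file is reported as None or an empty list.

```python
from pathlib import Path

SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_DIR):
    """Return (current, available) clocksources read from sysfs.

    `current` is None and `available` is [] when a file is missing.
    """
    def read(name):
        path = Path(base) / name
        return path.read_text().strip() if path.exists() else None

    current = read("current_clocksource")
    available = read("available_clocksource")
    return current, (available.split() if available else [])


current, available = read_clocksources()
print(f"current: {current}, available: {available}")
```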

Regards

Olivier

2011/2/24 Keir Fraser <keir.xen@xxxxxxxxx>
Please send Xen boot output (xm dmesg). Getting it from Xen 3.2 as well
would be interesting, if you still have it installed on any of these
machines.

 -- Keir

On 23/02/2011 19:04, "Olivier Hanesse" <olivier.hanesse@xxxxxxxxx> wrote:

> I am sorry for the lack of information.
> Every domUs on the dom0 are affected by this bug at the exact same time.
>
> And I had this bug on a dozen servers (all running on the same hw) since
> October (when I switched from Xen 3.2 to 4.0).
>
> Regards
>
> Olivier
>
> On 23/02/2011 18:19, Keir Fraser wrote:
>> On 23/02/2011 16:16, "Dan Magenheimer"<dan.magenheimer@xxxxxxxxxx>  wrote:
>>
>>> It's very unlikely this is a problem with TSC. It is most likely a Xen (or
>>> possibly a PV Linux) problem where a guest (or dom0) either "goes out to
>>> lunch" for a long period, or some other timer gets stuck.  The "clocksource
>>> tsc unstable" message is a side effect of this... it's very likely the TSC
>>> that IS stable and correct and the other clocksource (pvclock) has
>>> lost/gained 50 minutes!
>>>
>>> Mark Adams cc'ed and his original xen-devel posting below.  The fact that
>>> two different users (possibly on the same processor/system type?) have
>>> submitted the message with a delta so similar would lead me to believe
>>> there is some timer that is "wrapping".  And since pvclock is usually the
>>> clocksource for dom0, and pvclock is driven by Xen's "system time", a
>>> reasonable guess is that the timer that is wrapping is in Xen itself.
>>>
>>> Mark's delta = -2999660303788 ns
>>> Your delta = -2999660334211 ns
>>>
>>> Googling, I see the HPET wraparound is ~306 seconds and this delta is about
>>> 3000 seconds, so that may be a bad guess.
>>>
>>> Keir, any thoughts on this?  Do you recall any post-4.0 patches that may
>>> have
>>> fixed this?
>> I've never seen a 3000s wrap, and I don't know of anything that would have
>> fixed a bug like this. If this is a Xen time wrap of some kind then it would
>> affect all running guests; it's not clear here whether only one, or all,
>> guests see the wrap.
>>
>>   K.
>>
>>> Thanks,
>>> Dan
>>>
>>> References:
>>> http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00210.html
>>> https://lkml.org/lkml/2010/10/26/126
>>>
>>>
>>> From: Olivier Hanesse [mailto:olivier.hanesse@xxxxxxxxx]
>>> Sent: Wednesday, February 23, 2011 3:50 AM
>>> To: xen-devel@xxxxxxxxxxxxxxxxxxm; Xen Users
>>> Subject: [Xen-devel] Xen 4 TSC problems
>>>
>>>
>>> Hello
>>>
>>>
>>>
>>> I've got an issue about time keeping with Xen 4.0 (Debian squeeze release).
>>>
>>>
>>>
>>> My problem is described here (hopefully I'm not the only one, so there
>>> might be a bug somewhere):
>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599161#50
>>>
>>> After some time, I get this error: Clocksource tsc unstable (delta =
>>> -2999660334211 ns). It has happened on several servers.
>>>
>>>
>>>
>>> Looking at the output of "xm debug-key s;"
>>>
>>>
>>>
>>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2850 (count=3)
>>>
>>>
>>>
>>> I am using an "Intel(R) Xeon(R) CPU L5420  @ 2.50GHz", which has the
>>> "constant_tsc" flag, but not the "nonstop_tsc" flag.
>>>
>>> On other systems with a newer CPU that has "nonstop_tsc", I don't have this
>>> issue (those systems run the same distro with the same config).
>>>
>>>
>>>
>>> I tried to boot with "max_cstate=0", but nothing changed: my TSC still isn't
>>> reliable, and after some time I will get the "50min" issue again.
>>>
>>>
>>>
>>> I don't understand how a system can jump 50 minutes into the future.
>>> Why 50 minutes? It is not 40 minutes, not 1 hour; it is always 50 minutes.
>>>
>>> I don't know how to make my TSC "reliable" (I have already disabled
>>> everything related to power states in the BIOS settings).
>>>
>>>
>>>
>>> Any ideas ?
>>>
>>>
>>>
>>> Regards
>>>
>>>
>>>
>>> Olivier
>>>
>>
>



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

