
Re: [Xen-devel] Stopping much Linux testing in Xen Project CI



On 13.03.20 10:13, Jan Beulich wrote:
On 12.03.2020 18:55, Roger Pau Monné wrote:
On Thu, Mar 12, 2020 at 04:49:51PM +0000, Ian Jackson wrote:
Linux stable branches, and Linux upstream tip, are badly broken and
have been for months.  Apparently no-one is able to (or has time to)
investigate and fix.

   linux-4.4          218 days         to be suspended
   linux-4.9          134 days         to be suspended
   linux-4.14         134 days         to be suspended
   linux-4.19         134 days         to be suspended
   linux-5.4           55 days
   linux-arm-xen     up to date
   linux-linus        372 days         to be suspended

These are times since the last push - ie, how long it has been broken.
Evidently no-one is paying any attention to this.[1]  I looked at the
reports myself and:

Nested HVM is broken on Intel in all of the 4.x branches.

FWIW, AFAICT it's the Debian installer kernel that crashes;
all the failures are:

[    0.000000] Linux version 4.9.0-6-amd64 (debian-kernel@xxxxxxxxxxxxxxxx) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02)
[...]
[    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 30580167144 ns
[    0.000000] tsc: Fast TSC calibration failed
[    0.000000] tsc: Unable to calibrate against PIT
[    0.000000] tsc: HPET/PMTIMER calibration failed
[    0.000000] divide error: 0000 [#1] SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[    0.000000] Hardware name: Xen HVM domU, BIOS 4.14-unstable 03/11/2020
[    0.000000] task: ffffffffab611500 task.stack: ffffffffab600000
[    0.000000] RIP: 0010:[<ffffffffaaa59e1f>]  [<ffffffffaaa59e1f>] pvclock_tsc_khz+0xf/0x30

Seeing this and ...

[    0.000000] RSP: 0000:ffffffffab603f38  EFLAGS: 00010246
[    0.000000] RAX: 000f424000000000 RBX: ffffffffffffffff RCX: 0000000000000000
[    0.000000] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffffffab939020
[    0.000000] RBP: ffff93806e8f1540 R08: 000000003a637374 R09: 6f6974617262696c
[    0.000000] R10: 00000032f3af6dcd R11: 4d502f5445504820 R12: ffffffffab7dc920
[    0.000000] R13: ffffffffab7e82e0 R14: 00000000000146f0 R15: 000000000000008e
[    0.000000] FS:  0000000000000000(0000) GS:ffff93806e600000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: ffff938065f3a000 CR3: 0000000025c08000 CR4: 00000000000406b0
[    0.000000] Stack:
[    0.000000]  ffffffffab74b1b6 ffff93806e8f1540 ffffffffab7dc920 ba81e537ba81e512
[    0.000000]  ffffffffffffffff ffff93806e8f1540 ffffffffab73deb6 ffffffffab7e82e0
[    0.000000]  0000000000000000 0000000000000020 0000ffffffffab73 00000000ffffffff
[    0.000000] Call Trace:
[    0.000000]  [<ffffffffab74b1b6>] ? tsc_init+0x39/0x25b

... this and looking at xen_tsc_khz(), isn't it supposed to use
per_cpu(xen_vcpu, 0) instead, in case vCPU info got relocated?
(Code looks to be the same in 4.9 and 5.5. I'd also question
the hard-coded zero in there, but that's a different topic.)

It should use per_cpu(xen_vcpu, 0), but OTOH it shouldn't matter that
much if it doesn't, as the time information from the shared info page
wouldn't go away.

Seeing a zero divisor here indicates that HYPERVISOR_shared_info might
still point to the dummy shared info structure.
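
To illustrate why a still-zeroed time info structure would end in a
divide error, here is a minimal user-space sketch of the frequency
arithmetic as I understand pvclock_tsc_khz() to perform it. The struct
and the function model_tsc_khz() are hypothetical stand-ins, not kernel
code, and the zero check exists only in this sketch. Note that RAX in
the dump above holds 000f424000000000, which is 1000000 << 32, and RCX
is zero, which would be consistent with a zero multiplier if RCX holds
the divisor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the time info structure;
 * only the two fields used by the frequency computation are modelled. */
struct time_info_model {
    uint32_t tsc_to_system_mul;  /* 32.32 fixed-point ns-per-tick scale */
    int8_t   tsc_shift;          /* pre-shift applied to the TSC delta */
};

/* Hypothetical helper: derive the TSC frequency in kHz.
 * Returns false instead of dividing when the multiplier is zero,
 * which is what an unpopulated (dummy) structure would contain. */
static bool model_tsc_khz(const struct time_info_model *src, uint64_t *khz)
{
    uint64_t v = 1000000ULL << 32;   /* 0x000f424000000000 */

    if (src->tsc_to_system_mul == 0)
        return false;                /* the unguarded division would fault */

    v /= src->tsc_to_system_mul;
    if (src->tsc_shift < 0)
        v <<= -src->tsc_shift;
    else
        v >>= src->tsc_shift;

    *khz = v;
    return true;
}
```

With a multiplier of 0x80000000 and a shift of 0 this yields 2000000 kHz
(2 GHz); with an all-zero structure it reports failure rather than
faulting.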


Juergen
