
Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub



On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
On 22.09.2015 10:53, Ian Campbell wrote:
Hi Vladimir & grub-devel,

Do you have any thoughts on this issue with i386 pv-grub2?

Is it still an issue? If so, I'll try to replicate it. From the stack dump I
see that it has jumped to NULL. GRUB has no threads, so it's not a race
condition with itself, but it may be one with some part of Xen. An alternative
possibility is that GRUB forgets to flush the cache at some point in the boot
process.

Looks like GRUB doesn't have a traptable registered with Xen (the PV equivalent of the IDT).

First, Xen tried to inject a #GP fault and found that the entry EIP was at 0 (which is sadly the default if nothing is specified). It then took a pagefault while attempting to inject the #GP, and crashed the domain.

~Andrew
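
For readers following the analysis above, the page-fault error code ("ec=0010") and the code selector ("cs: e019") that Xen reports in the log quoted further down can be decoded using the architectural x86 bit layouts. This is only a minimal Python sketch for illustration; it is not taken from Xen or GRUB, and the function names are invented here.

```python
# Architectural x86 page-fault error-code bits:
# (mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write", "not a write"),
    (0x04, "user-mode access", "supervisor-mode access"),
    (0x08, "reserved bit set in a paging entry", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(ec):
    """Return a list of human-readable facts encoded in a #PF error code."""
    facts = []
    for mask, if_set, if_clear in PF_BITS:
        if ec & mask:
            facts.append(if_set)
        elif if_clear is not None:
            facts.append(if_clear)
    return facts


def decode_selector(sel):
    """Split a segment selector into descriptor index, table and RPL."""
    return {
        "index": sel >> 3,                      # descriptor index
        "table": "LDT" if sel & 0x4 else "GDT",  # table indicator bit
        "rpl": sel & 0x3,                       # requested privilege level
    }


print(decode_pf_error_code(0x0010))  # value from "unhandled page fault (ec=0010)"
print(decode_selector(0xE019))       # value from "cs: e019"
```

Bit 4 of the error code being set means the fault was raised by an instruction fetch, which is consistent with execution having reached address 0. The selector decodes to a requested privilege level of 1, which, as far as I understand, is what a 32-bit PV guest kernel is expected to run with.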

Thanks, Ian.

On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
applied) and Xen 4.4.1

I originally posted a bug report with Debian but got the suggestion to
file bugs with upstream as well.
Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480

Note that my original thought was that this bug is probably within GRUB.
But Ian asked me to file a bug with Xen as well, so you will have to live
with the fact that it is centered around GRUB.

Here's the information from my original bug report:

With a 64-bit dom0 and a 32-bit PV (para-virtualized) domU, grub sometimes
fails when chainloading the domU's grub. A 64-bit domU seems to work 100%
of the time.

My understanding of the process:

 * dom0 launches domU with grub that is loaded from dom0's disk.
 * Grub reads config file from memdisk, and then looks for grub binary in
    domU filesystem.
 * If grub is found in domU it then chainloads (multiboot) that grub binary
    and the domU grub reads grub.cfg and continues booting.
 * If grub is not found in domU it reads grub.cfg and continues with boot.

It fails at step 3 in my list of the boot process, but sometimes it
does work so it may be something like a race condition that causes the
problem?

A workaround is to either not install /boot/xen in the domU or rename it,
so that the first grub, which is loaded from dom0's disk, will not find the
grub binary in the domU filesystem and hence continues to read grub.cfg and
boot. The drawback is of course that the two grub versions can't differ too
much, as one setup creates grub.cfg and a different one reads and parses it
at boot time.
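
The boot flow and the workaround described above can be summarised as a small decision sketch. This is a Python model of the description in this report only, not of the actual grub configuration shipped on the memdisk; the function name and the step strings are made up for illustration.

```python
def boot_plan(domu_has_grub_binary):
    """Return the boot steps taken, following the process described above.

    domu_has_grub_binary: True if the dom0-provided grub finds a grub
    binary in the domU filesystem (i.e. /boot/xen is installed there).
    """
    steps = [
        "dom0 launches domU with grub loaded from dom0's disk",
        "grub reads its config file from (memdisk)",
    ]
    if domu_has_grub_binary:
        # This is the branch on which the intermittent crash is observed.
        steps.append("chainload (multiboot) the domU grub binary")
        steps.append("domU grub reads grub.cfg and continues booting")
    else:
        # The workaround forces this branch by hiding /boot/xen.
        steps.append("dom0-provided grub reads grub.cfg and continues booting")
    return steps


for found in (True, False):
    print(found, boot_plan(found))
```

The workaround simply forces the second branch, so the chainload step that fails is never reached.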

I am not sure at this point whether this is a problem in Xen or a
problem in grub, but I compiled the legacy pvgrub, which uses some minios
code from Xen (I don't really know much more about it), and when that
legacy pvgrub chainloads the domU grub it seems to work 100% of the time.
The legacy pvgrub is not a real alternative, though, as it is not packaged
for Debian.

When it fails "xl create vm -c" outputs this:
Parsing config from /etc/xen/vm
libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain type for domid=16
Unable to attach console
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console child [0] exited with error status 1

And "xl dmesg" shows errors like this:
(XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from 0x0000000000000000 to 0x000000000000ffff.
(XEN) d16:v0: unhandled page fault (ec=0010)
(XEN) Pagetable walk from 0000000000000000:
(XEN) L4[0x000] = 0000000200256027 000000000000049c
(XEN) L3[0x000] = 0000000200255027 000000000000049d
(XEN) L2[0x000] = 0000000200251023 00000000000004a1
(XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0 compat_create_bounce_frame+0xc6/0xde
(XEN) Domain 16 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e019:[<0000000000000000>]
(XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
(XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
(XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
(XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
(XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
(XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
(XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
(XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
(XEN) Guest stack trace from esp=005a5ff0:
(XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389 0016b388
(XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381 0016b380
(XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379 0016b378
(XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371 0016b370
(XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369 0016b368
(XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361 0016b360
(XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359 0016b358
(XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351 0016b350
(XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349 0016b348
(XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341 0016b340
(XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339 0016b338
(XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331 0016b330
(XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329 0016b328
(XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321 0016b320
(XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319 0016b318
(XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311 0016b310
(XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309 0016b308
(XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301 0016b300
(XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9 0016b2f8
(XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1 0016b2f0
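
As an aid to reading the "Pagetable walk" lines in the log above, the following Python sketch splits a virtual address into its 4-level x86-64 table indices and decodes the low flag bits of each entry. It is a sketch under the assumption that the first value printed on each line is the raw page-table entry; the helper names are invented for illustration and nothing here comes from Xen itself.

```python
def split_va(va):
    """Split a virtual address into 4-level x86-64 paging indices."""
    return {
        "L4": (va >> 39) & 0x1FF,
        "L3": (va >> 30) & 0x1FF,
        "L2": (va >> 21) & 0x1FF,
        "L1": (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }


# A subset of the architectural flag bits in the low byte of an entry.
PTE_FLAGS = [
    (0x001, "present"),
    (0x002, "writable"),
    (0x004, "user"),
    (0x020, "accessed"),
    (0x040, "dirty"),
]


def pte_flags(entry):
    """Return the names of the flag bits that are set in a paging entry."""
    return [name for mask, name in PTE_FLAGS if entry & mask]


# Entries copied from the "Pagetable walk from 0000000000000000" lines.
walk = {
    "L4": 0x0000000200256027,
    "L3": 0x0000000200255027,
    "L2": 0x0000000200251023,
    "L1": 0x0000000000000000,
}

print(split_va(0x0))
for level, entry in walk.items():
    print(level, hex(entry), pte_flags(entry) or ["not present"])
```

The L1 entry for address 0 is zero, so the page is simply not mapped; any instruction fetch from address 0 therefore faults.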

An easy way to find out which grub you are in, if the machine boots, is
to hit 'c' and type 'ls'; only the grub from dom0 will know about
(memdisk). So when trying to replicate the issue (and the domU
actually starts) you can hit 'c', type 'ls' (check for memdisk) and
then type 'halt' and relaunch the domU. Usually I can't launch more
than 4-5 times in a row before it fails; often it fails on my first
try.
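
When repeating the launch cycle described above, it can help to scan the hypervisor log for the crash signature instead of reading it by eye. The following Python sketch does that on a block of text; the regular expressions are derived only from the log lines quoted in this report, so other Xen versions may format these lines differently, and the function name is invented for illustration. On a real host the text would come from "xl dmesg".

```python
import re

# Patterns derived from the log lines quoted in this report (Xen 4.4.1).
CRASH_RE = re.compile(r"\(XEN\) Domain (\d+) \(vcpu#(\d+)\) crashed on cpu#(\d+)")
RIP_RE = re.compile(r"\(XEN\) RIP:\s+[0-9a-f]+:\[<([0-9a-f]+)>\]")


def find_crashes(dmesg_text):
    """Return one dict per crashed domain found in the given log text."""
    crashes = []
    for line in dmesg_text.splitlines():
        m = CRASH_RE.search(line)
        if m:
            domid, vcpu, cpu = (int(g) for g in m.groups())
            crashes.append({"domid": domid, "vcpu": vcpu, "cpu": cpu, "rip": None})
            continue
        m = RIP_RE.search(line)
        # Attach the first RIP line that follows a crash header.
        if m and crashes and crashes[-1]["rip"] is None:
            crashes[-1]["rip"] = int(m.group(1), 16)
    return crashes


SAMPLE = """(XEN) Domain 16 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e019:[<0000000000000000>]"""

for crash in find_crashes(SAMPLE):
    print(crash)
```

A crash entry with a RIP of 0 matches the failure reported here.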

For information, I have reproduced this on two different AMD desktop
processor machines; I am not sure whether Intel would be any different. I'm
pretty sure I did tests with grub from unstable with the same result at
some point, but I can test again if that is likely to work.

The package that is installed on the domU side is "grub-xen".

I don't know how to debug grub further on my own. I have printed out
text from grub, from which I understood that it is the chainload that
fails. I see no output from the domU grub (except when it works as it
should, of course). I can help with further testing if needed.

/Andreas


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

      


