
Re: [Xen-devel] HVM domains crash after upgrade from XEN 4.5.1 to 4.5.2

Am 19.11.15 um 00:17 schrieb Andrew Cooper:
On 18/11/2015 22:51, Atom2 wrote:
Am 17.11.15 um 00:10 schrieb Atom2:
[big snip]
Hi Andrew,
as promised I have again tried with a debug build and the results are
very mixed. I initially tried to better understand what the debug USE
flag actually does in Gentoo, and my understanding (after reading the
so-called ebuilds) is now that the Xen hypervisor will be built by
passing "debug=y" to make - and that's what should produce a debug
build - right?
Yes indeed.

So I went ahead, enabled the debug USE flag plus gdb symbols again and
rebuilt the hypervisor in the hope that this would produce a valid and
working debug build.

However, there seems to be another problem lurking somewhere, which
only manifests itself when I boot the debug build of the hypervisor.
You did manage to get at least one decent log from a proper debug build.

However, all we need is the hvm_debug output.  This patch:

diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 05ef5c5..7a8fbb5 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -28,7 +28,7 @@


-#ifndef NDEBUG
+#if 1
  #define DBG_LEVEL_0                 (1 << 0)
  #define DBG_LEVEL_1                 (1 << 1)
  #define DBG_LEVEL_2                 (1 << 2)

This will enable hvm_debug in a non-debug build of the hypervisor. Can
you try that, please?
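To illustrate what the one-line change does, here is a minimal userspace sketch of the same compile-time gate. It is not Xen code: FORCE_HVM_DEBUG, DEMO_DBG_COMPILED_IN, demo_dbg_mask and demo_hvm_dbg are hypothetical names, the level bits are defined unconditionally for brevity, and the real logging macro in support.h may differ in detail. The point is that with "#ifndef NDEBUG" the debug output does not exist at all in a release build, whereas "#if 1" compiles it in regardless of NDEBUG; in this sketch a runtime mask then decides which levels are actually printed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical switch: 1 mimics the "#if 1" override from the patch,
 * 0 falls back to the original "#ifndef NDEBUG" behaviour, where a
 * release build (NDEBUG defined) compiles the logging out entirely. */
#define FORCE_HVM_DEBUG 1

#if FORCE_HVM_DEBUG || !defined(NDEBUG)
#define DEMO_DBG_COMPILED_IN 1
#else
#define DEMO_DBG_COMPILED_IN 0
#endif

/* Level bits as in support.h (defined unconditionally here for brevity). */
#define DBG_LEVEL_0 (1 << 0)
#define DBG_LEVEL_1 (1 << 1)
#define DBG_LEVEL_2 (1 << 2)

/* Hypothetical runtime mask selecting which levels are printed. */
static const unsigned int demo_dbg_mask = DBG_LEVEL_0 | DBG_LEVEL_2;

/* Formats msg into buf if logging is compiled in and the level bit is
 * set in the mask; otherwise leaves buf empty. Returns the formatted
 * length (as snprintf does), or 0 if the message was suppressed. */
static int demo_hvm_dbg(char *buf, size_t len, unsigned int level,
                        const char *msg)
{
    assert(buf != NULL && len > 0);
#if DEMO_DBG_COMPILED_IN
    if (demo_dbg_mask & level)
        return snprintf(buf, len, "[HVM] %s", msg);
#else
    (void)level;
    (void)msg;
#endif
    buf[0] = '\0';
    return 0;
}
```

With the mask above, a DBG_LEVEL_0 message "vmexit" is formatted as "[HVM] vmexit", while a DBG_LEVEL_1 message is suppressed at run time; setting FORCE_HVM_DEBUG to 0 and building with -DNDEBUG removes the output at compile time instead.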
Done. Please find the xl dmesg output attached to this mail. I guess this time it really is what you were expecting. Whether it actually makes sense, though, may be a different issue, but I am confident in your ability to figure out what's going on.
The system crashes early on with a DOUBLE FAULT in do_IRQ - we already
had this earlier in this thread. I am, however, a step further, as the
disassembly in gdb now seems to provide not just an empty page full of
NULL values but something that might give you a hint as to why it
crashes so early: please see the attached disassembly file (doIRQ)
together with the serial console output (serial.dbg). The earlier file
full of NULL values was probably because I had not included gdb symbols
in the debug build at that time - my bad.
The fact that it is completely consistent is useful from a debugging
point of view.

The disassembly of do_IRQ now looks like a plausible function, but the
consistently faulting address has no plausible way of generating a
double fault. I therefore suspect that something has corrupted memory
in Xen's .text section.
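One cheap way to test the .text corruption theory, sketched here in userspace C under loose assumptions, is to record a checksum of the code region at a known-good point, compare it later, and then locate the first differing byte against a reference copy. The function names, the choice of FNV-1a (a simple non-cryptographic hash) and the availability of a pristine reference copy of the bytes are all assumptions for illustration; this is not existing Xen functionality.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash over n bytes. Non-cryptographic, but any single-byte
 * change is guaranteed to change the result. */
static uint32_t fnv1a32(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Offset of the first byte where the live code differs from a pristine
 * reference copy (assumed to be available), or -1 if they are identical. */
static long first_diff(const uint8_t *live, const uint8_t *ref, size_t n)
{
    assert(live != NULL && ref != NULL);
    for (size_t i = 0; i < n; i++) {
        if (live[i] != ref[i])
            return (long)i;
    }
    return -1;
}
```

If the live checksum no longer matches the one recorded at boot, first_diff narrows down where the bytes changed, which is the kind of evidence that would confirm or rule out corruption near the faulting address in do_IRQ.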

As an experiment, could you try booting with the minimal set of
command line options, which look to be just "com1=115200,8n1,0x3f8,4
console=com1,vga dom0_mem=4G,max:4G", to see whether the problem is
caused by an interaction between the options you have enabled?
I haven't done that yet, as this will again require a recompile of the non-working debug build. I'll probably give it a try tomorrow.
If the issue still reproduces, I will rework the previous debugging
patch I gave you to definitely dump the actual code being run at the
time of the fault.
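For the code dump mentioned above, a minimal sketch of the formatting step might look like this: print the bytes around the faulting instruction pointer as hex, with the byte at the fault address bracketed so it can be matched against the do_IRQ disassembly. This is a hypothetical userspace helper (format_code_dump is not a Xen function); copying the bytes out of hypervisor memory at fault time is outside its scope and is assumed to have happened already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes code[0..len) as space-separated hex into out, bracketing the
 * byte at index fault_off as <xx>. Returns the number of characters
 * written (excluding the NUL), or -1 if out is too small. */
static int format_code_dump(char *out, size_t outlen, const uint8_t *code,
                            size_t len, size_t fault_off)
{
    size_t pos = 0;

    assert(out != NULL && code != NULL && fault_off < len);
    for (size_t i = 0; i < len; i++) {
        int n = snprintf(out + pos, outlen - pos,
                         i == fault_off ? "%s<%02x>" : "%s%02x",
                         i ? " " : "", (unsigned)code[i]);
        if (n < 0 || (size_t)n >= outlen - pos)
            return -1;              /* truncated: buffer too small */
        pos += (size_t)n;
    }
    return (int)pos;
}
```

Given the five bytes 55 48 89 e5 c3 and a fault at offset 2, it produces "55 48 <89> e5 c3".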

Thanks,
Atom2

Attachment: xl.dmesg.dbg
Description: Text document

Xen-devel mailing list


