Re: [Xen-devel] HVM domains crash after upgrade from XEN 4.5.1 to 4.5.2

On 19.11.15 at 11:38, Andrew Cooper wrote:
On 19/11/15 10:24, Jan Beulich wrote:
On 19.11.15 at 00:17, <andrew.cooper3@xxxxxxxxxx> wrote:
The disassembly of do_IRQ now looks like a plausible function, but the
consistently faulting address has no plausible way of generating a
double fault.  I suspect therefore that something has caused memory
corruption in Xen .text section.
Dump of assembler code for function do_IRQ:
    0xffff82d080176577 <+0>:      push   %rbp
    0xffff82d080176578 <+1>:      mov    %rsp,%rbp
    0xffff82d08017657b <+4>:      push   %r15
    0xffff82d08017657d <+6>:      push   %r14
    0xffff82d08017657f <+8>:      push   %r13
    0xffff82d080176581 <+10>:     push   %r12
    0xffff82d080176583 <+12>:     push   %rbx
    0xffff82d080176584 <+13>:     lea    -0x1058(%rsp),%rsp
    0xffff82d08017658c <+21>:     orq    $0x0,(%rsp)
    0xffff82d080176591 <+26>:     lea    0x1020(%rsp),%rsp

The orq surely has potential for causing a double fault, if %rsp is
near the stack limit. The two LEAs look suspect, presumably a
result of some non-standard option passed to gcc. Removing that
option might already be a step forward.
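As a rough illustration of the point about the orq, the following Python
sketch works out where the quoted prologue touches memory relative to %rsp
at function entry. It only covers the instructions shown in the dump above;
the function name and parameters are made up for this example and are not
part of Xen or gcc.

```python
# Offsets are relative to %rsp at the first instruction of do_IRQ.
PUSH_BYTES = 8  # each push on x86-64 lowers %rsp by 8 bytes


def prologue_offsets(num_pushes=6, probe_lea=0x1058, restore_lea=0x1020):
    """Return (probe_offset, final_offset) for the quoted prologue.

    num_pushes  : rbp, r15, r14, r13, r12, rbx in the dump
    probe_lea   : the 'lea -0x1058(%rsp),%rsp' adjustment
    restore_lea : the 'lea 0x1020(%rsp),%rsp' adjustment
    """
    after_pushes = -num_pushes * PUSH_BYTES
    probe_offset = after_pushes - probe_lea      # address written by orq
    final_offset = probe_offset + restore_lea    # %rsp after the second lea
    return probe_offset, final_offset


probe_offset, final_offset = prologue_offsets()
print("orq touches entry_rsp - %#x" % -probe_offset)
print("rsp after second lea is entry_rsp - %#x" % -final_offset)
```

Under these assumptions the orq writes a little over 4K below the entry
stack pointer, so on an 8K stack it can land outside the valid range even
when the entry %rsp itself is still comfortably inside it.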

Andrew, Jan - thanks again.
In terms of non-standard options passed to gcc I have tried to make sense of 
what flags are actually being used during the build process. I am not 
absolutely sure, but I think the options passed to gcc are as follows:

I do have system wide flags which are used for non-debug builds:
CFLAGS="-march=native -O2 -pipe -fomit-frame-pointer"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"

for builds with debug symbols (using splitdebug) there are system wide 
overrides as follows:
CFLAGS="-march=native -O2 -pipe -ggdb"
LDFLAGS: I'd assume that this inherits its value from the system wide setting 

for xen (the hypervisor) the build system seems to do the following:
CFLAGS="" (i.e. unset CFLAGS)
to me this indicates that the rest stays untouched (i.e. either standard or 
debug flags)

for xen-tools (includes e.g. hvmloader) the build system appears to do the following:
CFLAGS="" (i.e. unset CFLAGS)
CXXFLAGS="${CXXFLAGS} -fno-strict-overflow"
LDFLAGS="" (i.e. unset LDFLAGS)

So I think there's probably nothing really fancy in my options to gcc.

Actually yes - that is a huge quantity of stack usage.

(The actual behaviour looks very suspect - it appears to be completely

The #DF handler reports that %rsp in the exception frame is within
range.  Having said that,

(XEN) [    2.788209] rbp: ffff83080ca8ed78   rsp: ffff83080ca8dcf8   r8:  ffff83080ca9d558
(XEN) [    2.837474] Valid stack range: ffff83080ca8e000-ffff83080ca90000, sp=ffff83080ca8dcf8,
(XEN) [    2.848969] No stack overflow detected. Skipping stack trace.

In this case, the stack pointer *is* out of range, and has hit the guard page.

This means:
1) There is some bug in the stack overflow detection in the #DF handler.
2) Whatever options Gentoo compiles Xen with are sufficient to overflow
the 8K hypervisor stack.
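
The numbers in the log can be checked by hand. The Python sketch below
compares the reported sp against the reported valid stack range. The
addresses are copied from the (XEN) lines quoted above; the helper name and
the choice of an exclusive upper bound are assumptions made for this example
and do not reflect how the #DF handler in Xen actually performs its check.

```python
# Values copied from the quoted (XEN) log lines.
STACK_LOW = 0xffff83080ca8e000
STACK_HIGH = 0xffff83080ca90000
SP = 0xffff83080ca8dcf8


def sp_in_range(sp, low, high):
    """True if sp lies inside [low, high). Upper bound exclusive by assumption."""
    return low <= sp < high


stack_size = STACK_HIGH - STACK_LOW
bytes_below = STACK_LOW - SP

print("stack size: %d bytes" % stack_size)
print("sp in range: %s" % sp_in_range(SP, STACK_LOW, STACK_HIGH))
print("sp is %d bytes below the low end" % bytes_below)
```

With these values the range is exactly 8K wide and sp sits 0x308 bytes
below its low end, which is consistent with both points in the list above.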
Thanks, Atom2

