
Re: [Xen-devel] Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

On 11/28/2013 09:17 PM, Andrew Cooper wrote:
On 28/11/13 12:31, Andrew Cooper wrote:

I have recently positively identified
b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.

Serial log is attached.  The crash is completely deterministic, and is
from an IBM xSeries 3530 M4 server.

Given the crash and bad patch, I suspect it is more to do with the
NUMA/memory layout than the specifics of the server.

Dario: Being your patch, do you have any ideas?

George: Regarding the release, if a fix can't easily be found, it might
be worth considering reverting the change.


Following some further debugging, this is rather more complicated than I
initially thought.

There is some form of memory corruption; depending on which exact
underlying changeset I base the XenServer patch queue on, or which pages
are present in the queue, I get crashes in different locations,
including faults from misaligned instructions and stack traces
which are completely bogus.

The saving grace is that the crashes appear to be completely
deterministic for a given binary (although this server is slower than
treacle to boot).

Well, one thing that patch certainly *does* do is remove a very large chunk of zeroed bytes from the stack (doing the work directly in the domain struct rather than doing it on the stack and then copying it in); so it's possible you've got an uninitialized variable somewhere...
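
To make that hypothesis concrete, here is a minimal sketch in plain C. It is not Xen code: the "stack" is modelled as an ordinary byte array so that the behaviour is well defined, and every name in it (fake_stack, run_scenario, the slot number 64, the 0xA5 fill byte) is hypothetical. It only shows how a zeroed temporary in a caller could mask a later read of a slot that nothing initialised, and how removing that temporary would expose stale contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_STACK_SIZE 256
#define STALE_BYTE      0xA5   /* hypothetical leftover from earlier calls */

/* Hypothetical model of a stack region; an array keeps the reads defined. */
static unsigned char fake_stack[FAKE_STACK_SIZE];

/* Earlier calls leave non-zero garbage behind. */
static void dirty_stack(void)
{
    memset(fake_stack, STALE_BYTE, sizeof(fake_stack));
}

/* Modelled old behaviour: a large temporary is zeroed on the stack. */
static void caller_with_zeroed_temp(void)
{
    memset(fake_stack, 0, sizeof(fake_stack));
}

/* Modelled new behaviour: the work happens in place elsewhere,
 * so this region of the stack is never written. */
static void caller_without_temp(void)
{
}

/* Buggy callee: reads a slot that it never initialised itself. */
static unsigned char buggy_callee_reads_slot(size_t slot)
{
    assert(slot < FAKE_STACK_SIZE);
    return fake_stack[slot];
}

/* Run one scenario and return what the buggy callee observes. */
unsigned char run_scenario(int zero_temp)
{
    dirty_stack();
    if (zero_temp)
        caller_with_zeroed_temp();
    else
        caller_without_temp();
    return buggy_callee_reads_slot(64);
}
```

In this model, run_scenario(1) observes 0 because the zeroed temporary happens to cover the slot, while run_scenario(0) observes the stale 0xA5 byte. If something similar is going on in the hypervisor, the bug would predate the changeset and merely have been hidden by it.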




