
[Xen-devel] arm64, laxton[01] (was Re: [xen-unstable-smoke test] 133030: trouble: blocked/broken/pass)



On Fri, Feb 08, 2019 at 01:14:16PM +0000, Wei Liu wrote:
> On Fri, Feb 08, 2019 at 05:21:44AM +0000, osstest service owner wrote:
> > flight 133030 xen-unstable-smoke real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/133030/
> > 
> > Failures and problems with tests :-(

Thanks for looking at this.

> After some investigation, I think something is wrong with the linux-4.9
> branch.
> 
> The issue to hand is:
> 
>     Feb  8 04:12:54.790904 
>     Loading initial ramdisk ...
>     Feb  8 04:12:55.114864 
>     EFI stub: Booting Linux Kernel...
>     Feb  8 04:12:55.354885 EFI stub: ERROR: Failed to alloc kernel memory
>     Feb  8 04:12:55.354946 EFI stub: ERROR: Failed to relocate kernel
>     Feb  8 04:12:55.354993 Feb  8 04:12:55.355016 
>       Failed to boot both default and fallback entries.
> 
> The new 4.9 kernel can't be loaded _natively_ anymore.

This is not using our own-built kernel.

It is using (our copy of) the jessie arm64 debian-installer kernel
which has not changed since June.  I guess it is just about possible
that the file in /home/tftp has been corrupted somehow.  I have asked
Credativ to check them against our backups.
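
For what it's worth, the check against backups could be sketched
roughly as below.  This is only an illustration, not what Credativ
will actually run: the helper names (sha256_of, compare_trees) and the
backup location are hypothetical, and the only path taken from the
above is /home/tftp.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(live: Path, backup: Path) -> Dict[str, str]:
    """Classify every file under `backup` as 'ok', 'missing' or 'differs'
    relative to the same path under `live`.  (Hypothetical helper.)"""
    report = {}
    for ref in sorted(p for p in backup.rglob("*") if p.is_file()):
        rel = ref.relative_to(backup)
        candidate = live / rel
        if not candidate.is_file():
            report[rel.as_posix()] = "missing"
        elif sha256_of(candidate) != sha256_of(ref):
            report[rel.as_posix()] = "differs"
        else:
            report[rel.as_posix()] = "ok"
    return report
```

Something like compare_trees(Path("/home/tftp"), backup_dir), with
backup_dir pointing at wherever the restored backup copy lives, would
then list any installer file that is missing or whose contents differ
from the backup.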

Looking at
  http://logs.test-lab.xenproject.org/osstest/results/host/laxton0.html
  http://logs.test-lab.xenproject.org/osstest/results/host/laxton1.html
it is evident that both boxes started failing at roughly the same
time.

On both boxes the first failing job was a test in flight 132973, the
linux-4.9 one.  But I think this is a red herring, because: (i) the
failure occurs during host installation, before the flight has
actually touched the kernel to be tested; (ii) on laxton1 there were
two test jobs in 132973 which passed; and (iii) on both machines there
were earlier build jobs in 132973 which passed (and which would have
had a very similar host installation step).

I have checked and there was no update to the osstest code.  I don't
think there have been updates to the infrastructure config, but I
haven't searched all the infrastructure boxes' etckeepers etc.

So something has caused both machines to fail simultaneously.

The possible explanations seem to me to be:

  * Some kind of common physical cause (power surge corrupting the
    firmware or something)

  * Some kind of common nonphysical cause external to the hosts or the
    tests: bad files on the infrastructure hosts; a change to the
    behaviour of the bootp/tftp servers; a new kind of broadcast
    network packet (perhaps from other tests) which causes the laxton
    firmware to malfunction; etc.

  * The new linux-4.9 kernel does something which has the effect of
    often (but not always) corrupting the laxtons' firmware.

  * A firmware bug triggered by the passage of time (eg
    clock-dependent)

  * A misunderstanding on my part, in my analysis of which ingredients
    the failing host installations actually use.

  * Some other common-mode failure that I haven't thought of.

In the meantime I have unblessed the laxtons to avoid osstest
repeatedly power cycling them to try to get them to work.  FTR this
will still not allow the push gate to pass.

Ian.
