[Xen-devel] arm64, laxton[01] (was Re: [xen-unstable-smoke test] 133030: trouble: blocked/broken/pass)
On Fri, Feb 08, 2019 at 01:14:16PM +0000, Wei Liu wrote:
> On Fri, Feb 08, 2019 at 05:21:44AM +0000, osstest service owner wrote:
> > flight 133030 xen-unstable-smoke real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/133030/
> >
> > Failures and problems with tests :-(

Thanks for looking at this.

> After some investigation, I think something is wrong with the linux-4.9
> branch.
>
> The issue to hand is:
>
> Feb 8 04:12:54.790904
> Loading initial ramdisk ...
> Feb 8 04:12:55.114864
> EFI stub: Booting Linux Kernel...
> Feb 8 04:12:55.354885 EFI stub: ERROR: Failed to alloc kernel memory
> Feb 8 04:12:55.354946 EFI stub: ERROR: Failed to relocate kernel
> Feb 8 04:12:55.354993 Feb 8 04:12:55.355016
> Failed to boot both default and fallback entries.
>
> The new 4.9 kernel can't be loaded _natively_ anymore.

This is not using our own-built kernel. It is using (our copy of) the
jessie arm64 debian-installer kernel which has not changed since June.

I guess it is just about possible that the file in /home/tftp has been
corrupted somehow. I have asked Credativ to check them against our
backups.

Looking at
  http://logs.test-lab.xenproject.org/osstest/results/host/laxton0.html
  http://logs.test-lab.xenproject.org/osstest/results/host/laxton1.html
it is evident that both boxes started failing at roughly the same time.

On both boxes the first failing job was a test in flight 132973, the
linux-4.9 one. But I think this is a red herring because (i) the
failure occurs during host installation, before the flight has actually
touched the kernel to be tested (ii) on laxton1 there were two test jobs
in 132973 which passed (iii) on both machines there were earlier build
jobs in 132973 which passed (which would have had a very similar host
installation step).

I have checked and there was no update to the osstest code. I don't
think there have been updates to the infrastructure config but I haven't
searched all the infrastructure boxes etckeepers etc.
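
The check of the /home/tftp files against backups mentioned above could be sketched roughly as follows. This is only an illustration of one way to do such a comparison, not what was actually run on the infrastructure hosts; the helper names `sha256_of` and `compare_trees` and the backup location are hypothetical.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(live_root, backup_root):
    """Compare every regular file under backup_root with its counterpart
    under live_root. Returns {relative_path: "missing" | "mismatch"} for
    files that are absent from, or differ in, the live tree."""
    live_root, backup_root = Path(live_root), Path(backup_root)
    problems = {}
    for backup_file in sorted(backup_root.rglob("*")):
        if not backup_file.is_file():
            continue
        rel = backup_file.relative_to(backup_root)
        live_file = live_root / rel
        if not live_file.is_file():
            problems[rel.as_posix()] = "missing"
        elif sha256_of(live_file) != sha256_of(backup_file):
            problems[rel.as_posix()] = "mismatch"
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Arguments: the live directory (e.g. /home/tftp) and a restored backup
    # copy of it (the backup path is a placeholder, not a known location).
    for rel, status in compare_trees(sys.argv[1], sys.argv[2]).items():
        print(f"{status}: {rel}")
```

An empty result would suggest the installer kernel and ramdisk are byte-identical to the backed-up copies, which would make file corruption an unlikely explanation. Files present only in the live tree are deliberately ignored here.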
So something has caused both machines to fail simultaneously.
Possible explanations to me seem to be:

 * Some kind of common physical cause (power surge corrupting the
   firmware or something)

 * Some kind of common nonphysical cause external to the hosts or the
   tests: bad files on the infrastructure hosts; a change to the
   behaviour of the bootp/tftp servers; a new kind of broadcast network
   packet (perhaps from other tests) which causes the laxton firmware
   to malfunction; etc.

 * The new linux-4.9 kernel does something which has the effect of
   often (but not always) corrupting the laxtons' firmware.

 * A firmware bug triggered by the passage of time (eg clock-dependent)

 * Misunderstanding by me in my analysis of what ingredients are used
   by the host installation failures.

 * Some other common-mode failure that I haven't thought of.

In the meantime I have unblessed the laxtons to avoid osstest
repeatedly power cycling them to try to get them to work. FTR this
will still not allow the push gate to pass.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel