
Re: [Xen-devel] Xen 4.12.0-rc Hangs Around masked ExtINT on CPU#




On 5/28/2019 8:02 AM, John L. Poole wrote:


On 5/28/2019 12:41 AM, Roger Pau Monné wrote:
On Mon, May 27, 2019 at 03:35:21PM -0700, John L. Poole wrote:
On 5/27/2019 9:18 AM, Roger Pau Monné wrote:
On Mon, Apr 29, 2019 at 05:27:34PM +0200, Roger Pau Monné wrote:
IMO it would be better if you can build directly from the upstream git
repository [0], that way you could use git-bisect(1) in order to figure
out which commit broke your system. For example:

# git clone git://xenbits.xen.org/xen.git
# cd xen
# git checkout RELEASE-4.7.0
# make xen -j8

That should give you a set of Xen binaries in the xen/ directory. IIRC
you are booting from EFI, so you likely need xen/xen.efi.

If that works, then you can test RELEASE-4.8.0, and if that fails to
boot you should have a range of commits that you can bisect in order
to find the culprit.
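A minimal sketch of that bisection, assuming the hypothetical outcome that RELEASE-4.7.0 boots and RELEASE-4.8.0 hangs, and that it is run from the top of the xen clone. The helper function names are illustrative only, not part of Xen's tooling; the git and make commands are the real ones.

```shell
#!/bin/sh
# Sketch of bisecting the Xen hypervisor between two release tags.
# Assumption: GOOD boots on the affected box and BAD hangs; both values
# below are placeholders for whatever your own manual tests showed.
GOOD=${GOOD:-RELEASE-4.7.0}
BAD=${BAD:-RELEASE-4.8.0}

# Hypothetical helper: begin the bisection (git bisect start <bad> <good>).
xen_bisect_start() {
    git bisect start "$BAD" "$GOOD"
}

# Hypothetical helper: build only the hypervisor for the commit git has
# checked out; no ./configure and no tools build are needed. The EFI
# binary ends up in xen/xen.efi.
xen_bisect_build() {
    make xen -j8
}

# Hypothetical helper: record the result of booting the freshly built
# xen/xen.efi. Pass "good" if it booted, "bad" if it hung.
xen_bisect_mark() {
    case "$1" in
        good|bad) git bisect "$1" ;;
        *) echo "usage: xen_bisect_mark good|bad" >&2; return 1 ;;
    esac
}
```

Run xen_bisect_start once, then repeat xen_bisect_build, boot xen/xen.efi, and xen_bisect_mark good or bad until git reports the first bad commit; git bisect reset returns the tree to where it started. Because release tags sit on separate stable branches, git may first ask you to test their merge base.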
FWIW, I've been unable to find a box with the same CPU model (C2750)
that you are using. I've found a couple of old Atom boxes using
different CPUs, but they all seem to boot fine using the latest
xen-unstable. I've looked on eBay for that CPU, but everything
containing it is server-grade and over $200, which I'm sadly not going
to pay.

Unless you are able to bisect the tree and give us the bad commit
that's causing your issues, I'm afraid that I, at least, won't be able
to progress this any further, sorry.

Roger.
I attempted to work backwards and ran into a nightmare with Gentoo. I kept
getting compiler errors, which I suspect were the result of having a newer
version of GCC and other packages. It's not easy to travel back in time
in Gentoo because everything keeps getting upgraded. I just cannot make
the time now to unravel this, as I have other demands on my time and will
be engaged for the next four to six weeks.
IMO your best bet is to build Xen using Debian stretch. That's what the
Xen test system uses, and it is likely able to build the different Xen
versions: the stable-* branches tested by osstest should all build on
stretch.

If that also triggers compiler errors, what I've done in the past is to
build a chroot with an older version of Debian and then build Xen
inside of it. You can do this on a different box from the one you are
testing, i.e. you could create a Debian VM and build Xen from there.

Note that in order to bisect this issue you only need to build the Xen
kernel (make xen; no need to run ./configure). There's no need to
build the tools, so you need almost no dependencies installed on
the builder.
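A minimal sketch of the chroot approach, assuming a Debian-based builder run as root with debootstrap installed. The chroot path, the mirror URL (stretch has since moved to archive.debian.org) and the package list are assumptions, a guess at a minimal hypervisor-only toolchain rather than an official dependency list; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: build the Xen hypervisor inside an older Debian chroot.
# Assumptions (adjust as needed): run as root, debootstrap is installed,
# and SUITE/CHROOT/MIRROR below are placeholders, not required values.
SUITE=${SUITE:-stretch}
CHROOT=${CHROOT:-/srv/xen-build-$SUITE}
MIRROR=${MIRROR:-http://archive.debian.org/debian}

# Hypothetical helper: create the chroot (one-off).
make_build_chroot() {
    debootstrap "$SUITE" "$CHROOT" "$MIRROR"
}

# Hypothetical helper: install what a hypervisor-only build is assumed
# to need. Check-Valid-Until is disabled because archived suites can
# carry expired Release files.
install_build_deps() {
    chroot "$CHROOT" apt-get -o Acquire::Check-Valid-Until=false update &&
    chroot "$CHROOT" apt-get install -y build-essential git python
}

# Hypothetical helper: clone (once) and build one tag inside the chroot.
# Only "make xen" is run; no ./configure and no tools.
build_xen_in_chroot() {
    tag=$1
    chroot "$CHROOT" sh -c "
        cd /root &&
        { [ -d xen ] || git clone git://xenbits.xen.org/xen.git; } &&
        cd xen && git checkout '$tag' && make xen -j8"
}
```

For example, make_build_chroot, install_build_deps, then build_xen_in_chroot RELEASE-4.7.0 would leave the binaries under $CHROOT/root/xen/xen/, ready to copy to the test box.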

I've done a build of the stable-4.7 branch myself and uploaded the
hypervisor binaries to:

http://xenbits.xen.org/people/royger/stable-4.7/

Could you give those a try (I wasn't sure whether you need xen.gz or
xen.efi, so I've uploaded both) and see if you still have issues
booting?

Testing those binaries should be as simple as placing them in /boot/
and fixing your bootloader configuration to boot from those. Please
send the serial log when booting from the provided binaries.
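When xen.efi is launched directly from the EFI shell rather than through GRUB, it reads a config file with the same base name (xen.cfg) from the same directory. A minimal sketch that writes one with serial logging enabled, so the requested log can be captured; the directory, the dom0 kernel and initrd file names, the root= device and the com1 settings are all placeholders to adapt, and write_xen_cfg is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: write a xen.cfg next to xen.efi for a direct EFI boot.
# Assumptions: $1 is the EFI system partition directory holding xen.efi;
# vmlinuz, initramfs.img and root=/dev/sda2 are placeholders.
# The "options=" line is the Xen command line: log to the first serial
# port (115200 baud, 8n1) as well as VGA, at full verbosity.
# The "kernel=" line is the dom0 kernel followed by its command line;
# "ramdisk=" names the dom0 initrd.
write_xen_cfg() {
    esp_dir=$1
    mkdir -p "$esp_dir" || return 1
    cat > "$esp_dir/xen.cfg" <<'EOF'
[global]
default=xen

[xen]
options=console=com1,vga com1=115200,8n1 loglvl=all guest_loglvl=all
kernel=vmlinuz root=/dev/sda2 console=hvc0
ramdisk=initramfs.img
EOF
}
```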

How much would it cost for you to obtain the machine you need? I may
consider paying for it. I bought this Atom server specifically to run
Xen economically, so the machine has marginal value to me if I cannot
run Xen on it.
Even if we go that route, there's no guarantee that I would be able to
fix the issue, and there's also the possibility that the hardware you
have is somehow broken, and that the new one won't exhibit this issue.

Roger.
Roger,

You have given me an idea. I have several VMs on my hard disk that are
not backed up. So I think what I'll do is remove the current hard disk,
put in a fresh one, and then install a Debian-based Xen from scratch, so
that I do not risk altering my Gentoo-based disk. This approach should
free me from the entanglement of a bleeding-edge distribution such as
Gentoo.

I was looking back at my notes. I acquired this Atom-based server in
November 2016, installed the Debian Xen to test it, and it worked. I then
installed Gentoo and ran into problems with GRUB. I learned that GRUB was
not yet ready to support EFI and Xen, so I used the manual method of
dropping into an EFI shell to launch my DOM0 instance. I later tried to
upgrade the kernel, ran into problems, and aborted the upgrade; I just
kept what I had working, since I had already created some Gentoo-based
VMs. During my build process, I had run into a "coff-x86-64 pe-x86-64"
issue, which Jan Beulich helped with and determined was something worthy
of the attention of the "binutils folks."

I'll attempt the hard disk swap in a few days, once the new disk arrives.

Thank you,
John


Update (9/25/2019).

Short version: a hardware incompatibility with a Microsoft wireless USB keyboard caused the problem.

The take-away: a USB keyboard can affect whether the Xen kernel boots.

This was a hardware-caused problem.

Long version:

I had several critical matters that I could not postpone, so my work
on this had been suspended since May. I finally had time to resume work
on this problem. Recall that I could successfully boot a Gentoo kernel,
but when I tried a Xen kernel, the system would hang early on, at the
"masked ExtINT on CPU#..." messages.

By chance, I decided to swap out the USB keyboard, a "Microsoft Wireless Desktop Receiver 3.1" (model 1028), because I had to keep replacing batteries, the range was very limited (about 15 inches), and characters were dropping out. I replaced it with a generic Amazon USB keyboard. Suddenly the boot problems went away: no more hanging at the CPU masking point.

I sailed through and booted successfully. Moreover, I had earlier put a
new hard disk in the server, disconnected the existing one, and installed
the same Debian release I first used to test this server (8.6.0, dated
11/8/2016), so that I would have an apples-to-apples test case before
returning the server for warranty service. The installation itself had
video artifacts that prevented the graphical install and dropped me into
a console install whose colors made the selections invisible. After I
installed Debian 8.6.0, I had the same problem: I could not get past the
"masked ExtINT on CPU#..." message.

Since this discovery several days ago, I have booted my various Xen kernels (via EFI) and have not encountered any of the problems I previously suffered. While I do have some other issues related to Gentoo-specific tweaks, they don't concern me, and I wanted to close this issue by reporting this discovery. Of course, I can make the USB unit available to qualified persons who want to test it, or I can attach it to the server to test a debugging version.

Thank you Roger and Jan and others for all your help.

Related bug: https://bugs.gentoo.org/679826


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

