Re: [Xen-devel] Regression between Xen 4.6.0 and 4.7.0, Direct kernel boot on a qemu-xen and seabios HVM guest doesn't work anymore.



On Tue, Oct 25, 2016 at 03:49:59PM +0100, Wei Liu wrote:
> On Tue, Oct 25, 2016 at 01:37:45PM +0200, Sander Eikelenboom wrote:
> > 
> > Tuesday, October 25, 2016, 1:24:12 PM, you wrote:
> > 
> > > On Tue, Oct 18, 2016 at 01:48:23PM +0100, Wei Liu wrote:
> > >> On Mon, Oct 17, 2016 at 05:28:17PM +0200, Sander Eikelenboom wrote:
> > >> > Thursday, October 13, 2016, 4:43:31 PM, you wrote:
> > >> > 
> > >> > > Hi Jan / Wei,
> > >> > 
> > >> > > Took a while before i had the chance to fiddle some more to find the
> > >> > > actual culprit. After analyzing the output of xl -vvvvv create
> > >> > > somewhat more, i came to the insight that it was probably Qemu and
> > >> > > not Xen causing the fault.
> > >> > 
> > >> > > As a test I just used a qemu-xen binary built with xen-4.6.0, booting
> > >> > > up a guest with direct kernel boot mode on xen-unstable. And that old
> > >> > > qemu binary works fine.
> > >> > 
> > >> > > After testing i can conclude that Jan was right: the bisection was a
> > >> > > red herring, and the problem is caused by some change in Qemu and not
> > >> > > by something in the Xen tree.
> > >> > > (The strange thing is that, as far as i know, i did a "make distclean"
> > >> > > between every build (taking a lot of time), which should have pulled a
> > >> > > fresh qemu-xen tree, and therefore the bisection should have led to a
> > >> > > commit with a Config.mk hash change for the qemu-xen version.)
> > >> > 
> > >> > > Will see if i can find some more time to bisect qemu and find the
> > >> > > culprit.
> > >> > 
> > >> > > --
> > >> > > Sander
> > >> > 
> > >> > 
> > >> > Unfortunately i have to give up on this issue; for me it's impossible
> > >> > to bisect it with my present git-foo.
> > >> > 
> > >> > The first try, bisecting the whole xen-tree, seems to have hit the
> > >> > issue that the qemu revision that gets pulled on a fresh build is
> > >> > "master" during the whole dev period. That creates havoc when trying
> > >> > to bisect, since you are testing combinations that were never developed
> > >> > (nor auto tested) together (especially when a xen-tree change and a
> > >> > qemu-tree change depend on each other, like Roger's
> > >> > "xen: fix usage of xc_domain_create in domain builder").
> > >> > 
> > >> > While trying to bisect only qemu (keeping xen itself on RELEASE-4.6.0
> > >> > and seabios on rel-1.8.2) it gets stuck on issues with that tree.
> > >> > Between 4.6.0 and 4.7.0 the qemu tree switched from
> > >> >   git://xenbits.xen.org/qemu-upstream-4.6-testing.git
> > >> > to
> > >> >   git://xenbits.xen.org/qemu-xen.git
> > >> > and after that there seem to have been a lot of merges going back and
> > >> > forth, and to me it seems a mess (but as i said, it could also be a
> > >> > lack of git-foo). I tried manual bisecting, removing and cloning trees
> > >> > again etc., but that doesn't suffice; it's all going nowhere
> > >> > (while the known good build (plain RELEASE-4.6.0) always works, so it
> > >> > doesn't seem to be some random problem).
> > >> > 
> > >> 
> > >> Thanks for trying.
> > >> 
> > >> > So perhaps some dev can at least verify that the issue is there (since
> > >> > 4.7.0) and put it on the "known broken" list of things.
> > >> > 
> > >> 
> > >> I will put this into the list of things I need to look at.
> > >> 
> > 
> > > I investigated this a bit. The root cause is the memory accounting is
> > > wrong in QEMU. It would try to allocate more ram than allowed. I haven't
> > > tried to figure out exactly what is wrong, though.
> > 
> > That confirms what i was thinking in the end, but bisecting the qemu-tree
> > changes between the xen-4.6.0 and xen-4.7.0 releases proved to be pretty
> > difficult, as i explained. So if you have a hunch as to what code it
> > resides in, debugging instead of bisecting would probably be better.
> > (So one of the questions is what changes in the memory accounting when you
> > supply the kernel from the host instead of the guest, since booting a kernel
> > with grub from within the guest doesn't give any memory accounting issues.)
> > 
> > Thanks for investigating !
> 
> I think I hunted down the offending function.
> 
> Mind trying this patch for me?
> 
> ---8<---
> From 3c7f8b55109959cf470deeee452f452f7c0ade51 Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@xxxxxxxxxx>
> Date: Tue, 25 Oct 2016 15:45:04 +0100
> Subject: [PATCH] acpi: don't build acpi tables for xen guests
> 
> Xen's toolstack is in charge of building ACPI tables. Skip acpi table
> building if running on Xen.
> 
> This issue was discovered because direct kernel boot on Xen doesn't work
> anymore: the new ACPI tables cause the guest to exceed its memory
> allocation limit.
> 
> Reported-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
> Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
> ---
> Cc: Anthony PERARD <anthony.perard@xxxxxxxxxx>
> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> 
> RFC because I'm not sure this is the best way to fix it.
> ---
>  hw/i386/acpi-build.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index a26a4bb..6ba5031 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -45,6 +45,7 @@
>  #include "sysemu/tpm_backend.h"
>  #include "hw/timer/mc146818rtc_regs.h"
>  #include "sysemu/numa.h"
> +#include "hw/xen/xen.h"
>  
>  /* Supported chipsets: */
>  #include "hw/acpi/piix4.h"
> @@ -2865,6 +2866,12 @@ void acpi_setup(void)
>          return;
>      }
>  
> +    if (xen_enabled()) {
> +        fprintf(stderr, "%s %d\n", __FILE__, __LINE__);

Oops, this is just debug output - but you get the idea.

> +        ACPI_BUILD_DPRINTF("Xen enabled. Bailing out.\n");
> +        return;
> +    }
> +
>      build_state = g_malloc0(sizeof *build_state);
>  
>      acpi_set_pci_info();
> -- 
> 2.1.4
> 
> 
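
Stripped of the debug output, the guard in the patch above is just an
early return before any table building happens. A minimal standalone
sketch of that shape follows. Everything in it is a stand-in invented for
illustration and is not QEMU code: xen_enabled_stub() plays the role of
xen_enabled(), and tables_built counts how often the (omitted) table
building would have run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for QEMU's xen_enabled(); not the real function. */
static bool fake_xen_enabled_flag = false;

static bool xen_enabled_stub(void)
{
    return fake_xen_enabled_flag;
}

/* Hypothetical counter standing in for the real table-building work. */
static int tables_built = 0;

/* Sketch of the control flow only: bail out early when running on Xen. */
static void acpi_setup_sketch(void)
{
    if (xen_enabled_stub()) {
        /* Xen's toolstack is in charge of the ACPI tables; do nothing. */
        return;
    }

    /* Placeholder for build_state allocation and table building. */
    tables_built++;
}
```

With the flag set, acpi_setup_sketch() returns without touching
tables_built; with it cleared, the counter goes up by one per call.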

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel