Re: [Xen-devel] Regression between Xen 4.6.0 and 4.7.0, Direct kernel boot on a qemu-xen and seabios HVM guest doesn't work anymore.
On Tue, Oct 25, 2016 at 07:25:06PM +0200, Sander Eikelenboom wrote:
> On 2016-10-25 16:49, Wei Liu wrote:
> >On Tue, Oct 25, 2016 at 01:37:45PM +0200, Sander Eikelenboom wrote:
> >>
> >>Tuesday, October 25, 2016, 1:24:12 PM, you wrote:
> >>
> >>> On Tue, Oct 18, 2016 at 01:48:23PM +0100, Wei Liu wrote:
> >>>> On Mon, Oct 17, 2016 at 05:28:17PM +0200, Sander Eikelenboom wrote:
> >>>> > Thursday, October 13, 2016, 4:43:31 PM, you wrote:
> >>>> >
> >>>> > > Hi Jan / Wei,
> >>>> >
> >>>> > > Took a while before i had the chance to fiddle some more to find
> >>>> > > the actual culprit.
> >>>> > > After analyzing the output of xl -vvvvv create somewhat more i
> >>>> > > came to the insight it was probably Qemu and not Xen causing the
> >>>> > > fault.
> >>>> >
> >>>> > > As a test I just used a qemu-xen binary built with xen-4.6.0,
> >>>> > > booting up a guest with direct kernel boot mode on xen-unstable.
> >>>> > > And that old qemu binary works fine.
> >>>> >
> >>>> > > After testing i can conclude, Jan was right, the bisection was a
> >>>> > > red herring, the problem is caused by some change in Qemu and not
> >>>> > > by something in the Xen tree.
> >>>> > > (strange thing is that as far as i know i did a "make distclean"
> >>>> > > between every build (taking a lot of time), which should have
> >>>> > > pulled a fresh qemu-xen tree and therefore the bisection should
> >>>> > > have led to a commit with a Config.mk hash change for qemu-xen
> >>>> > > version.)
> >>>> >
> >>>> > > Will see if i can find some more time and bisect qemu and find
> >>>> > > the culprit.
> >>>> >
> >>>> > > --
> >>>> > > Sander
> >>>> >
> >>>> >
> >>>> > Unfortunately i have to give up on this issue, for me it's
> >>>> > impossible to bisect this issue with my present git-foo.
> >>>> >
> >>>> > The first try with bisection of the whole xen-tree seems to have
> >>>> > hit the issue that the qemu-revision that gets pulled on a fresh
> >>>> > build is "master" during the whole dev period. That creates havoc
> >>>> > when trying to bisect, since you are testing combinations that
> >>>> > were never developed (nor auto tested) in that combination
> >>>> > (especially when a xen-tree and qemu-tree change have a dependency
> >>>> > like Roger's "xen: fix usage of xc_domain_create in domain
> >>>> > builder")
> >>>> >
> >>>> > While trying to bisect only qemu (keeping xen itself on
> >>>> > RELEASE-4.6.0 and seabios on rel-1.8.2) it gets stuck on issues
> >>>> > with that tree.
> >>>> > Between 4.6.0 and 4.7.0 the qemu tree switched (from
> >>>> > git://xenbits.xen.org/qemu-upstream-4.6-testing.git
> >>>> > to git://xenbits.xen.org/qemu-xen.git), after that there seem to
> >>>> > have been a lot of merges going back and forth and to me it seems
> >>>> > a mess (but as i said it could also be a lack of git-foo). I tried
> >>>> > by manual bisecting, removing and cloning trees again etc. but
> >>>> > that doesn't suffice, it's all going nowhere.
> >>>> > (while the known good build (plain RELEASE-4.6.0) always works, so
> >>>> > it doesn't seem to be some random problem)
> >>>> >
> >>>>
> >>>> Thanks for trying.
> >>>>
> >>>> > So perhaps some dev can at least verify that the issue is there
> >>>> > (since 4.7.0) and put it on the "known broken" list of things.
> >>>> >
> >>>>
> >>>> I will put this into the list of things I need to look at.
> >>>>
> >>
> >>> I investigated this a bit. The root cause is the memory accounting is
> >>> wrong in QEMU. It would try to allocate more ram than allowed. I
> >>> haven't tried to figure out exactly what is wrong, though.
> >>
> >>That confirms what i was thinking in the end, but bisecting the
> >>qemu-tree changes between the xen-4.6.0 and xen-4.7.0 release proved to
> >>be pretty difficult as i explained. So if you have a hunch as to in
> >>what code it should reside, debugging instead of bisecting would
> >>probably be better.
> >>(so one of the questions is what changes in the memory accounting when
> >>you supply the kernel from the host instead of the guest, since booting
> >>a kernel with grub from within the guest doesn't give any memory
> >>accounting issues.)
> >>
> >>Thanks for investigating !
> >
> >I think I hunted down the offending function.
> >
> >Mind trying this patch for me?
>
> Hi Wei,
>
> This seems to help :)
>
> With a linux 4.8 kernel the HVM guest now boots fine with direct kernel
> boot!
>
> But there seems to be a gotcha which i think is not in the Xen docs/wiki:
> when trying a linux 4.3 kernel the guest still didn't boot and i got a:
> "qemu: linux kernel too old to load a ram disk" in the qemu log.
> I don't know what qemu regards as "old" in this case.
>

QEMU checks for a signature / version in the kernel header or whatnot. I
can't tell why that specific number is chosen, though.

> Another consideration: would it be worthwhile to add an OSStest for
> direct kernel boot ?
> (under the assumption that the host kernel that gets built can also boot
> on HVM guest it's probably a very cheap test not requiring any additional
> builds.)

Yes, definitely. The more tests, the merrier.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
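
For anyone hitting the "too old to load a ram disk" message: a rough way to
see what a kernel image advertises is to read the header fields yourself.
The sketch below is not QEMU's code. It assumes the layout from the Linux
x86 boot protocol documentation ("HdrS" magic at offset 0x202, 16-bit
protocol version at 0x206), and it assumes that the loader wants protocol
2.00 or later before it accepts a ramdisk, which is my understanding of the
check rather than something confirmed in this thread. The function names
are made up for illustration.

```python
import struct

# Offsets taken from the Linux x86 boot protocol documentation.
MAGIC_OFFSET = 0x202       # 4-byte "HdrS" signature
VERSION_OFFSET = 0x206     # 2-byte little-endian protocol version
HDRS_MAGIC = 0x53726448    # b"HdrS" read as a little-endian 32-bit value


def boot_protocol_version(header: bytes) -> int:
    """Return the boot protocol version (e.g. 0x020d for 2.13).

    Returns 0 when the buffer is too short or the "HdrS" signature is
    missing, i.e. when the image does not look like a modern bzImage.
    """
    if len(header) < VERSION_OFFSET + 2:
        return 0
    (magic,) = struct.unpack_from("<I", header, MAGIC_OFFSET)
    if magic != HDRS_MAGIC:
        return 0
    (version,) = struct.unpack_from("<H", header, VERSION_OFFSET)
    return version


def can_load_ramdisk(header: bytes) -> bool:
    """Assumed rule: ramdisk fields exist from boot protocol 2.00 onwards."""
    return boot_protocol_version(header) >= 0x0200


# Usage (the path is hypothetical):
#   with open("/boot/vmlinuz-example", "rb") as f:
#       print(hex(boot_protocol_version(f.read(0x400))))
```

If this reports 0 for a kernel that is clearly recent, the header bytes
being inspected are probably not what was expected, and that is worth
checking before blaming the kernel version itself.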