Re: [Xen-devel] Regression between Xen 4.6.0 and 4.7.0, Direct kernel boot on a qemu-xen and seabios HVM guest doesn't work anymore.



On 2016-09-05 12:25, Jan Beulich wrote:
On 05.09.16 at 12:02, <linux@xxxxxxxxxxxxxx> wrote:
On 2016-09-05 11:46, Jan Beulich wrote:
On 05.09.16 at 11:20, <linux@xxxxxxxxxxxxxx> wrote:
Hmm, it seems my thread was kind of hijacked and I was dropped from the
CC.

I had some time and bisected the issue and it resulted in:

5a3ce8f85e7e7bdd339d259daa19f6bc5cb4735f is the first bad commit
commit 5a3ce8f85e7e7bdd339d259daa19f6bc5cb4735f
Author: Jan Beulich <jbeulich@xxxxxxxx>
Date:   Wed Oct 21 10:56:31 2015 +0200

     x86/shadow: drop stray name tags from sh_{guest_get,map}_eff_l1e()

Hmm, as Wei already indicated - that's rather odd. The commit isn't
really supposed to have any effect on functionality (and going
through it again I also can't spot any now). And are you indeed
using shadow mode, and if so does your problem not occur when
you use HAP instead?

In any event, if there was some hidden (and unintended) change
in functionality here, then the most likely result would seem to be
a crash, yet from the log fragment you posted it doesn't look like
there's _any_ relevant hypervisor output.

Hmm, I was already afraid of that.
Attached is the output of xl dmesg. HAP is supported and should be
enabled by default (and I didn't disable it explicitly in my guest.cfg).

I just tried the opposite and specified hap=0 in my guest.cfg, and this
case leads to two additional lines of output:

(XEN) [2016-09-05 09:58:22.201] sh error: sh_remove_all_mappings(): can't find all mappings of mfn 471b69: c=8000000000000003 t=7400000000000001
(XEN) [2016-09-05 09:58:22.201] sh error: sh_remove_all_mappings(): can't find all mappings of mfn 471b68: c=8000000000000003 t=7400000000000001

And these two messages are relevant here? I.e. do they go away
when you use a commit ahead of the one your bisect spotted?

I just double-checked with hap=0 and a build one commit ahead of the culprit the bisection reported:
those messages are there as well, and the guest boots fine with that build.
So they don't seem to be relevant.

Anyway - with you quite clearly having used HAP before, I can't
see how this commit would matter for you at all. In case you want
to double check you could try with a hypervisor built without
shadow paging code (which we've been allowing for quite a
while).

I just tried that and without shadow paging code the guest boots fine, so that's
interesting.

Is it possible that the reproduction of the issue isn't 100% reliable?

Nope, it seems 100% reliable.

I.e. did you verify with a couple of runs each that it really is this
commit, and not just some spurious effect? If it is, then from all I
know so far I'd suspect an effect from code / data arrangement
rather than the commit itself to be the actual culprit.

Well, at least one other independent user is running into the same issue,
so it doesn't seem specifically related to my machine or my builds.

It happens both when running all my guests (with this one the last to start) and when running only this guest.

Which reminds
me of another possible way of double checking: If said commit
reverts reasonably cleanly at the tip of staging or master, maybe
you could try with just this change reverted, instead of with
everything subsequent to it reverted too?

Nope, I tried that already and it didn't revert cleanly (and I didn't see how to correctly fix it up).

--
Sander

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

