Re: [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M



On Thu, Jun 13, 2013 at 6:22 PM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Thu, 2013-06-13 at 17:55 +0100, Stefano Stabellini wrote:
>
>> > > We could have a xenstore flag somewhere that enables the old behaviour
>> > > so that people can revert back to qemu-xen-traditional and make the pci
>> > > hole below 4G even bigger than 448MB, but I think that keeping the old
>> > > behaviour around is going to make the code more difficult to maintain.
>> >
>> > The downside of that is that things which worked with the old scheme may
>> > not work with the new one though. Early in a release cycle when we have
>> > time to discover what has broken then that might be OK, but is post rc4
>> > really the time to be risking it?
>>
>> Yes, you are right: there are some scenarios that would have worked
>> before that wouldn't work anymore with the new scheme.
>> Are they important enough to have a workaround, pretty difficult to
>> identify for a user?
>
> That question would be reasonable early in the development cycle. At rc4
> the question should be: do we think this problem is so critical that we
> want to risk breaking something else which currently works for people.
>
> Remember that we are invalidating whatever passthrough testing people
> have already done up to this point of the release.
>
> It is also worth noting that the things which this change ends up
> breaking may for all we know be equally difficult for a user to identify
> (they are after all approximately the same class of issue).
>
> The problem here is that the risk is difficult to evaluate; we just
> don't know what will break with this change, and we don't know therefore
> if the cure is worse than the disease. The conservative approach at this
> point in the release would be to not change anything, or to change the
> minimal possible number of things (which would preclude changes which
> impact qemu-trad IMHO).
>


> WRT pretty difficult to identify -- the root of this thread suggests the
> guest entered a reboot loop with "No bootable device", that sounds
> eminently release notable to me. I also note that it was changing the
> size of the PCI hole which caused the issue -- which does somewhat
> underscore the risks involved in this sort of change.

But that bug was a bug in the first attempt to fix the root problem.
The root problem shows up as qemu crashing at some point because it
tried to access invalid guest gpfn space; see
http://lists.xen.org/archives/html/xen-devel/2013-03/msg00559.html.

Stefano tried to fix it with the above patch, just changing the hole
to start at 0xe; but that was incomplete, as it didn't match
hvmloader's and SeaBIOS's view of the world.  That's what this bug
report is about.  This thread is an attempt to find a better fix.

So the root problem is this: if we revert this patch, and someone
passes through a PCI device using qemu-xen (the default) and the MMIO
hole is resized, then at some point in the future qemu will randomly die.

If it's a choice between users experiencing "My VM randomly crashes"
and experiencing "I tried to pass through this device but the guest
OS doesn't see it", I'd rather choose the latter.

 -George