Re: [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M



On Thu, 2013-06-13 at 16:40 +0100, George Dunlap wrote:
> On 13/06/13 16:36, Ian Campbell wrote:
> > On Thu, 2013-06-13 at 16:30 +0100, George Dunlap wrote:
> >> On 13/06/13 16:16, Ian Campbell wrote:
> >>> On Thu, 2013-06-13 at 14:54 +0100, George Dunlap wrote:
> >>>> On 13/06/13 14:44, Stefano Stabellini wrote:
> >>>>> On Wed, 12 Jun 2013, George Dunlap wrote:
> >>>>>> On 12/06/13 08:25, Jan Beulich wrote:
> >>>>>>>>>> On 11.06.13 at 19:26, Stefano Stabellini
> >>>>>>>>>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >>>>>>>> I went through the code that maps the PCI MMIO regions in hvmloader
> >>>>>>>> (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
> >>>>>>>> maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
> >>>>>>>> region is larger than 512MB.
> >>>>>>>>
> >>>>>>>> Maybe we could just relax this condition and map the device memory to
> >>>>>>>> high memory no matter the size of the MMIO region if the PCI bar is
> >>>>>>>> 64-bit?
> >>>>>>> I can only recommend not to: For one, guests not using PAE or
> >>>>>>> PSE-36 can't map such space at all (and older OSes may not
> >>>>>>> properly deal with 64-bit BARs at all). And then one would generally
> >>>>>>> expect this allocation to be done top down (to minimize risk of
> >>>>>>> running into RAM), and doing so is going to present further risks of
> >>>>>>> incompatibilities with guest OSes (Linux for example learned only in
> >>>>>>> 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
> >>>>>>> 3.10-rc5 ioremap_pte_range(), while using "u64 pfn", passes the
> >>>>>>> PFN to pfn_pte(), the respective parameter of which is
> >>>>>>> "unsigned long").
> >>>>>>>
> >>>>>>> I think this ought to be done in an iterative process - if all MMIO
> >>>>>>> regions together don't fit below 4G, the biggest one should be
> >>>>>>> moved up beyond 4G first, followed by the next to biggest one
> >>>>>>> etc.
> >>>>>> First of all, the proposal to move the PCI BAR up to the 64-bit range is a
> >>>>>> temporary work-around.  It should only be done if a device doesn't fit in the
> >>>>>> current MMIO range.
> >>>>>>
> >>>>>> We have four options here:
> >>>>>> 1. Don't do anything
> >>>>>> 2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they don't
> >>>>>> fit
> >>>>>> 3. Convince qemu to allow MMIO regions to mask memory (or what it thinks is
> >>>>>> memory).
> >>>>>> 4. Add a mechanism to tell qemu that memory is being relocated.
> >>>>>>
> >>>>>> Number 4 is definitely the right answer long-term, but we just don't have time
> >>>>>> to do that before the 4.3 release.  We're not sure yet if #3 is possible; even
> >>>>>> if it is, it may have unpredictable knock-on effects.
> >>>>>>
> >>>>>> Doing #2, it is true that many guests will be unable to access the device
> >>>>>> because of 32-bit limitations.  However, in #1, *no* guests will be able to
> >>>>>> access the device.  At least in #2, *many* guests will be able to do so.  In
> >>>>>> any case, apparently #2 is what KVM does, so having the limitation on guests
> >>>>>> is not without precedent.  It's also likely to be a somewhat tested
> >>>>>> configuration (unlike #3, for example).
> >>>>> I would avoid #3, because I don't think it is a good idea to rely on that
> >>>>> behaviour.
> >>>>> I would also avoid #4, because having seen QEMU's code, it wouldn't be
> >>>>> easy and certainly not doable in time for 4.3.
> >>>>>
> >>>>> So we are left to play with the PCI MMIO region size and location in
> >>>>> hvmloader.
> >>>>>
> >>>>> I agree with Jan that we shouldn't relocate unconditionally all the
> >>>>> devices to the region above 4G. I meant to say that we should relocate
> >>>>> only the ones that don't fit. And we shouldn't try to dynamically
> >>>>> increase the PCI hole below 4G because clearly that doesn't work.
> >>>>> However, we could still increase the size of the PCI hole below 4G by
> >>>>> default, from starting at 0xf0000000 to starting at 0xe0000000.
> >>>>> Why do we know that is safe? Because in the current configuration
> >>>>> hvmloader *already* increases the PCI hole size by decreasing the start
> >>>>> address every time a device doesn't fit.
> >>>>> So it's already common for hvmloader to set pci_mem_start to
> >>>>> 0xe0000000, you just need to assign a device with a PCI hole size big
> >>>>> enough.
> >>> Isn't this the exact case which is broken? And therefore not known safe
> >>> at all?
> >>>
> >>>>> My proposed solution is:
> >>>>>
> >>>>> - set 0xe0000000 as the default PCI hole start for everybody, including
> >>>>> qemu-xen-traditional
> >>> What is the impact on existing qemu-trad guests?
> >>>
> >>> It does mean that guests which were installed with a bit less than 4GB
> >>> RAM may now find a little bit of RAM moves above 4GB to make room for
> >>> the bigger hole. If they can dynamically enable PAE that might be ok.
> >>>
> >>> Does this have any impact on Windows activation?
> >>>
> >>>>> - move above 4G everything that doesn't fit and supports 64-bit bars
> >>>>> - print an error if the device doesn't fit and doesn't support 64-bit
> >>>>> bars
> >>>> Also, as I understand it, at the moment:
> >>>> 1. Some operating systems (32-bit XP) won't be able to use relocated devices
> >>>> 2. Some devices (without 64-bit BARs) can't be relocated
> >>>> 3. qemu-traditional is fine with a resized <4GiB MMIO hole.
> >>>>
> >>>> So if we have #1 or #2, at the moment an option for a work-around is to
> >>>> use qemu-traditional.
> >>>>
> >>>> However, if we add your "print an error if the device doesn't fit", then
> >>>> this option will go away -- this will be a regression in functionality
> >>>> from 4.2.
> >>> Only if printing an error also involves aborting. It could print an error
> >>> (let's call it a warning) and continue, which would leave the workaround
> >>> viable.
> >> No, because if hvmloader doesn't increase the size of the MMIO hole,
> >> then the device won't actually work.  The guest will boot, but the OS
> >> will not be able to use it.
> > I meant continue as in increasing the hole too, although rereading the
> > thread maybe that's not what everyone else was talking about ;-)
> 
> Well if you continue increasing the hole, then it works on 
> qemu-traditional but on qemu-xen you have weird crashes and guest hangs 
> at some point in the future when qemu tries to map a non-existent guest 
> memory address -- that's much worse than the device just not being 
> visible to the OS.

I thought the point of the print was simply to give us something to spot
in the logs in this latter case.

> That's the point -- current behavior on qemu-xen causes weird hangs; but 
> the simple way of preventing those hangs (just not increasing the MMIO 
> hole size) removes functionality from both qemu-xen and 
> qemu-traditional, even though qemu-traditional doesn't have any problems 
> with the resized MMIO hole.
> 
> So there's no simple way to avoid random crashes while keeping the 
> work-around functional; that's why someone suggested adding a xenstore 
> key to tell hvmloader what to do.
> 
> At least, that's what I understood the situation to be -- someone 
> correct me if I'm wrong. :-)
> 
>   -George
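
For readers following the thread, Jan's suggestion (move the biggest MMIO
region above 4G first, then the next biggest, until the rest fits below 4G)
could be sketched in C roughly as follows. This is only an illustration under
stated assumptions: struct bar, place_bars() and the hole size passed in are
hypothetical names invented for this note, not code from
tools/firmware/hvmloader/pci.c, and BAR alignment is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory BAR; not hvmloader's own type. */
struct bar {
    uint64_t size;      /* size of the region in bytes */
    int is_64bit;       /* non-zero if the BAR can be programmed above 4G */
    int placed_high;    /* output: non-zero if relocated above 4G */
};

/*
 * Relocate the largest 64-bit capable BARs above 4G, one at a time, until
 * the BARs left below 4G fit into a hole of hole_size bytes.
 * Returns 0 on success, -1 if the remaining BARs still do not fit and no
 * further 64-bit capable BAR is available to move.
 * Simplification (assumption): alignment requirements are not modelled.
 */
static int place_bars(struct bar *bars, size_t n, uint64_t hole_size)
{
    uint64_t low_total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        bars[i].placed_high = 0;
        low_total += bars[i].size;
    }

    while (low_total > hole_size) {
        struct bar *biggest = NULL;

        /* Find the biggest BAR that is still low and may be moved. */
        for (i = 0; i < n; i++) {
            if (bars[i].is_64bit && !bars[i].placed_high &&
                (biggest == NULL || bars[i].size > biggest->size))
                biggest = &bars[i];
        }

        if (biggest == NULL)
            return -1;  /* only 32-bit BARs left: the "print an error" case */

        biggest->placed_high = 1;
        low_total -= biggest->size;
    }

    return 0;
}
```

With BARs of 256 MiB (64-bit), 128 MiB (32-bit) and 512 MiB (64-bit) and a
448 MiB low hole, only the 512 MiB BAR is moved up. A lone 32-bit BAR larger
than the hole makes the function return -1, which corresponds to the "doesn't
fit and doesn't support 64-bit bars" case from Stefano's proposal.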

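Stefano describes hvmloader as growing the PCI hole "by decreasing the start
address every time a device doesn't fit". A minimal sketch of that kind of
loop is given here. The left-shift step and the HOLE_END value of 0xfc000000
are assumptions made for illustration and are not quoted from pci.c;
grow_hole() is a hypothetical name.

```c
#include <stdint.h>

#define HOLE_END 0xfc000000u    /* assumed end of the low MMIO hole */

/*
 * Lower the start of the MMIO hole until mmio_total bytes fit between
 * hole_start and HOLE_END, or until the start cannot be lowered further.
 * Shifting a 32-bit start address left by one drops the top bit:
 * 0xf0000000 -> 0xe0000000 -> 0xc0000000 -> 0x80000000.
 */
static uint32_t grow_hole(uint32_t hole_start, uint64_t mmio_total)
{
    while (mmio_total > (uint64_t)(HOLE_END - hole_start) &&
           (uint32_t)(hole_start << 1) != 0)
        hole_start <<= 1;

    return hole_start;
}
```

Under these assumptions, a device needing 256 MiB of MMIO does not fit in the
192 MiB between 0xf0000000 and 0xfc000000, so the start moves to 0xe0000000,
which is the value Stefano proposes as the new default.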

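Ian's point that a guest with a bit less than 4GB of RAM may find some RAM
moved above 4GB once the hole is bigger can be checked with simple
arithmetic. The helpers here are a hypothetical sketch, assuming RAM is laid
out contiguously from address 0 up to the hole start and that the remainder
is placed above 4G; other low-memory holes are ignored.

```c
#include <stdint.h>

/* Bytes of guest RAM that stay below the start of the MMIO hole. */
static uint64_t ram_below_hole(uint64_t ram_bytes, uint64_t hole_start)
{
    return ram_bytes < hole_start ? ram_bytes : hole_start;
}

/* Bytes of guest RAM that have to be placed above 4G instead. */
static uint64_t ram_above_4g(uint64_t ram_bytes, uint64_t hole_start)
{
    return ram_bytes - ram_below_hole(ram_bytes, hole_start);
}
```

For example, 0xe0000000 is 3584 MiB and 0xf0000000 is 3840 MiB, so under these
assumptions a 3600 MiB guest keeps all of its RAM below a hole starting at
0xf0000000, but has 16 MiB moved above 4G when the hole starts at 0xe0000000.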

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

