Re: [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M
On Thu, 2013-06-13 at 16:30 +0100, George Dunlap wrote:
> On 13/06/13 16:16, Ian Campbell wrote:
> > On Thu, 2013-06-13 at 14:54 +0100, George Dunlap wrote:
> >> On 13/06/13 14:44, Stefano Stabellini wrote:
> >>> On Wed, 12 Jun 2013, George Dunlap wrote:
> >>>> On 12/06/13 08:25, Jan Beulich wrote:
> >>>>>>>> On 11.06.13 at 19:26, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >>>>>> I went through the code that maps the PCI MMIO regions in hvmloader
> >>>>>> (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
> >>>>>> maps the PCI region to high memory if the PCI BAR is 64-bit and the
> >>>>>> MMIO region is larger than 512MB.
> >>>>>>
> >>>>>> Maybe we could just relax this condition and map the device memory to
> >>>>>> high memory no matter the size of the MMIO region, as long as the PCI
> >>>>>> BAR is 64-bit?
> >>>>>
> >>>>> I can only recommend not to: for one, guests not using PAE or PSE-36
> >>>>> can't map such space at all (and older OSes may not properly deal with
> >>>>> 64-bit BARs at all). And then one would generally expect this allocation
> >>>>> to be done top down (to minimize the risk of running into RAM), and
> >>>>> doing so is going to present further risks of incompatibilities with
> >>>>> guest OSes (Linux, for example, learned only in 2.6.36 that PFNs in
> >>>>> ioremap() can exceed 32 bits, but even in 3.10-rc5 ioremap_pte_range(),
> >>>>> while using "u64 pfn", passes the PFN to pfn_pte(), the respective
> >>>>> parameter of which is "unsigned long").
> >>>>>
> >>>>> I think this ought to be done in an iterative process - if all MMIO
> >>>>> regions together don't fit below 4G, the biggest one should be moved up
> >>>>> beyond 4G first, followed by the next biggest one, etc.
> >>>>
> >>>> First of all, the proposal to move the PCI BAR up to the 64-bit range is
> >>>> a temporary work-around. It should only be done if a device doesn't fit
> >>>> in the current MMIO range.
> >>>>
> >>>> We have four options here:
> >>>> 1. Don't do anything.
> >>>> 2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they don't fit.
> >>>> 3. Convince qemu to allow MMIO regions to mask memory (or what it thinks is memory).
> >>>> 4. Add a mechanism to tell qemu that memory is being relocated.
> >>>>
> >>>> Number 4 is definitely the right answer long-term, but we just don't
> >>>> have time to do that before the 4.3 release. We're not sure yet if #3 is
> >>>> possible; even if it is, it may have unpredictable knock-on effects.
> >>>>
> >>>> Doing #2, it is true that many guests will be unable to access the
> >>>> device because of 32-bit limitations. However, in #1, *no* guests will
> >>>> be able to access the device. At least in #2, *many* guests will be able
> >>>> to do so. In any case, apparently #2 is what KVM does, so having the
> >>>> limitation on guests is not without precedent. It's also likely to be a
> >>>> somewhat tested configuration (unlike #3, for example).
> >>>
> >>> I would avoid #3, because I don't think it is a good idea to rely on that
> >>> behaviour. I would also avoid #4, because having seen QEMU's code, it
> >>> wouldn't be easy and certainly not doable in time for 4.3.
> >>>
> >>> So we are left to play with the PCI MMIO region size and location in
> >>> hvmloader.
> >>>
> >>> I agree with Jan that we shouldn't relocate all the devices to the
> >>> region above 4G unconditionally.
> >>> I meant to say that we should relocate only the ones that don't fit. And
> >>> we shouldn't try to dynamically increase the PCI hole below 4G, because
> >>> clearly that doesn't work. However, we could still increase the size of
> >>> the PCI hole below 4G by default, from starting at 0xf0000000 to starting
> >>> at 0xe0000000.
> >>> Why do we know that is safe? Because in the current configuration
> >>> hvmloader *already* increases the PCI hole size by decreasing the start
> >>> address every time a device doesn't fit. So it's already common for
> >>> hvmloader to set pci_mem_start to 0xe0000000; you just need to assign a
> >>> device that needs a PCI hole that big.
> >
> > Isn't this the exact case which is broken? And therefore not known safe
> > at all?
> >
> >>> My proposed solution is:
> >>>
> >>> - set 0xe0000000 as the default PCI hole start for everybody, including
> >>> qemu-xen-traditional
> >
> > What is the impact on existing qemu-trad guests?
> >
> > It does mean that guests which were installed with a bit less than 4GB
> > RAM may now find a little bit of RAM moves above 4GB to make room for the
> > bigger hole. If they can dynamically enable PAE that might be OK.
> >
> > Does this have any impact on Windows activation?
> >
> >>> - move above 4G everything that doesn't fit and supports 64-bit BARs
> >>> - print an error if the device doesn't fit and doesn't support 64-bit BARs
> >>
> >> Also, as I understand it, at the moment:
> >> 1. Some operating systems (32-bit XP) won't be able to use relocated devices.
> >> 2. Some devices (without 64-bit BARs) can't be relocated.
> >> 3. qemu-traditional is fine with a resized <4GiB MMIO hole.
> >>
> >> So if we have #1 or #2, at the moment an option for a work-around is to
> >> use qemu-traditional.
> >>
> >> However, if we add your "print an error if the device doesn't fit", then
> >> this option will go away -- this will be a regression in functionality
> >> from 4.2.
> >
> > Only if "print an error" also involves aborting. It could print an error
> > (let's call it a warning) and continue, which would leave the workaround
> > viable.
>
> No, because if hvmloader doesn't increase the size of the MMIO hole, then
> the device won't actually work. The guest will boot, but the OS will not be
> able to use it.

I meant continue as in increasing the hole too, although rereading the thread
maybe that's not what everyone else was talking about ;-)

> -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
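
For anyone following the thread, below is a minimal sketch of the relocation
policy being discussed: keep the below-4G MMIO hole at a fixed start, relocate
the largest 64-bit-capable BARs above 4G only when the hole would overflow,
and warn when a device neither fits nor can be relocated. This is not
hvmloader's actual pci_setup() code; PCI_MEM_START, PCI_MEM_END,
HIGH_MMIO_BASE, struct bar and place_bars() are made-up names, and details
such as BAR alignment, option ROMs and top-down allocation are left out.

/*
 * Minimal sketch (not hvmloader code) of the policy discussed above: keep the
 * below-4G MMIO hole at a fixed start and, if the BARs do not all fit,
 * relocate the largest 64-bit-capable BARs above 4G one at a time until the
 * remainder fits. BAR alignment and other details are intentionally omitted.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PCI_MEM_START   0xe0000000ULL   /* proposed default hole start */
#define PCI_MEM_END     0xfc000000ULL   /* assumed end of the below-4G hole */
#define HIGH_MMIO_BASE  (1ULL << 32)    /* relocated BARs are placed from here */

struct bar {
    uint64_t size;      /* BAR size in bytes */
    int is_64bit;       /* device advertises a 64-bit memory BAR */
    uint64_t addr;      /* assigned address, filled in by place_bars() */
    int placed;         /* set once an address has been assigned */
};

/* Sort descending by size so the biggest BAR is considered first. */
static int bar_cmp_desc(const void *a, const void *b)
{
    const struct bar *x = a, *y = b;
    return (y->size > x->size) - (y->size < x->size);
}

/* Returns 0 on success, -1 if the BARs cannot all be placed. */
static int place_bars(struct bar *bars, int n)
{
    uint64_t low_limit = PCI_MEM_END - PCI_MEM_START;
    uint64_t low_total = 0;
    uint64_t low_next = PCI_MEM_START, high_next = HIGH_MMIO_BASE;
    int i;

    qsort(bars, n, sizeof(*bars), bar_cmp_desc);

    for (i = 0; i < n; i++)
        low_total += bars[i].size;

    /* Relocate the largest remaining 64-bit BAR until the rest fits below 4G. */
    for (i = 0; i < n && low_total > low_limit; i++) {
        if (!bars[i].is_64bit)
            continue;
        bars[i].addr = high_next;
        bars[i].placed = 1;
        high_next += bars[i].size;
        low_total -= bars[i].size;
    }

    if (low_total > low_limit) {
        /* A 32-bit-only BAR still does not fit: warn, as proposed in the thread. */
        fprintf(stderr, "warning: MMIO hole too small and BAR(s) not relocatable\n");
        return -1;
    }

    /* Lay out whatever is left inside the below-4G hole. */
    for (i = 0; i < n; i++) {
        if (bars[i].placed)
            continue;
        bars[i].addr = low_next;
        bars[i].placed = 1;
        low_next += bars[i].size;
    }
    return 0;
}

int main(void)
{
    /* Hypothetical example: 1GB and 256MB 64-bit BARs plus a 16MB 32-bit BAR. */
    struct bar bars[] = {
        {   1ULL << 30, 1, 0, 0 },
        { 256ULL << 20, 1, 0, 0 },
        {  16ULL << 20, 0, 0, 0 },
    };
    int i, n = (int)(sizeof(bars) / sizeof(bars[0]));

    if (place_bars(bars, n))
        return 1;
    for (i = 0; i < n; i++)
        printf("BAR of size %#llx placed at %#llx\n",
               (unsigned long long)bars[i].size,
               (unsigned long long)bars[i].addr);
    return 0;
}

With the example devices in main(), the 1GB BAR ends up above 4G while the
256MB and 16MB BARs stay inside the enlarged hole starting at 0xe0000000.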