
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges



On Wed, Jul 15, 2015 at 3:00 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>> On 15.07.15 at 15:40, <dunlapg@xxxxxxxxx> wrote:
>> On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>> Therefore I'll not make any further comments on the rest of the
>>> patch, but instead outline an allocation model that I think would
>>> fit our needs: Subject to the constraints mentioned above, set up
>>> a bitmap (maximum size 64k [2GB = 2^19 pages needing 2^19
>>> bits], i.e. a reasonably small memory block). Each bit represents a
>>> page usable for MMIO: First of all you remove the range from
>>> PCI_MEM_END upwards. Then remove all RDM pages. Now do a
>>> first pass over all devices, allocating (in the bitmap) space for only
>>> the 32-bit MMIO BARs, starting with the biggest one(s), by finding
>>> a best fit (i.e. preferably a range not usable by any bigger BAR)
>>> from top down. For example, if you have available
>>>
>>> [f0000000,f8000000)
>>> [f9000000,f9001000)
>>> [fa000000,fa003000)
>>> [fa010000,fa012000)
>>>
>>> and you're looking for a single page slot, you should end up
>>> picking fa002000.
>>>
>>> After this pass you should be able to do RAM relocation in a
>>> single attempt just like we do today (you may still grow the MMIO
>>> window if you know you need to and can fit some of the 64-bit
>>> BARs in there, subject to said constraints; this is in an attempt
>>> to help OSes not comfortable with 64-bit resources).
>>>
>>> In a 2nd pass you'd then assign 64-bit resources: If you can fit
>>> them below 4G (you still have the bitmap left of what you've got
>>> available), put them there. Allocation strategy could be the same
>>> as above (biggest first), perhaps allowing for some factoring out
>>> of logic, but here smallest first probably could work equally well.
>>> The main thought to decide between the two is whether it is
>>> better to fit as many (small) BARs as possible, or as big a set (in
>>> total) as possible, under 4G. I'd generally expect the former (as
>>> many as possible,
>>> leaving only a few huge ones to go above 4G) to be the better
>>> approach, but that's more a gut feeling than based on hard data.
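
To make that concrete, here is a rough sketch of how I read the
first-pass, top-down best-fit search outlined above.  This is purely
illustrative, not hvmloader code: the names (mmio_free, page_is_free,
best_fit), the layout, and the assumption that bit n of the bitmap
stands for the n-th page of an MMIO hole starting at a 2GB-aligned
address (so that bit-index alignment and physical alignment coincide)
are all made up for the example.

#include <stdint.h>

#define NR_PAGES (1u << 19)                /* the hole is at most 2GB */

static uint32_t mmio_free[NR_PAGES / 32];  /* bit n set => page n usable */

static int page_is_free(uint32_t pg)
{
    return (mmio_free[pg / 32] >> (pg % 32)) & 1;
}

/*
 * Size (in pages) of the largest naturally aligned, entirely free
 * power-of-two block containing the slot starting at 'base', i.e. a
 * measure of how useful this slot would still be to a bigger BAR.
 */
static uint32_t containing_free_block(uint32_t base, uint32_t sz)
{
    uint32_t blk = sz;

    for ( ; ; )
    {
        uint32_t next = blk << 1, start = base & ~(next - 1), i;

        if ( next > NR_PAGES )
            break;
        for ( i = 0; i < next; i++ )
            if ( !page_is_free(start + i) )
                return blk;
        blk = next;
    }

    return blk;
}

/*
 * Pick a naturally aligned slot of sz pages (sz a power of two),
 * scanning from the top down and preferring slots no bigger BAR could
 * use.  Returns the starting page, or 0 if nothing fits.  With the
 * four example ranges above and sz == 1 this ends up picking the page
 * at fa002000: both f9000000 and fa002000 are useless to any bigger
 * BAR, and fa002000 is the higher of the two.
 */
static uint32_t best_fit(uint32_t sz)
{
    uint32_t best = 0, best_blk = ~0u, base;

    for ( base = NR_PAGES - sz; base; base -= sz )
    {
        uint32_t i, blk;

        for ( i = 0; i < sz; i++ )
            if ( !page_is_free(base + i) )
                break;
        if ( i < sz )
            continue;                      /* slot not entirely free */

        blk = containing_free_block(base, sz);
        if ( blk < best_blk )
        {
            best = base;
            best_blk = blk;
            if ( blk == sz )               /* can't beat an exact fit */
                break;
        }
    }

    return best;
}
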
>>
>> I agree that it would be more sensible for hvmloader to make a "plan"
>> first, and then do the memory reallocation (if it's possible) at one
>> time, then go through and actually update the device BARs according to
>> the "plan".
>>
>> However, I don't really see how having a bitmap really helps in this
>> case.  I would think having a list of free ranges (perhaps aligned by
>> powers of two?), sorted small->large, makes the most sense.
>
> I view bitmap vs list as just two different representations, and I
> picked the bitmap approach as being more compact storage wise
> in case there are many regions to deal with. I'd be fine with a list
> approach too, provided lookup times don't become prohibitive.

Sure, you can obviously translate one into the other.  The main reason
I dislike the idea of a bitmap is having to write code to determine
where the next free region is, and how big that region is, rather than
just taking the next entry on the list and reading range.start and
range.len.
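
Just to illustrate what I mean, something along these lines (names
made up, list maintenance after an allocation omitted; not meant as
actual hvmloader code):

#include <stdint.h>

struct mmio_range {
    uint64_t start;              /* first byte of the free range */
    uint64_t len;                /* length of the range in bytes */
    struct mmio_range *next;     /* next entry, sorted smallest first */
};

/*
 * Find space for a BAR of 'size' bytes (a power of two, which is also
 * its required alignment).  Because the list is sorted small->large,
 * the first range with room for an aligned block is also the smallest
 * such range, i.e. the best fit.  Returns the chosen address, or 0 if
 * nothing fits.
 */
static uint64_t find_mmio_space(const struct mmio_range *head, uint64_t size)
{
    const struct mmio_range *r;

    for ( r = head; r; r = r->next )
    {
        uint64_t start = (r->start + size - 1) & ~(size - 1);

        if ( start + size <= r->start + r->len )
            return start;
    }

    return 0;
}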

Also, in your suggestion each bit is a page (4k); so assuming a 64-bit
pointer, a 64-bit starting point, and a 64-bit length (just to make
things simple), a single "range" takes as much storage as the bitmap
covering (64+64+64)*4k = 768k of address space.  So if we make the
bitmap big enough for 2GiB, then the break-even point for storage is
2,730 ranges.  It's even higher if we have an array instead of a
linked list.
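
Spelling the arithmetic out, at one bitmap bit per 4k page:

  one list entry:    64 + 64 + 64 bits = 192 bits
  bitmap for 2GiB:   2GiB / 4k = 2^19 bits = 64k of storage
  break-even:        2^19 bits / 192 bits per entry ~= 2730 entries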

I'm pretty sure that having such a large number of ranges will be
vanishingly rare; I'd expect the number of ranges to be far smaller,
so the "range" representation will not only be easier to code and
read, but will in the common case (I believe) be far more compact.

 -George
