
Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen



>>> On 22.04.16 at 04:36, <haozhong.zhang@xxxxxxxxx> wrote:
> On 04/21/16 01:04, Jan Beulich wrote:
>> >>> On 21.04.16 at 07:09, <haozhong.zhang@xxxxxxxxx> wrote:
>> > On 04/12/16 16:45, Haozhong Zhang wrote:
>> >> On 04/08/16 09:52, Jan Beulich wrote:
>> >> > >>> On 08.04.16 at 07:02, <haozhong.zhang@xxxxxxxxx> wrote:
>> >> > > On 03/29/16 04:49, Jan Beulich wrote:
>> >> > >> >>> On 29.03.16 at 12:10, <haozhong.zhang@xxxxxxxxx> wrote:
>> >> > >> > On 03/29/16 03:11, Jan Beulich wrote:
>> >> > >> >> >>> On 29.03.16 at 10:47, <haozhong.zhang@xxxxxxxxx> wrote:
>> >> > > [..]
>> >> > >> >> > I still cannot find a neat approach to managing guest
>> >> > >> >> > permissions for nvdimm pages. A possible one is to use a
>> >> > >> >> > per-domain bitmap to track permissions, with each bit
>> >> > >> >> > corresponding to one nvdimm page. The bitmap can save a lot
>> >> > >> >> > of space and can even be stored in normal RAM, but operating
>> >> > >> >> > on it for a large nvdimm range, especially a contiguous one,
>> >> > >> >> > is slower than a rangeset.
>> >> > >> >> 
>> >> > >> >> I don't follow: What would a single bit in that bitmap mean? Any
>> >> > >> >> guest may access the page? That surely wouldn't be what we
>> >> > >> >> need.
>> >> > >> >>
>> >> > >> > 
>> >> > >> > For a host with N pages of nvdimm, each domain would have an
>> >> > >> > N-bit bitmap. If the m'th bit of a domain's bitmap is set, then
>> >> > >> > that domain has permission to access the m'th host nvdimm page.
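>> >> > >> > 
>> >> > >> > As a rough illustration only (the struct and function names
>> >> > >> > below are made up for this sketch, not actual Xen code; only
>> >> > >> > the __set_bit()/test_bit() helpers are assumed to exist), such
>> >> > >> > a bitmap might look like:
>> >> > >> > 
>> >> > >> >   /* Illustrative sketch, not actual Xen code. */
>> >> > >> >   struct nvdimm_perm_bitmap {
>> >> > >> >       unsigned long nr_pages; /* N: number of host nvdimm pages */
>> >> > >> >       unsigned long *bits;    /* one bit per host nvdimm page */
>> >> > >> >   };
>> >> > >> > 
>> >> > >> >   /* Grant the owning domain access to the m'th host nvdimm page. */
>> >> > >> >   static void nvdimm_perm_grant(struct nvdimm_perm_bitmap *b,
>> >> > >> >                                 unsigned long m)
>> >> > >> >   {
>> >> > >> >       __set_bit(m, b->bits);
>> >> > >> >   }
>> >> > >> > 
>> >> > >> >   /* Does the owning domain have access to the m'th page? */
>> >> > >> >   static bool nvdimm_perm_test(const struct nvdimm_perm_bitmap *b,
>> >> > >> >                                unsigned long m)
>> >> > >> >   {
>> >> > >> >       return m < b->nr_pages && test_bit(m, b->bits);
>> >> > >> >   }
>> >> > >> > 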
>> >> > >> 
>> >> > >> Which will be more overhead as soon as there are enough such
>> >> > >> domains in a system.
>> >> > >>
>> >> > > 
>> >> > > Sorry for the late reply.
>> >> > > 
>> >> > > I think we can make some optimizations to reduce the space
>> >> > > consumed by the bitmap.
>> >> > > 
>> >> > > A per-domain bitmap covering the entire host NVDIMM address range
>> >> > > is wasteful, especially if the actually used ranges are clustered
>> >> > > together. We could take the following approaches to reduce its
>> >> > > space:
>> >> > > 
>> >> > > 1) Split the per-domain bitmap into multiple sub-bitmaps, each
>> >> > >    covering a smaller, contiguous sub-range of the host NVDIMM
>> >> > >    address space. Initially, no sub-bitmap is allocated for the
>> >> > >    domain. When a domain is granted access to a host NVDIMM page
>> >> > >    in a sub-range, only the sub-bitmap for that sub-range is
>> >> > >    allocated. When access to all host NVDIMM pages in a sub-range
>> >> > >    has been removed from a domain, the corresponding sub-bitmap
>> >> > >    can be freed.
>> >> > > 
>> >> > > 2) If a domain has access to all host NVDIMM pages in a sub-range,
>> >> > >    the corresponding sub-bitmap is replaced by a range struct. If
>> >> > >    range structs track adjacent ranges, they are merged into one.
>> >> > >    If access to some pages in that sub-range is later removed from
>> >> > >    the domain, the range struct is converted back to bitmap
>> >> > >    segment(s).
>> >> > > 
>> >> > > 3) Because there may be many such bitmap segments and range
>> >> > >    structs per domain, we can organize them in a balanced interval
>> >> > >    tree to quickly search/add/remove an individual structure (a
>> >> > >    rough sketch of such a node follows below).
>> >> > > 
>> >> > > In the worst case, where every sub-range has non-contiguous pages
>> >> > > assigned to a domain, the above solution uses all sub-bitmaps and
>> >> > > consumes more space than a single flat bitmap because of the extra
>> >> > > organizational structures. I assume the sysadmin is responsible
>> >> > > for keeping the host nvdimm ranges assigned to each domain as
>> >> > > contiguous and clustered as possible in order to avoid the worst
>> >> > > case. However, if the worst case does happen, the Xen hypervisor
>> >> > > should refuse to assign nvdimm to the guest when it runs out of
>> >> > > memory.
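>> >> > > 
>> >> > > A rough sketch of the per-sub-range node described in 1)-3) above
>> >> > > (all names here are illustrative only, not actual Xen code):
>> >> > > 
>> >> > >   /*
>> >> > >    * Illustrative sketch, not actual Xen code. One node per sub
>> >> > >    * host NVDIMM address range; nodes would be kept in a balanced
>> >> > >    * interval tree keyed by [first_pfn, first_pfn + nr_pages).
>> >> > >    */
>> >> > >   struct nvdimm_perm_node {
>> >> > >       unsigned long first_pfn; /* start of the sub-range */
>> >> > >       unsigned long nr_pages;  /* length of the sub-range */
>> >> > >       bool full;               /* whole sub-range accessible: no
>> >> > >                                 * bitmap needed (case 2) */
>> >> > >       unsigned long *bitmap;   /* otherwise one bit per page
>> >> > >                                 * (case 1); NULL when full */
>> >> > >       /* tree linkage (e.g. an rb_node) omitted for brevity */
>> >> > >   };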
>> >> > 
>> >> > To be honest, this all sounds pretty unconvincing wrt not using
>> >> > existing code paths - a lot of special treatment, and hence a lot
>> >> > of things that can go (slightly) wrong.
>> >> > 
>> >> 
>> >> Well, using the existing range structs to manage guest access
>> >> permissions to nvdimm could consume too much space, which might not
>> >> fit in either memory or nvdimm. If the above solution looks too
>> >> error-prone, perhaps we can still fall back to the existing one and
>> >> restrict the number of range structs each domain may have for nvdimm
>> >> (e.g. reserve one 4K page per domain for them) to make it work,
>> >> though it may reject nvdimm mappings that are terribly fragmented.
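>> >> 
>> >> Back-of-envelope for the one-4K-page idea (assuming a range entry of
>> >> roughly sizeof(struct list_head) + 2 * sizeof(unsigned long) = 32
>> >> bytes on x86-64; the exact size depends on the actual struct):
>> >> 
>> >>     4096 bytes / 32 bytes per range entry = 128 entries
>> >> 
>> >> i.e. one reserved 4K page per domain would cap a domain at roughly
>> >> 128 discontiguous nvdimm ranges before further mappings get refused.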
>> > 
>> > Hi Jan,
>> > 
>> > Any comments on this?
>> 
>> Well, nothing new, i.e. my previous opinion on the old proposal didn't
>> change. I'm really opposed to any artificial limitations here, as I am to
>> any secondary (and hence error-prone) code paths. IOW I continue
>> to think that there's no reasonable alternative to re-using the existing
>> memory management infrastructure for at least the PMEM case.
> 
> By re-using the existing memory management infrastructure, do you mean
> re-using the existing MMIO model for passthrough PCI devices to handle
> the permissions of pmem?

No, re-using struct page_info.

>> The only remaining open question is where to place the control
>> structures, and I think the thresholding proposal of yours was quite
>> sensible.
> 
> I'm a little confused here. Is 'restrict the number of range structs'
> in my previous reply the 'thresholding proposal' you mean? Or is it one
> of the 'artificial limitations'?

Neither. It's the decision on where to place the struct page_info
arrays needed to manage the PMEM ranges.
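
For rough scale (assuming sizeof(struct page_info) is 32 bytes on
x86-64; the exact figure may differ): one page_info per 4K PMEM page
means an overhead of 32 / 4096, i.e. about 0.8% of the PMEM size, or
roughly 8GB of control structures for 1TB of PMEM, which is why it
matters whether those arrays live in RAM or on the PMEM itself.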

Jan
