
Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen



On 03/17/16 05:04, Jan Beulich wrote:
> >>> On 17.03.16 at 09:58, <haozhong.zhang@xxxxxxxxx> wrote:
> > On 03/16/16 09:23, Jan Beulich wrote:
> >> >>> On 16.03.16 at 15:55, <haozhong.zhang@xxxxxxxxx> wrote:
> >> > On 03/16/16 08:23, Jan Beulich wrote:
> >> >> >>> On 16.03.16 at 14:55, <haozhong.zhang@xxxxxxxxx> wrote:
> >> >> > On 03/16/16 07:16, Jan Beulich wrote:
> >> >> >> And
> >> >> >> talking of fragmentation - how do you mean to track guest
> >> >> >> permissions for an unbounded number of address ranges?
> >> >> >>
> >> >> > 
> >> >> > In this case range structs in iomem_caps for NVDIMMs may consume a lot
> >> >> > of memory, so I think they are another candidate that should be put in
> >> >> > the reserved area on NVDIMM. If we only allow access permissions
> >> >> > to NVDIMM to be granted page by page (rather than byte by byte), the
> >> >> > number of range structs for each NVDIMM in the worst case is still decidable.
> >> >> 
> >> >> Of course the permission granularity is going to be pages, not
> >> >> bytes (or else we couldn't allow the pages to be mapped into
> >> >> guest address space). And the limit on the per-domain range
> >> >> sets isn't going to be allowed to be bumped significantly, at
> >> >> least not for any of the existing ones (or else you'd have to
> >> >> prove such bumping can't be abused).
> >> > 
> >> > What is that limit? The total number of range structs in per-domain
> >> > range sets? I must have missed something when looking through 'case
> >> > XEN_DOMCTL_iomem_permission' of do_domctl(), as I didn't find that
> >> > limit, unless it means alloc_range() will fail when there are lots of
> >> > range structs.
> >> 
> >> Oh, I'm sorry, that was a different set of range sets I was
> >> thinking about. But note that excessive creation of ranges
> >> through XEN_DOMCTL_iomem_permission is not a security issue
> >> just because of XSA-77, i.e. we'd still not knowingly allow a
> >> severe increase here.
> >>
> > 
> > I didn't notice that multiple domains can all have access permission
> > to an iomem range, i.e. there can be multiple range structs for a
> > single iomem range. If range structs for NVDIMM are put on NVDIMM,
> > then there would still be a huge number of them on NVDIMM in the worst
> > case (maximum number of domains * number of NVDIMM pages).
> > 
> > A workaround is to only allow a range of NVDIMM pages to be accessed by
> > a single domain. Whenever we add the access permission for NVDIMM pages
> > to a domain, we also remove the permission from its current
> > grantee. In this way, we only need to put 'number of NVDIMM pages'
> > range structs on NVDIMM in the worst case.
> 
> But will this work? There's a reason multiple domains are permitted
> access: The domain running qemu for the guest, for example,
> needs to be able to access guest memory.
>

QEMU now only maintains the ACPI tables and emulates _DSM for vNVDIMM,
neither of which needs to access the NVDIMM pages mapped to the guest.

> No matter how much you and others are opposed to this, I can't
> help myself thinking that PMEM regions should be treated like RAM
> (and hence be under full control of Xen), whereas PBLK regions
> could indeed be treated like MMIO (and hence partly be under the
> control of Dom0).
>

Hmm, giving Xen full control could at least make reserving space
on NVDIMM easier. I guess full control does not include manipulating
file systems on NVDIMM, which can still be left to dom0?

Then there is another problem (which also exists in the current
design): does Xen need to emulate NVDIMM _DSM for dom0? Take the _DSM
functions that access the label storage area (for namespaces) as an example:

The way Linux reserves space on a pmem-mode NVDIMM is to leave the
reserved space at the beginning of the pmem-mode NVDIMM and create a pmem
namespace which starts from the end of the reserved space. Because the
reservation information is written in the namespace in the NVDIMM
label storage area, every OS that follows the namespace spec would not
mistakenly write files in the reserved area. I would prefer the same
approach if Xen is going to do the reservation. We definitely don't want
dom0 to break the label storage area, so Xen seemingly needs to emulate
the corresponding _DSM functions for dom0? If so, which part, the
hypervisor or the toolstack, should do the emulation?
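
To make the layout above concrete, here is a rough sketch of the
arithmetic only. It assumes a 4 KiB page size and a single contiguous
pmem region; struct pmem_layout and all function names are made up for
illustration and are not existing Xen or Linux interfaces. It does not
touch the label storage area or any _DSM function.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMEM_PAGE_SIZE 4096ULL /* assumed page size */

/* Hypothetical description of one pmem-mode region (illustration only). */
struct pmem_layout {
    uint64_t region_base;   /* start address of the pmem region */
    uint64_t region_size;   /* total size of the region in bytes */
    uint64_t reserved_size; /* bytes reserved at the beginning of the region */
};

/* Round a byte count up to the next page boundary. */
static uint64_t page_align_up(uint64_t bytes)
{
    return (bytes + PMEM_PAGE_SIZE - 1) & ~(PMEM_PAGE_SIZE - 1);
}

/* The namespace starts where the (page-aligned) reserved space ends. */
static uint64_t namespace_start(const struct pmem_layout *l)
{
    return l->region_base + page_align_up(l->reserved_size);
}

/* Bytes left for the namespace; 0 if the reservation covers the region. */
static uint64_t namespace_size(const struct pmem_layout *l)
{
    uint64_t reserved = page_align_up(l->reserved_size);

    return reserved >= l->region_size ? 0 : l->region_size - reserved;
}

/* True only if [addr, addr + len) lies entirely inside the namespace,
 * i.e. it does not overlap the reserved area or run past the region. */
static bool range_in_namespace(const struct pmem_layout *l,
                               uint64_t addr, uint64_t len)
{
    uint64_t start = namespace_start(l);
    uint64_t size = namespace_size(l);

    if (len == 0 || size == 0 || addr < start)
        return false;

    uint64_t off = addr - start;

    return off < size && len <= size - off;
}
```

With such a layout, an access that starts at region_base is rejected,
while one that starts at namespace_start() and stays within
namespace_size() bytes is accepted. Whoever ends up owning the
reservation (hypervisor or toolstack) would need an equivalent check.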

Haozhong
