
Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen



On 03/17/16 06:59, Jan Beulich wrote:
> >>> On 17.03.16 at 13:44, <haozhong.zhang@xxxxxxxxx> wrote:
> > On 03/17/16 05:04, Jan Beulich wrote:
> >> >>> On 17.03.16 at 09:58, <haozhong.zhang@xxxxxxxxx> wrote:
> >> > On 03/16/16 09:23, Jan Beulich wrote:
> >> >> >>> On 16.03.16 at 15:55, <haozhong.zhang@xxxxxxxxx> wrote:
> >> >> > On 03/16/16 08:23, Jan Beulich wrote:
> >> >> >> >>> On 16.03.16 at 14:55, <haozhong.zhang@xxxxxxxxx> wrote:
> >> >> >> > On 03/16/16 07:16, Jan Beulich wrote:
> >> >> >> >> And
> >> >> >> >> talking of fragmentation - how do you mean to track guest
> >> >> >> >> permissions for an unbounded number of address ranges?
> >> >> >> >>
> >> >> >> > 
> >> >> >> > In this case range structs in iomem_caps for NVDIMMs may consume a lot
> >> >> >> > of memory, so I think they are another candidate that should be put in
> >> >> >> > the reserved area on NVDIMM. If we only allow to grant access
> >> >> >> > permissions to NVDIMM page by page (rather than byte), the number of
> >> >> >> > range structs for each NVDIMM in the worst case is still decidable.
> >> >> >> 
> >> >> >> Of course the permission granularity is going to be pages, not
> >> >> >> bytes (or else we couldn't allow the pages to be mapped into
> >> >> >> guest address space). And the limit on the per-domain range
> >> >> >> sets isn't going to be allowed to be bumped significantly, at
> >> >> >> least not for any of the existing ones (or else you'd have to
> >> >> >> prove such bumping can't be abused).
> >> >> > 
> >> >> > What is that limit? the total number of range structs in per-domain
> >> >> > range sets? I must miss something when looking through 'case
> >> >> > XEN_DOMCTL_iomem_permission' of do_domctl() and didn't find that
> >> >> > limit, unless it means alloc_range() will fail when there are lots of
> >> >> > range structs.
> >> >> 
> >> >> Oh, I'm sorry, that was a different set of range sets I was
> >> >> thinking about. But note that excessive creation of ranges
> >> >> through XEN_DOMCTL_iomem_permission is not a security issue
> >> >> just because of XSA-77, i.e. we'd still not knowingly allow a
> >> >> severe increase here.
> >> >>
> >> > 
> >> > I didn't notice that multiple domains can all have access permission
> >> > to an iomem range, i.e. there can be multiple range structs for a
> >> > single iomem range. If range structs for NVDIMM are put on NVDIMM,
> >> > then there would be still a huge amount of them on NVDIMM in the worst
> >> > case (maximum number of domains * number of NVDIMM pages).
> >> > 
> >> > A workaround is to only allow a range of NVDIMM pages to be accessed
> >> > by a single domain. Whenever we add the access permission of NVDIMM
> >> > pages to a domain, we also remove the permission from its current
> >> > grantee. In this way, we only need to put 'number of NVDIMM pages'
> >> > range structs on NVDIMM in the worst case.
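
(To make the workaround above concrete, here is a minimal toolstack-level
sketch of the single-grantee idea. Only xc_domain_iomem_permission() is an
existing libxc call; struct nvdimm_region, nvdimm_grant_exclusive() and
INVALID_DOMID are hypothetical names used for illustration, and the same
policy could just as well be enforced inside the hypervisor instead.)

/*
 * Sketch only: grant a domain access to an NVDIMM range while revoking
 * it from the previous grantee, so each NVDIMM page is covered by at
 * most one range struct at any time.
 */
#include <stdint.h>
#include <xenctrl.h>

#define INVALID_DOMID (~0U)   /* hypothetical "no grantee yet" marker */

struct nvdimm_region {
    unsigned long first_mfn;  /* first machine frame of the region */
    unsigned long nr_mfns;    /* number of frames in the region */
    uint32_t grantee;         /* domain currently allowed to access it */
};

static int nvdimm_grant_exclusive(xc_interface *xch,
                                  struct nvdimm_region *r, uint32_t domid)
{
    int rc;

    /* Revoke the permission from the current grantee, if any. */
    if ( r->grantee != INVALID_DOMID && r->grantee != domid )
    {
        rc = xc_domain_iomem_permission(xch, r->grantee,
                                        r->first_mfn, r->nr_mfns, 0);
        if ( rc )
            return rc;
    }

    /* Grant the permission to the new (single) grantee. */
    rc = xc_domain_iomem_permission(xch, domid,
                                    r->first_mfn, r->nr_mfns, 1);
    if ( !rc )
        r->grantee = domid;

    return rc;
}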
> >> 
> >> But will this work? There's a reason multiple domains are permitted
> >> access: The domain running qemu for the guest, for example,
> >> needs to be able to access guest memory.
> >>
> > 
> > QEMU now only maintains ACPI tables and emulates _DSM for vNVDIMM
> > which both do not need to access NVDIMM pages mapped to guest.
> 
> For one - this was only an example. And then - iirc qemu keeps
> mappings of certain guest RAM ranges. If I'm remembering this
> right, then why would it be excluded that it also may need
> mappings of guest NVDIMM?
>

QEMU keeps mappings of guest memory because (1) the mapping is created
by QEMU itself, and/or (2) certain device emulation needs to access the
guest memory. But for vNVDIMM, I'm going to move the creation of its
mappings out of QEMU to the toolstack, and the vNVDIMM emulation in QEMU
does not access vNVDIMM pages mapped to the guest, so it's not necessary
to let QEMU keep vNVDIMM mappings.
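
As a rough sketch of what the toolstack would do instead (assuming, as one
of the two options being discussed here, that the NVDIMM pages are mapped
like an MMIO range; all names except the two libxc calls are made up for
illustration):

/*
 * Hypothetical toolstack-side sketch: map a host pmem range into the
 * guest physmap, so QEMU never needs its own mapping of the vNVDIMM
 * pages.
 */
#include <stdint.h>
#include <xenctrl.h>

static int map_vnvdimm(xc_interface *xch, uint32_t domid,
                       unsigned long guest_gfn,  /* where the guest sees it */
                       unsigned long host_mfn,   /* host pmem start frame */
                       unsigned long nr_mfns)
{
    int rc;

    /* Allow the guest to access the host frames at all ... */
    rc = xc_domain_iomem_permission(xch, domid, host_mfn, nr_mfns, 1);
    if ( rc )
        return rc;

    /* ... and install the gfn -> mfn translation in its p2m. */
    return xc_domain_memory_mapping(xch, domid, guest_gfn, host_mfn,
                                    nr_mfns, 1 /* add mapping */);
}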

> >> No matter how much you and others are opposed to this, I can't
> >> help myself thinking that PMEM regions should be treated like RAM
> >> (and hence be under full control of Xen), whereas PBLK regions
> >> could indeed be treated like MMIO (and hence partly be under the
> >> control of Dom0).
> >>
> > 
> > Hmm, making Xen have full control could at least make reserving space
> > on NVDIMM easier. I guess full control does not include manipulating
> > file systems on NVDIMM, which can still be left to dom0?
> > 
> > Then there is another problem (which also exists in the current
> > design): does Xen need to emulate NVDIMM _DSM for dom0? Take the _DSM
> > that access label storage area (for namespace) for example:
> > 
> > The way Linux reserves space on a pmem mode NVDIMM is to leave the
> > reserved space at the beginning of the pmem mode NVDIMM and create a
> > pmem namespace which starts from the end of the reserved space. Because
> > the reservation information is recorded in the namespace in the NVDIMM
> > label storage area, every OS that follows the namespace spec would not
> > mistakenly write files in the reserved area. I prefer the same approach
> > if Xen is going to do the reservation. We definitely don't want dom0
> > to break the label storage area, so Xen seemingly needs to emulate the
> > corresponding _DSM functions for dom0? If so, which part, the
> > hypervisor or the toolstack, should do the emulation?
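
(Just to illustrate the layout described above, with made-up numbers that
are not taken from the namespace spec: the reserved area sits at the start
of the pmem region, and the namespace recorded in the labels covers only
what follows it.)

#include <stdio.h>

int main(void)
{
    /* Example values only. */
    unsigned long long region_base = 0x100000000ULL;  /* pmem region SPA  */
    unsigned long long region_size = 16ULL << 30;     /* 16 GB region     */
    unsigned long long reserved    = 256ULL << 20;    /* 256 MB reserved  */

    /* The namespace label would describe only [ns_base, ns_base + ns_size),
     * so a spec-following OS never touches the reserved area. */
    unsigned long long ns_base = region_base + reserved;
    unsigned long long ns_size = region_size - reserved;

    printf("namespace: base=%#llx size=%#llx\n", ns_base, ns_size);
    return 0;
}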
> 
> I don't think I can answer all but the very last point: Of course this
> can't be done in the tool stack, since afaict the Dom0 kernel will
> want to evaluate _DSM before the tool stack even runs.
>

Or, we could modify the dom0 kernel to just use the label storage area
as-is and not modify it. Can the Xen hypervisor trust the dom0 kernel in
this respect?
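
If Xen (or whichever component ends up emulating _DSM for dom0) only needs
to protect the label storage area, the policy could be as simple as the
sketch below. The function indices follow the Intel NVDIMM _DSM interface
(4 = get label size, 5 = get label data, 6 = set label data); everything
else is hypothetical and just for illustration.

#include <stdbool.h>
#include <stdint.h>

#define DSM_FN_GET_LABEL_SIZE  4
#define DSM_FN_GET_LABEL_DATA  5
#define DSM_FN_SET_LABEL_DATA  6

/* Return true if dom0 may issue this _DSM function directly. */
static bool dom0_dsm_allowed(uint32_t function)
{
    switch ( function )
    {
    case DSM_FN_GET_LABEL_SIZE:
    case DSM_FN_GET_LABEL_DATA:
        /* Reading labels is harmless: dom0 only needs to see the
         * namespaces, including the one covering Xen's reservation. */
        return true;
    case DSM_FN_SET_LABEL_DATA:
        /* Rewriting labels could destroy the reservation namespace, so
         * this would have to be emulated/vetoed rather than passed
         * through. */
        return false;
    default:
        /* Non-label functions are out of scope for this sketch. */
        return true;
    }
}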

Haozhong
