
Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen



On 03/16/16 07:16, Jan Beulich wrote:
> >>> On 16.03.16 at 13:55, <haozhong.zhang@xxxxxxxxx> wrote:
> > Hi Jan and Konrad,
> > 
> > On 03/04/16 15:30, Haozhong Zhang wrote:
> >> I suddenly realized it's unnecessary to let QEMU get the SPA ranges of an
> >> NVDIMM or of files on NVDIMM. We can move that work to the toolstack and
> >> pass the SPA ranges obtained by the toolstack to QEMU. In this way, no
> >> privileged operations (mmap/mlock/...) are needed in QEMU, and non-root
> >> QEMU should be able to work even with vNVDIMM hotplug in the future.
> >> 
> > 
> > As I'm going to let the toolstack get the NVDIMM SPA ranges, this can be
> > done via dom0 kernel interfaces and Xen hypercalls, and can be
> > implemented in different ways. I'm wondering which of the following
> > is preferred by Xen.
> > 
> > 1. Given
> >     * a file descriptor of either an NVDIMM device or a file on NVDIMM, and
> >     * the domain id and the guest MFN where the vNVDIMM is going to be,
> >    the Xen toolstack (1) gets its SPA ranges via dom0 kernel interfaces
> >    (e.g. sysfs and the FIEMAP ioctl), and (2) calls a hypercall to map the
> >    above SPA ranges to the given guest MFN of the given domain.
> > 
> > 2. Or, given the same inputs, we may combine the above two steps into a new
> >    dom0 system call that (1) gets the SPA ranges, (2) calls a Xen
> >    hypercall to map the SPA ranges, and, one step further, (3) returns the
> >    SPA ranges to userspace (because QEMU needs these addresses to build ACPI).
> 
> DYM GPA here? Qemu should hardly have a need for SPA when
> wanting to build ACPI tables for the guest.
>

Oh, it should be GPA for QEMU, so (3) is not needed.
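
(As a side note, below is a rough sketch of how the toolstack could get a
file's extents on an NVDIMM via the FIEMAP ioctl mentioned in option 1.
This is only an illustration with made-up constants, not the actual
implementation; the extents returned are offsets into the pmem block
device, which would still need to be translated to SPAs, e.g. by adding
the NVDIMM region's base address read from sysfs.)

/* Sketch only: query the extents of a file on an NVDIMM via FIEMAP. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

#define MAX_EXTENTS 128   /* arbitrary for this example */

static int dump_extents(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct fiemap *fm = calloc(1, sizeof(*fm) +
                               MAX_EXTENTS * sizeof(struct fiemap_extent));
    if (!fm) {
        close(fd);
        return -1;
    }

    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;   /* map the whole file */
    fm->fm_flags = FIEMAP_FLAG_SYNC;     /* sync the file before mapping */
    fm->fm_extent_count = MAX_EXTENTS;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        free(fm);
        close(fd);
        return -1;
    }

    /* fe_physical is the byte offset within the pmem block device, not a SPA */
    for (unsigned i = 0; i < fm->fm_mapped_extents; i++)
        printf("extent %u: device offset 0x%llx, length 0x%llx\n", i,
               (unsigned long long)fm->fm_extents[i].fe_physical,
               (unsigned long long)fm->fm_extents[i].fe_length);

    free(fm);
    close(fd);
    return 0;
}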

> > The first way does not need to modify the dom0 Linux kernel, while the
> > second requires a new system call. I'm not sure whether the Xen toolstack,
> > as a userspace program, is considered safe to pass host physical
> > addresses to the hypervisor. If not, maybe the second one is better?
> 
> As long as the passing of physical addresses follows the model
> of MMIO for passed-through PCI devices, I don't think there's a
> problem with the tool stack bypassing the Dom0 kernel. So it
> really all depends on how you make sure that the guest won't
> get to see memory it has no permission to access.
>

So the toolstack should first use XEN_DOMCTL_iomem_permission to grant the
guest access to the NVDIMM's SPA ranges, and then call
XEN_DOMCTL_memory_mapping to establish the mapping.
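
A minimal sketch of what that could look like using the existing libxc
wrappers for those two domctls, as already done for passed-through PCI
MMIO (the function name and lack of error handling here are only
illustrative):

/* Sketch only: grant the guest access to the NVDIMM's MFNs and map them.
 * Assumes the SPA range has already been obtained by the toolstack. */
#include <xenctrl.h>

static int map_nvdimm_to_guest(xc_interface *xch, uint32_t domid,
                               unsigned long gfn, unsigned long mfn,
                               unsigned long nr_pages)
{
    int rc;

    /* XEN_DOMCTL_iomem_permission: allow domid to access [mfn, mfn + nr_pages) */
    rc = xc_domain_iomem_permission(xch, domid, mfn, nr_pages, 1);
    if (rc)
        return rc;

    /* XEN_DOMCTL_memory_mapping: map the MFN range at gfn in the guest */
    return xc_domain_memory_mapping(xch, domid, gfn, mfn, nr_pages,
                                    DPCI_ADD_MAPPING);
}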

> Which reminds me: When considering a file on NVDIMM, how
> are you making sure the mapping of the file to disk (i.e.
> memory) blocks doesn't change while the guest has access
> to it, e.g. due to some defragmentation going on?

The current Linux kernel 4.5 has experimental "raw device DAX support"
(enabled by removing "depends on BROKEN" from "config BLK_DEV_DAX"), which
can guarantee a consistent mapping. The driver developers are going to drop
the BROKEN dependency in Linux kernel 4.6.

> And
> talking of fragmentation - how do you mean to track guest
> permissions for an unbounded number of address ranges?
>

In this case the range structs in iomem_caps for NVDIMMs may consume a lot
of memory, so I think they are another candidate that should be put in the
reserved area on the NVDIMM. If we only allow granting access permissions
to an NVDIMM page by page (rather than byte by byte), the number of range
structs for each NVDIMM is still bounded in the worst case.
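
Just to put a (very rough, assumed) number on it: with 4 KB pages, the worst
case is every other page being granted, i.e. one range per two pages. Using
an assumed cost of a few tens of bytes per range struct, a back-of-envelope
estimate looks like this:

/* Back-of-envelope estimate only; the 40-byte per-range cost is an
 * assumption for illustration, not Xen's exact struct size. */
#include <stdio.h>

int main(void)
{
    unsigned long long nvdimm_size = 1ULL << 40; /* e.g. a 1 TB NVDIMM */
    unsigned long long page_size   = 4096;       /* granting page by page */
    unsigned long long range_cost  = 40;         /* assumed bytes per range */

    unsigned long long pages  = nvdimm_size / page_size;
    /* Worst case: every other page granted -> one range per two pages. */
    unsigned long long ranges = pages / 2;

    printf("%llu pages -> up to %llu ranges -> ~%llu MB of range structs\n",
           pages, ranges, (ranges * range_cost) >> 20);
    return 0;
}

For a 1 TB NVDIMM that comes out to roughly 5 GB of range structs in the
worst case, which is why keeping them in the reserved area on the NVDIMM
itself looks attractive.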

Haozhong
