[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen

On 02/04/16 20:24, Stefano Stabellini wrote:
> On Thu, 4 Feb 2016, Haozhong Zhang wrote:
> > On 02/03/16 15:22, Stefano Stabellini wrote:
> > > On Wed, 3 Feb 2016, George Dunlap wrote:
> > > > On 03/02/16 12:02, Stefano Stabellini wrote:
> > > > > On Wed, 3 Feb 2016, Haozhong Zhang wrote:
> > > > >> Or, we can make a file system on /dev/pmem0, create files on it, set
> > > > >> the owner of those files to xen-qemuuser-domid$domid, and then pass
> > > > >> those files to QEMU. In this way, non-root QEMU should be able to
> > > > >> mmap those files.
> > > > >
> > > > > Maybe that would work. Worth adding it to the design, I would like to
> > > > > read more details on it.
> > > > >
> > > > > Also note that QEMU initially runs as root but drops privileges to
> > > > > xen-qemuuser-domid$domid before the guest is started. Initially QEMU
> > > > > *could* mmap /dev/pmem0 while is still running as root, but then it
> > > > > wouldn't work for any devices that need to be mmap'ed at run time
> > > > > (hotplug scenario).
> > > >
> > > > This is basically the same problem we have for a bunch of other things,
> > > > right?  Having xl open a file and then pass it via qmp to qemu should
> > > > work in theory, right?
> > >
> > > Is there one /dev/pmem? per assignable region?
> > 
> > Yes.
> > 
> > BTW, I'm wondering whether and how non-root qemu works with xl disk
> > configuration that is going to access a host block device, e.g.
> >      disk = [ '/dev/sdb,,hda' ]
> > If that works with non-root qemu, I may take the similar solution for
> > pmem.
> Today the user is required to give the correct ownership and access mode
> to the block device, so that non-root QEMU can open it. However in the
> case of PCI passthrough, QEMU needs to mmap /dev/mem, as a consequence
> the feature doesn't work at all with non-root QEMU
> (http://marc.info/?l=xen-devel&m=145261763600528).
> If there is one /dev/pmem device per assignable region, then it would be
> conceivable to change its ownership so that non-root QEMU can open it.
> Or, better, the file descriptor could be passed by the toolstack via
> qmp.

Passing file descriptor via qmp is not enough.

Let me clarify where the requirement for root/privileged permissions
comes from. The primary workflow in my design that maps a host pmem
region or files in host pmem region to guest is shown as below:
 (1) QEMU in Dom0 mmap the host pmem (the host /dev/pmem0 or files on
     /dev/pmem0) to its virtual address space, i.e. the guest virtual
     address space.
 (2) QEMU asks Xen hypervisor to map the host physical address, i.e. SPA
     occupied by the host pmem to a DomU. This step requires the
     translation from the guest virtual address (where the host pmem is
     mmaped in (1)) to the host physical address. The translation can be
     done by either
    (a) QEMU that parses its own /proc/self/pagemap,
    (b) Xen hypervisor that does the translation by itself [1] (though
        this choice is not quite doable from Konrad's comments [2]).

[1] http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00434.html
[2] http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00606.html

For 2-a, reading /proc/self/pagemap requires CAP_SYS_ADMIN capability
since linux kernel 4.0. Furthermore, if we don't mlock the mapped host
pmem (by adding MAP_LOCKED flag to mmap or calling mlock after mmap),
pagemap will not contain all mappings. However, mlock may require
privileged permission to lock memory larger than RLIMIT_MEMLOCK. Because
mlock operates on memory, the permission to open(2) the host pmem files
does not solve the problem and therefore passing file descriptor via qmp
does not help.

For 2-b, from Konrad's comments [2], mlock is also required and
privileged permission may be required consequently.

Note that the mapping and the address translation are done before QEMU
dropping privileged permissions, so non-root QEMU should be able to work
with above design until we start considering vNVDIMM hotplug (which has
not been supported by the current vNVDIMM implementation in QEMU). In
the hotplug case, we may let Xen pass explicit flags to QEMU to keep it
running with root permissions.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.