
Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen



>>> On 16.02.16 at 12:14, <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> On Mon, 15 Feb 2016, Zhang, Haozhong wrote:
>> On 02/04/16 20:24, Stefano Stabellini wrote:
>> > On Thu, 4 Feb 2016, Haozhong Zhang wrote:
>> > > On 02/03/16 15:22, Stefano Stabellini wrote:
>> > > > On Wed, 3 Feb 2016, George Dunlap wrote:
>> > > > > On 03/02/16 12:02, Stefano Stabellini wrote:
>> > > > > > On Wed, 3 Feb 2016, Haozhong Zhang wrote:
>> > > > > >> Or, we can make a file system on /dev/pmem0, create files on it,
>> > > > > >> set the owner of those files to xen-qemuuser-domid$domid, and
>> > > > > >> then pass those files to QEMU. In this way, non-root QEMU should
>> > > > > >> be able to mmap those files.
>> > > > > >
>> > > > > > Maybe that would work. Worth adding it to the design, I would
>> > > > > > like to read more details on it.
>> > > > > >
>> > > > > > Also note that QEMU initially runs as root but drops privileges to
>> > > > > > xen-qemuuser-domid$domid before the guest is started. Initially
>> > > > > > QEMU *could* mmap /dev/pmem0 while it is still running as root, but
>> > > > > > that wouldn't work for any devices that need to be mmap'ed at run
>> > > > > > time (hotplug scenario).
>> > > > >
>> > > > > This is basically the same problem we have for a bunch of other
>> > > > > things, right?  Having xl open a file and then pass it via qmp to
>> > > > > qemu should work in theory, right?
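
For illustration, a minimal sketch of the descriptor-passing idea above:
a privileged process opens a file and hands the open descriptor to an
unprivileged peer over a Unix socket (SCM_RIGHTS). This is not xl or QEMU
code. The helper names are hypothetical, a temporary file stands in for
/dev/pmem0, and a socketpair stands in for the QMP socket. It assumes
Python 3.9 or later for socket.send_fds/recv_fds.

```python
import os
import socket
import tempfile


def send_open_fd(sock, path):
    """Privileged side (hypothetical helper): open path, pass the fd on."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # The descriptor travels as SCM_RIGHTS ancillary data.
        socket.send_fds(sock, [b"fd"], [fd])
    finally:
        os.close(fd)  # the receiver gets its own duplicate


def recv_open_fd(sock):
    """Unprivileged side (hypothetical helper): receive the open fd."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    return fds[0]


def demo():
    # Assumption: a temp file plays the role of the pmem device or file.
    with tempfile.NamedTemporaryFile() as backing:
        backing.write(b"pmem-data")
        backing.flush()
        toolstack, qemu = socket.socketpair(socket.AF_UNIX,
                                            socket.SOCK_STREAM)
        with toolstack, qemu:
            send_open_fd(toolstack, backing.name)
            fd = recv_open_fd(qemu)
            try:
                # The receiver never called open() on the path itself.
                return os.read(fd, 9)
            finally:
                os.close(fd)
```

The receiver only needs the descriptor, not permission to open the path.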
>> > > >
>> > > > Is there one /dev/pmem? per assignable region?
>> > > 
>> > > Yes.
>> > > 
>> > > BTW, I'm wondering whether and how non-root qemu works with an xl disk
>> > > configuration that accesses a host block device, e.g.
>> > >      disk = [ '/dev/sdb,,hda' ]
>> > > If that works with non-root qemu, I may take a similar approach for
>> > > pmem.
>> >  
>> > Today the user is required to give the correct ownership and access mode
>> > to the block device, so that non-root QEMU can open it. However, in the
>> > case of PCI passthrough, QEMU needs to mmap /dev/mem; as a consequence,
>> > the feature doesn't work at all with non-root QEMU
>> > (http://marc.info/?l=xen-devel&m=145261763600528).
>> > 
>> > If there is one /dev/pmem device per assignable region, then it would be
>> > conceivable to change its ownership so that non-root QEMU can open it.
>> > Or, better, the file descriptor could be passed by the toolstack via
>> > qmp.
>> 
>> Passing a file descriptor via qmp is not enough.
>> 
>> Let me clarify where the requirement for root/privileged permissions
>> comes from. The primary workflow in my design that maps a host pmem
>> region or files in a host pmem region to a guest is as follows:
>>  (1) QEMU in Dom0 mmaps the host pmem (the host /dev/pmem0 or files on
>>      /dev/pmem0) to its virtual address space, i.e. the guest virtual
>>      address space.
>>  (2) QEMU asks the Xen hypervisor to map the host physical address, i.e.
>>      the SPA occupied by the host pmem, to a DomU. This step requires
>>      the translation from the guest virtual address (where the host
>>      pmem is mmaped in (1)) to the host physical address. The
>>      translation can be done by either
>>     (a) QEMU, which parses its own /proc/self/pagemap,
>>      or
>>     (b) the Xen hypervisor, which does the translation by itself [1]
>>         (though this choice is not quite doable according to Konrad's
>>         comments [2]).
>> 
>> [1] http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00434.html
>> [2] http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00606.html
>> 
>> For 2-a, reading /proc/self/pagemap requires the CAP_SYS_ADMIN capability
>> since Linux kernel 4.0. Furthermore, if we don't mlock the mapped host
>> pmem (by adding the MAP_LOCKED flag to mmap or calling mlock after mmap),
>> pagemap will not contain all mappings. However, mlock may require
>> privileged permission to lock more memory than RLIMIT_MEMLOCK allows.
>> Because mlock operates on memory, the permission to open(2) the host pmem
>> files does not solve the problem, and therefore passing a file descriptor
>> via qmp does not help.
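
For illustration, a minimal sketch of the 2-a lookup, assuming the
documented Linux pagemap layout (one 64-bit entry per virtual page, PFN in
bits 0-54, "present" in bit 63). The function names are hypothetical, and
this is not QEMU code. Without CAP_SYS_ADMIN, newer kernels report the PFN
as zero, which the sketch treats as "unavailable".

```python
import os
import struct

PAGEMAP_ENTRY_SIZE = 8        # one 64-bit entry per virtual page
PFN_MASK = (1 << 55) - 1      # bits 0-54: page frame number
PRESENT_BIT = 1 << 63         # bit 63: page is present in RAM


def pagemap_offset(vaddr, page_size):
    """Byte offset of the pagemap entry that describes vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def decode_entry(entry, vaddr, page_size):
    """Return the physical address for vaddr, or None if unavailable."""
    if not entry & PRESENT_BIT:
        return None           # not faulted in (e.g. the range was not locked)
    pfn = entry & PFN_MASK
    if pfn == 0:
        return None           # PFN hidden from an unprivileged reader
    return pfn * page_size + vaddr % page_size


def virt_to_phys(vaddr):
    """Look up vaddr of the calling process in /proc/self/pagemap."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open("/proc/self/pagemap", "rb") as pagemap:
        pagemap.seek(pagemap_offset(vaddr, page_size))
        raw = pagemap.read(PAGEMAP_ENTRY_SIZE)
    if len(raw) != PAGEMAP_ENTRY_SIZE:
        return None
    (entry,) = struct.unpack("=Q", raw)   # native-endian 64-bit value
    return decode_entry(entry, vaddr, page_size)
```

A page that has not been faulted in has its present bit clear, which is
why the range needs to be locked before the lookup.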
>> 
>> For 2-b, from Konrad's comments [2], mlock is also required and
>> privileged permission may be required consequently.
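
For illustration, a minimal sketch of the RLIMIT_MEMLOCK check that
decides whether an unprivileged process could lock a mapping of a given
size. The function names are hypothetical. The sketch also simplifies: it
ignores memory the process has already locked, which counts against the
same limit.

```python
import resource


def exceeds_memlock_limit(length, soft_limit):
    """True if locking length bytes would go over the soft limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return False          # no limit configured
    return length > soft_limit


def can_lock_unprivileged(length):
    """Check length against this process's current RLIMIT_MEMLOCK."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return not exceeds_memlock_limit(length, soft)
```

A pmem region of many gigabytes will normally exceed a default limit that
is measured in kilobytes or megabytes, so the lock has to happen while
the process is still privileged, or the limit has to be raised first.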
>> 
>> Note that the mapping and the address translation are done before QEMU
>> drops its privileges, so non-root QEMU should be able to work with the
>> above design until we start considering vNVDIMM hotplug (which is not
>> yet supported by the current vNVDIMM implementation in QEMU). In the
>> hotplug case, we may let Xen pass explicit flags to QEMU to keep it
>> running with root permissions.
> 
> Are we all good with the fact that vNVDIMM hotplug won't work (unless
> the user explicitly asks for it at domain creation time, which is
> very unlikely, as otherwise she could use coldplug)?

No, at least there needs to be a road towards hotplug, even if
initially this may not be supported/implemented.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
