Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen
>>> On 16.02.16 at 12:14, <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> On Mon, 15 Feb 2016, Zhang, Haozhong wrote:
>> On 02/04/16 20:24, Stefano Stabellini wrote:
>> > On Thu, 4 Feb 2016, Haozhong Zhang wrote:
>> > > On 02/03/16 15:22, Stefano Stabellini wrote:
>> > > > On Wed, 3 Feb 2016, George Dunlap wrote:
>> > > > > On 03/02/16 12:02, Stefano Stabellini wrote:
>> > > > > > On Wed, 3 Feb 2016, Haozhong Zhang wrote:
>> > > > > >> Or, we can make a file system on /dev/pmem0, create files on
>> > > > > >> it, set the owner of those files to xen-qemuuser-domid$domid,
>> > > > > >> and then pass those files to QEMU. In this way, non-root QEMU
>> > > > > >> should be able to mmap those files.
>> > > > > >
>> > > > > > Maybe that would work. Worth adding it to the design, I would
>> > > > > > like to read more details on it.
>> > > > > >
>> > > > > > Also note that QEMU initially runs as root but drops privileges
>> > > > > > to xen-qemuuser-domid$domid before the guest is started.
>> > > > > > Initially QEMU *could* mmap /dev/pmem0 while is still running as
>> > > > > > root, but then it wouldn't work for any devices that need to be
>> > > > > > mmap'ed at run time (hotplug scenario).
>> > > > >
>> > > > > This is basically the same problem we have for a bunch of other
>> > > > > things, right? Having xl open a file and then pass it via qmp to
>> > > > > qemu should work in theory, right?
>> > > >
>> > > > Is there one /dev/pmem? per assignable region?
>> > >
>> > > Yes.
>> > >
>> > > BTW, I'm wondering whether and how non-root qemu works with xl disk
>> > > configuration that is going to access a host block device, e.g.
>> > >     disk = [ '/dev/sdb,,hda' ]
>> > > If that works with non-root qemu, I may take the similar solution for
>> > > pmem.
>> >
>> > Today the user is required to give the correct ownership and access mode
>> > to the block device, so that non-root QEMU can open it. However in the
>> > case of PCI passthrough, QEMU needs to mmap /dev/mem, as a consequence
>> > the feature doesn't work at all with non-root QEMU
>> > (http://marc.info/?l=xen-devel&m=145261763600528).
>> >
>> > If there is one /dev/pmem device per assignable region, then it would be
>> > conceivable to change its ownership so that non-root QEMU can open it.
>> > Or, better, the file descriptor could be passed by the toolstack via
>> > qmp.
>>
>> Passing file descriptor via qmp is not enough.
>>
>> Let me clarify where the requirement for root/privileged permissions
>> comes from. The primary workflow in my design that maps a host pmem
>> region or files in host pmem region to guest is shown as below:
>>  (1) QEMU in Dom0 mmap the host pmem (the host /dev/pmem0 or files on
>>      /dev/pmem0) to its virtual address space, i.e. the guest virtual
>>      address space.
>>  (2) QEMU asks Xen hypervisor to map the host physical address, i.e. SPA
>>      occupied by the host pmem to a DomU. This step requires the
>>      translation from the guest virtual address (where the host pmem is
>>      mmaped in (1)) to the host physical address. The translation can be
>>      done by either
>>      (a) QEMU that parses its own /proc/self/pagemap,
>>      or
>>      (b) Xen hypervisor that does the translation by itself [1] (though
>>          this choice is not quite doable from Konrad's comments [2]).
>>
>> [1] http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00434.html
>> [2] http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00606.html
>>
>> For 2-a, reading /proc/self/pagemap requires CAP_SYS_ADMIN capability
>> since linux kernel 4.0. Furthermore, if we don't mlock the mapped host
>> pmem (by adding MAP_LOCKED flag to mmap or calling mlock after mmap),
>> pagemap will not contain all mappings. However, mlock may require
>> privileged permission to lock memory larger than RLIMIT_MEMLOCK. Because
>> mlock operates on memory, the permission to open(2) the host pmem files
>> does not solve the problem and therefore passing file descriptor via qmp
>> does not help.
>>
>> For 2-b, from Konrad's comments [2], mlock is also required and
>> privileged permission may be required consequently.
>>
>> Note that the mapping and the address translation are done before QEMU
>> dropping privileged permissions, so non-root QEMU should be able to work
>> with above design until we start considering vNVDIMM hotplug (which has
>> not been supported by the current vNVDIMM implementation in QEMU). In
>> the hotplug case, we may let Xen pass explicit flags to QEMU to keep it
>> running with root permissions.
>
> Are we all good with the fact that vNVDIMM hotplug won't work (unless
> the user explicitly asks for it at domain creation time, which is
> very unlikely otherwise she could use coldplug)?

No, at least there needs to be a road towards hotplug, even if initially
this may not be supported/implemented.

Jan
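
For readers following the thread, option (2)(a) and the RLIMIT_MEMLOCK concern quoted above can be sketched roughly as below. This is only an illustration in Python, not code from QEMU, Xen or the design document. All function names (decode_pagemap_entry, virt_to_phys, exceeds_memlock_limit) are hypothetical. It assumes the Linux pagemap layout of one 64-bit entry per virtual page, with the page frame number in bits 0-54 and the "present" flag in bit 63, and it assumes that a zero PFN means the kernel withheld it from a reader lacking CAP_SYS_ADMIN.

```python
import os
import resource
import struct

PAGEMAP_ENTRY_SIZE = 8        # one 64-bit entry per virtual page
PFN_MASK = (1 << 55) - 1      # bits 0-54: page frame number, valid if present
PRESENT_BIT = 1 << 63         # bit 63: page is resident in RAM


def decode_pagemap_entry(entry, page_size, vaddr):
    """Hypothetical helper: physical address for vaddr, or None if unknown."""
    if not entry & PRESENT_BIT:
        # Page is not resident, so pagemap has no frame for it. This is
        # the reason the thread insists on mlock/MAP_LOCKED.
        return None
    pfn = entry & PFN_MASK
    if pfn == 0:
        # Assumption: a zeroed PFN means the reader lacks CAP_SYS_ADMIN.
        return None
    return pfn * page_size + (vaddr % page_size)


def virt_to_phys(vaddr, pagemap_path="/proc/self/pagemap"):
    """Hypothetical helper: translate one of this process's virtual addresses."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    offset = (vaddr // page_size) * PAGEMAP_ENTRY_SIZE
    with open(pagemap_path, "rb") as f:
        f.seek(offset)
        raw = f.read(PAGEMAP_ENTRY_SIZE)
    if len(raw) != PAGEMAP_ENTRY_SIZE:
        return None
    (entry,) = struct.unpack("=Q", raw)   # native byte order, 8 bytes
    return decode_pagemap_entry(entry, page_size, vaddr)


def exceeds_memlock_limit(length, soft_limit):
    """Hypothetical helper: would locking `length` bytes exceed the soft limit?"""
    return soft_limit != resource.RLIM_INFINITY and length > soft_limit
```

A caller would obtain the soft limit with resource.getrlimit(resource.RLIMIT_MEMLOCK)[0] and compare it against the size of the pmem mapping before attempting to lock it. Neither check depends on who opened the file, which matches the point made in the thread that passing a file descriptor via qmp does not remove the need for privilege.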