
Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen

> > > >  Open: It seems no system call/ioctl is provided by Linux kernel to
> > > >        get the physical address from a virtual address.
> > > >        /proc/<qemu_pid>/pagemap provides information of mapping from
> > > >        VA to PA. Is it an acceptable solution to let QEMU parse this
> > > >        file to get the physical address?
> > > 
> > > Does it work in a non-root scenario?
> > >
> > 
> > Seemingly no, according to Documentation/vm/pagemap.txt in Linux kernel:
> > | Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
> > | In 4.0 and 4.1 opens by unprivileged fail with -EPERM.  Starting from
> > | 4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
> > | Reason: information about PFNs helps in exploiting Rowhammer 
> > vulnerability.

Ah right.
> >
> > A possible alternative is to add a new hypercall similar to
> > XEN_DOMCTL_memory_mapping but receiving virtual address as the address
> > parameter and translating to machine address in the hypervisor.
> That might work.

That won't work.

This is a userspace VMA - which means that once the ioctl is done we switch
to kernel virtual addresses. We may know that the prior CR3 holds the
userspace page tables and could walk them down - but what if the domain
that is doing this is PVH? (or HVM) - the userspace CR3 is tucked somewhere
inside the kernel.

Which means this hypercall would need to know the Linux kernel task structure
to find it.

May I propose another solution - a stacking driver (similar to loop). You
set it up (ioctl on /dev/pmem0/guest.img, get some /dev/mapper/guest.img),
then mmap the /dev/mapper/guest.img - all of the operations are the same -
and it may have an extra ioctl - get_pfns - which would provide the data in
a form similar to pagemap.txt.

But folks will then ask - why don't you just use pagemap? Could the pagemap
have an extra security capability check? One that can be set for

> > > >  Open: For a large pmem, mmap(2) is very possible to not map all SPA
> > > >        occupied by pmem at the beginning, i.e. QEMU may not be able to
> > > >        get all SPA of pmem from buf (in virtual address space) when
> > > >        calling XEN_DOMCTL_memory_mapping.
> > > >        Can mmap flag MAP_LOCKED or mlock(2) be used to enforce the
> > > >        entire pmem being mmaped?
> > > 
> > > Ditto
> > >
> > 
> > No. If I take the above alternative for the first open, maybe the new
> > hypercall above can inject page faults into dom0 for the unmapped
> > virtual address so as to enforce dom0 Linux to create the page
> > mapping.

Ugh. That sounds hacky. And you wouldn't necessarily be safe.
Imagine that the system admin decides to defrag the /dev/pmem filesystem,
or move the files (disk images) around. If they do that, we may
still have the guest mapped to system addresses which may now contain
filesystem metadata, or a different guest's image. We MUST mlock or lock
the file for the duration of the guest.

Xen-devel mailing list


