
Re: Understanding osdep_xenforeignmemory_map mmap behaviour



On 24.03.22 02:42, Stefano Stabellini wrote:
I am pretty sure the reasons have to do with old x86 PV guests, so I am
CCing Juergen and Boris.


Hi,

While we've been working on the rust-vmm virtio backends on Xen we
obviously have to map guest memory into the userspace of the daemon.
However, following the logic of what is going on is a little confusing.
For example, in the Linux backend we have this:

   void *osdep_xenforeignmemory_map(xenforeignmemory_handle *fmem,
                                    uint32_t dom, void *addr,
                                    int prot, int flags, size_t num,
                                    const xen_pfn_t arr[/*num*/],
                                    int err[/*num*/])
   {
       int fd = fmem->fd;
       privcmd_mmapbatch_v2_t ioctlx;
       size_t i;
       int rc;

       addr = mmap(addr, num << XC_PAGE_SHIFT, prot, flags | MAP_SHARED,
                   fd, 0);
       if ( addr == MAP_FAILED )
           return NULL;

       ioctlx.num = num;
       ioctlx.dom = dom;
       ioctlx.addr = (unsigned long)addr;
       ioctlx.arr = arr;
       ioctlx.err = err;

       rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx);

Where the fd passed down is associated with the /dev/xen/privcmd device
for issuing hypercalls on userspace's behalf. What is confusing is why
the function does its own mmap - one would assume the passed addr would
be associated with an anonymous or file-backed mmap region that the
calling code has already set up. Applying an mmap to a special device
seems a little odd.
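
For reference, the public wrapper callers actually use looks roughly
like this - a minimal sketch, error handling elided, and the function
name map_guest_pages plus the example gfns are mine:

   #include <stdint.h>
   #include <sys/mman.h>
   #include <xenforeignmemory.h>

   void *map_guest_pages(uint32_t domid)
   {
       xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);
       xen_pfn_t gfns[2] = { 0x100, 0x101 };  /* guest frames to map */
       int err[2];

       /* Note: no addr is supplied at all - the library (via the
        * osdep function above) picks the vaddr itself, so there is
        * no way to hand in a pre-existing anonymous mapping. */
       return xenforeignmemory_map(fmem, domid, PROT_READ | PROT_WRITE,
                                   2, gfns, err);
   }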

Looking at the implementation on the kernel side, it seems the mmap
handler only sets a few flags:

   static int privcmd_mmap(struct file *file, struct vm_area_struct *vma)
   {
           /* DONTCOPY is essential for Xen because copy_page_range doesn't know
            * how to recreate these mappings */
           vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTCOPY |
                            VM_DONTEXPAND | VM_DONTDUMP;
           vma->vm_ops = &privcmd_vm_ops;
           vma->vm_private_data = NULL;

           return 0;
   }

So can I confirm that the mmap of /dev/xen/privcmd is being called for
its side effects? Is it so that, when the actual ioctl is called, the
correct flags are already set on the pages associated with the
userspace virtual address range?
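
If it helps frame the question: my (possibly mistaken) reading of
drivers/xen/privcmd.c is that the ioctl handler looks the VMA back up
by address and only then populates it - a condensed sketch, with most
details and all locking elided:

   static long privcmd_ioctl_mmap_batch(struct file *file,
                                        void __user *udata, int version)
   {
       struct privcmd_mmapbatch_v2 m;
       struct vm_area_struct *vma;

       if (copy_from_user(&m, udata, sizeof(m)))
           return -EFAULT;

       /* Look up the VMA created by the earlier mmap() of the fd;
        * the VM_IO/VM_PFNMAP/... flags from privcmd_mmap() are
        * already in place on it. */
       vma = find_vma(current->mm, m.addr);

       /* ... then insert the foreign frames into that VMA one by
        * one, recording per-page errors through m.err ... */
   }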

Can I confirm there shouldn't be any limitation on where and how the
userspace virtual address space is set up for mapping in the guest
memory?

Is there a reason why this isn't done in the ioctl path itself?

For a rather long time we were using "normal" user pages for this purpose,
which were just locked into memory for doing the hypercall.

Unfortunately there have been very rare problems with that approach, as
the Linux kernel can set the PTE of a user page to invalid for short
periods of time, which led to EFAULT in the hypervisor when trying to
access the hypercall data.

In Linux this can be avoided only by using kernel memory, which is the
reason why the hypercall buffers are allocated and mmap()-ed through the
privcmd driver.
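
(From userspace that path looks roughly like the sketch below; the
helper name alloc_hypercall_buf is mine, and I'm assuming the Linux
privcmd-buf device node at /dev/xen/hypercall:)

   #include <fcntl.h>
   #include <sys/mman.h>
   #include <unistd.h>

   /* Rough sketch: hypercall-safe buffers are kernel memory mapped
    * through the hypercall buffer device, not anonymous user pages
    * whose PTEs may go transiently invalid. */
   void *alloc_hypercall_buf(size_t npages)
   {
       int fd = open("/dev/xen/hypercall", O_RDWR | O_CLOEXEC);
       if (fd < 0)
           return NULL;
       void *buf = mmap(NULL, npages * (size_t)getpagesize(),
                        PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
       close(fd);  /* the mapping stays valid after close */
       return buf == MAP_FAILED ? NULL : buf;
   }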


I'm trying to understand the differences between Xen and KVM in the API
choices here. I think the equivalent for KVM is the
KVM_SET_USER_MEMORY_REGION ioctl, which brings a section of the guest
physical address space into the userspace's vaddr range.

The main difference is just that the consumer of the hypercall buffer is
NOT the kernel, but the hypervisor. In the KVM case both are the same, so
a brief period of an invalid PTE can be handled just fine in KVM, while
the Xen hypervisor has no idea that this situation will be over very soon.
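
To make the contrast concrete, a minimal sketch of the KVM flow
mentioned above (vmfd is assumed to be an already-created VM fd, the
helper name add_guest_ram is mine, and error handling is elided):

   #include <linux/kvm.h>
   #include <sys/ioctl.h>
   #include <sys/mman.h>

   void add_guest_ram(int vmfd, size_t size)
   {
       /* Plain anonymous memory, owned and set up by userspace... */
       void *host = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

       /* ...then handed to the kernel, which resolves it through the
        * normal MM paths - a transiently invalid PTE simply faults
        * and is retried inside the kernel. */
       struct kvm_userspace_memory_region region = {
           .slot            = 0,
           .guest_phys_addr = 0,
           .memory_size     = size,
           .userspace_addr  = (unsigned long)host,
       };
       ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);
   }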


Juergen
