
Re: [Xen-devel] Bug: privcmd failure in mmaping 160+ pages



Hi Konrad, David and everybody else,

I have found a privcmd bug: no more than 160 pages can be mmapped via
IOCTL_PRIVCMD_MMAPBATCH or IOCTL_PRIVCMD_MMAPBATCH_V2.

I started a thread (see below) on the xen-users list and then
discussed it with Mr. Ian Campbell, who recommended bringing it to
the xen-devel list and CC'ing you.

I appreciate your comments.

You know, this might have been me. Or in the process of being fixed by a patch I just posted.

As you may know by now, xc_map_foreign* first sets up a region of virtual memory with an mmap call, and then issues the mapping ioctl to populate the region.

Let's say for the sake of argument that the mapping operation is broken up into pieces. Current checks in the implementation of PRIVCMD_MMAPBATCH* prevent you from populating the memory region piece-meal.
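The difference between the old and new gating checks can be sketched in plain C. This is only an illustration of the patch's intent (the helper names are mine, not the kernel's): the old check insisted that a single ioctl cover the whole VMA exactly, while the patched check only requires the request to fit inside the VMA, so a large region can be populated in pieces.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12

/* Old check: the request must cover the whole VMA exactly,
 * so a second, partial ioctl on the same VMA is rejected. */
static bool covers_vma_exactly(unsigned long addr, unsigned long nr_pages,
                               unsigned long vm_start, unsigned long vm_end)
{
    return addr == vm_start &&
           addr + (nr_pages << PAGE_SHIFT) == vm_end;
}

/* New check: the request only has to fit inside the VMA,
 * so the region can be populated piece-meal. */
static bool fits_in_vma(unsigned long addr, unsigned long nr_pages,
                        unsigned long vm_start, unsigned long vm_end)
{
    return addr >= vm_start &&
           addr + (nr_pages << PAGE_SHIFT) <= vm_end;
}
```

Under the old check, a second batch starting in the middle of the VMA fails outright; under the new one it is accepted, and the per-pte check in the patch then rejects only slots that are already populated.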

So: can you try the patch I pasted below?

If that doesn't fix it, then you'll need to give us the output of those printks.

Thanks
Andres

[PATCH] Xen: Fix retry calls into PRIVCMD_MMAPBATCH*.

When a foreign mapper attempts to map guest frames that are paged out,
the mapper receives an ENOENT response and will have to try again
while a helper process pages the target frame back in.

Gating checks on PRIVCMD_MMAPBATCH* ioctl args were preventing retries
of mapping calls.

Signed-off-by: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
---
drivers/xen/privcmd.c |   32 +++++++++++++++++++++++++++-----
1 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index f8e5dd7..44a26c6 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -43,9 +43,12 @@ MODULE_LICENSE("GPL");

#define PRIV_VMA_LOCKED ((void *)1)

-#ifndef HAVE_ARCH_PRIVCMD_MMAP
static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma);
-#endif
+
+static int privcmd_enforce_singleshot_mapping_granular(
+               struct vm_area_struct *vma,
+               unsigned long addr,
+               unsigned long nr_pages);

static long privcmd_ioctl_hypercall(void __user *udata)
{
@@ -422,9 +425,9 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
vma = find_vma(mm, m.addr);
if (!vma ||
    vma->vm_ops != &privcmd_vm_ops ||
-     (m.addr != vma->vm_start) ||
-     ((m.addr + (nr_pages << PAGE_SHIFT)) != vma->vm_end) ||
-     !privcmd_enforce_singleshot_mapping(vma)) {
+     (m.addr < vma->vm_start) ||
+     ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end) ||
+     !privcmd_enforce_singleshot_mapping_granular(vma, m.addr, nr_pages)) {
up_write(&mm->mmap_sem);
ret = -EINVAL;
goto out;
@@ -540,11 +543,30 @@ static int privcmd_mmap(struct file *file, struct vm_area_struct *vma)
return 0;
}

+/* For MMAP */
static int privcmd_enforce_singleshot_mapping(struct vm_area_struct *vma)
{
return !cmpxchg(&vma->vm_private_data, NULL, PRIV_VMA_LOCKED);
}

+/* For MMAPBATCH*. This allows asserting the singleshot mapping
+ * on a per pfn/pte basis. Mapping calls that fail with ENOENT
+ * can be then retried until success. */
+static int enforce_singleshot_mapping_fn(pte_t *pte, struct page *pmd_page,
+                 unsigned long addr, void *data)
+{
+ return pte_none(*pte) ? 0 : -EBUSY;
+}
+
+static int privcmd_enforce_singleshot_mapping_granular(
+            struct vm_area_struct *vma,
+            unsigned long addr,
+            unsigned long nr_pages)
+{
+ return apply_to_page_range(vma->vm_mm, addr, nr_pages << PAGE_SHIFT,
+               enforce_singleshot_mapping_fn, NULL) == 0;
+}
+
const struct file_operations xen_privcmd_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = privcmd_ioctl,
-- 
1.7.1


Thanks,

Guanglin


---------- Forwarded message ----------
From: Ian Campbell <Ian.Campbell@xxxxxxxxx>
Date: 2013/7/30
Subject: Re: [Xen-users] BUG : privcmd mmap 160+ pages
To: Guanglin Xu <mzguanglin@xxxxxxxxx>
Cc: "xen-users@xxxxxxxxxxxxx" <xen-users@xxxxxxxxxxxxx>


On Tue, 2013-07-30 at 07:55 -0400, Guanglin Xu wrote:
2013/7/30 Ian Campbell <Ian.Campbell@xxxxxxxxx>:
On Mon, 2013-07-29 at 16:16 -0400, Guanglin Xu wrote:
Hi all,

I have just found a problem with xc_map_foreign_range() in Xen.

xc_map_foreign_range(), a libxc API backed by privcmd (a Xen kernel
module), can be used to mmap guest VM memory into dom0. However,
mmapping more than 160 pages fails.

Internally, xc_map_foreign_range() uses ioctl to communicate with
privcmd. There are two ioctl commands: IOCTL_PRIVCMD_MMAPBATCH
(legacy) and IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both consistently
return 0 (success), but the mappings fail once 160 pages have been
mapped.

First, on Linux kernel 3.5, IOCTL_PRIVCMD_MMAPBATCH was the only
available ioctl command: rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH,
&ioctlx). After 160 pages had been mapped, subsequent ioctl
invocations would set the ioctlx.arr[] items to 140737344202616,
overwriting the original pfn numbers.

I then updated my Linux kernel to 3.8 to test
IOCTL_PRIVCMD_MMAPBATCH_V2: rc = ioctl(fd,
IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx). This time, ioctl invocations
after the first 160 pages would set the ioctlx.err[] items to EINVAL.

Although I have inserted printk() calls into privcmd.c to trace its
execution flow, the result shows what looks like a complete path,
which is quite weird. I have no idea what is happening inside privcmd.

Can anyone figure out this problem?

I wouldn't be all that surprised if there was a hardcoded batch size
limit somewhere in either libxc or the privcmd driver.

Hi Ian,

Thank you very much for your reply.

I can confirm that the problem is in privcmd, because I have debugged
libxc and narrowed the problem down to ioctl(), where privcmd sets
the ioctlx.arr[] items (IOCTL_PRIVCMD_MMAPBATCH) or the ioctlx.err[]
items (IOCTL_PRIVCMD_MMAPBATCH_V2) to indicate the limitation.


If you map your memory in batches of e.g. 128 pages, does it all work OK?

Yes. For example, [0-127] succeeds. However, the subsequent batch
[128-255] fails, because the whole region then exceeds 160 pages.
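For reference, the chunked mapping pattern being tested here (walking a region in fixed-size batches, with a shorter final batch) can be sketched with a pair of helpers; BATCH_PAGES and the function names are illustrative, not from libxc:

```c
#define BATCH_PAGES 128UL

/* Number of ioctl calls needed to populate nr_pages in
 * BATCH_PAGES-sized chunks (the last chunk may be shorter). */
static unsigned long nr_batches(unsigned long nr_pages)
{
    return (nr_pages + BATCH_PAGES - 1) / BATCH_PAGES;
}

/* Size in pages of the i-th batch (0 if i is past the end). */
static unsigned long batch_len(unsigned long nr_pages, unsigned long i)
{
    unsigned long off = i * BATCH_PAGES;
    if (off >= nr_pages)
        return 0;
    unsigned long rem = nr_pages - off;
    return rem < BATCH_PAGES ? rem : BATCH_PAGES;
}
```

With the old gating check, only batch 0 could ever succeed, since every later batch starts past vm_start; that is exactly what the patch above is meant to fix.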

That's quite interesting.


If you want to get to the bottom of the 160 page limit you'll probably
have to trace through the code looking for hardcoded sizes or limits on
sizes (e.g. of arrays) or perhaps integer indexes etc which are too
small and are overflowing.

160 * the size of the various structs involved doesn't look to be all
that interesting (i.e. just over a page size boundary or something), but
that's the sort of direction I would start by looking in.
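A quick back-of-the-envelope check supports that: assuming 8-byte pfn entries and 4 KiB pages (both assumptions, though typical for 64-bit x86), a 160-entry array is only 1280 bytes, nowhere near a page boundary:

```c
/* Bytes occupied by an array of `entries` elements of `entry_size`
 * bytes each; used to sanity-check whether 160 entries lands near a
 * 4096-byte page boundary. */
static unsigned long array_bytes(unsigned long entries,
                                 unsigned long entry_size)
{
    return entries * entry_size;
}
```

160 * 8 = 1280 bytes, and 1280 mod 4096 = 1280, so the limit does not look like an "array just overflowed a page" effect.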

If you can't spot it by eye then you'll likely have to instrument the
code paths with prints to try and track the progress of the initially
supplied buffer through to the hypercall etc.

Yes. I have been debugging libxc and instrumenting privcmd, but I
couldn't find the hardcoded limit or any other limiting code.

The code in libxc is easier to trace, but privcmd is hard for me, as
I lack kernel development experience. I couldn't even find where
privcmd uses copy_to_user() or put_user() to write the ioctlx.err[]
or ioctlx.arr[] items, even though my printk() tracing shows the
execution path of privcmd_ioctl() running to completion.

Do you have any idea whether privcmd can set ioctlx.err[] some other way?


I'm not familiar with this code. It sounds like this is most probably
a kernel bug; I'd recommend taking it to the xen-devel list, perhaps
CC'ing Konrad Wilk and David Vrabel.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
