
Re: [Xen-devel] [RFC/PATCH] Improve speed of mapping guest memory into Dom0



Mats,

Your patch has been white-space damaged and will not apply.  You should
use git send-email which will do the right thing.  I've also CC'd Konrad
who is the maintainer for the Xen parts of the kernel.

On 14/11/12 11:13, Mats Petersson wrote:
> [a long, rambling commit message?]

The text as-is isn't really suitable for a commit message (too much
irrelevant stuff) and there is no suitable subject line.

> I have also found that the munmap() call used to unmap the guest memory
> from Dom0 is about 35% slower in 3.7 kernel than in the 2.6 kernel (3.8M
> cycles vs 2.8M cycles).

This performance reduction only occurs with 32-bit guests, as Xen then
traps-and-emulates both halves of the PTE write.

> I think this could be made quicker by using a
> direct write of zero rather than the compare exchange operation that is
> currently used [which traps into Xen, performs the compare & exchange] -

This is something I noticed but never got around to producing a patch
for.  How about this (uncompiled!) patch?

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1146,8 +1146,16 @@ again:
                                     page->index > details->last_index))
                                        continue;
                        }
-                       ptent = ptep_get_and_clear_full(mm, addr, pte,
-                                                       tlb->fullmm);
+                       /*
+                        * No need for the expensive atomic get and
+                        * clear for anonymous mappings as the dirty
+                        * and young bits are not used.
+                        */
+                       if (page && PageAnon(page))
+                               pte_clear(mm, addr, pte);
+                       else
+                               ptent = ptep_get_and_clear_full(mm, addr, pte,
+                                                               tlb->fullmm);
                        tlb_remove_tlb_entry(tlb, pte, addr);
                        if (unlikely(!page))
                                continue;


Now for the patch itself.  On the whole, the approach looks good and the
real-world performance improvements are nice.  Specific comments inline.

> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -2542,3 +2561,77 @@ out:
>      return err;
>  }
>  EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
> +
> +/* like xen_remap_domain_mfn_range, but does a list of mfn's, rather
> + * than the, for xen, quite useless, consecutive pages.
> + */

/*
 * Like xen_remap_domain_mfn_range(), but more efficiently handles MFNs
 * that are not contiguous (which is common for a domain's memory).
 */

> +int xen_remap_domain_mfn_list(struct vm_area_struct *vma,
> +                  unsigned long addr,
> +                  unsigned long *mfn, int nr,
> +                  int *err_ptr,
> +                  pgprot_t prot, unsigned domid)

xen_remap_domain_mfn_array() ?  It's not taking a list.

> +{
> +    struct remap_list_data rmd;
> +    struct mmu_update mmu_update[REMAP_BATCH_SIZE];

This is surprisingly large (256 bytes) but I note that the existing
xen_remap_domain_mfn_range() does the same thing so I guess it's ok.

> +    int batch;
> +    int done;
> +    unsigned long range;
> +    int err = 0;
> +
> +    if (xen_feature(XENFEAT_auto_translated_physmap))
> +        return -EINVAL;
> +
> +    prot = __pgprot(pgprot_val(prot) | _PAGE_IOMAP);
> +
> +    BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP |
> VM_IO)));
> +
> +    rmd.mfn = mfn;
> +    rmd.prot = prot;
> +
> +    while (nr) {
> +        batch = min(REMAP_BATCH_SIZE, nr);
> +        range = (unsigned long)batch << PAGE_SHIFT;
> +
> +        rmd.mmu_update = mmu_update;
> +        err = apply_to_page_range(vma->vm_mm, addr, range,
> +                      remap_area_mfn_list_pte_fn, &rmd);
> +        if (err)
> +        {
> +            printk("xen_remap_domain_mfn_list: apply_to_range:
> err=%d\n", err);

Stray printk?

> +            goto out;
> +        }
> +
> +        err = HYPERVISOR_mmu_update(mmu_update, batch, &done, domid);
> +        if (err < 0)
> +        {
> +            int i;
> +            /* TODO: We should remove this printk later */
> +            printk("xen_remap_domain_mfn_list: mmu_update: err=%d,
> done=%d, batch=%d\n", err, done, batch);

Yes, you should...

> +            err_ptr[done] = err;
> +
> +            /* now do the remaining part of this batch */
> +            for(i = done+1; i < batch; i++)
> +            {
> +                int tmp_err = HYPERVISOR_mmu_update(&mmu_update[i], 1,
> NULL, domid);
> +                if (tmp_err < 0)
> +                {
> +                    err_ptr[i] = tmp_err;
> +                }
> +            }

There's no need to fall back to doing it page-by-page here. You can
still batch the remainder.
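
To illustrate what I mean, here is a userspace sketch (not kernel code:
hv_mmu_update() is a stand-in for the real hypercall, and the failure
behaviour is simulated).  On an error, record it for the offending entry
and resubmit everything after that entry as one smaller batch, rather
than falling back to single-entry hypercalls:

```c
#include <stdbool.h>

struct mmu_update { unsigned long ptr, val; };

/* Stub "hypercall": fails (rc -1) on the first entry whose val is 0,
 * reporting how many entries succeeded before it via *done. */
static int hv_mmu_update(const struct mmu_update *req, int count,
                         int *done, unsigned domid)
{
    (void)domid;
    for (int i = 0; i < count; i++) {
        if (req[i].val == 0) {
            if (done)
                *done = i;
            return -1;
        }
    }
    if (done)
        *done = count;
    return 0;
}

/* Apply 'batch' updates; on failure record the error for the bad entry
 * in err_ptr[] and retry the rest as a single smaller batch. */
static int apply_batch(struct mmu_update *upd, int batch,
                       int *err_ptr, unsigned domid)
{
    int off = 0, ret = 0;

    while (off < batch) {
        int done = 0;
        int rc = hv_mmu_update(upd + off, batch - off, &done, domid);

        if (rc >= 0)
            break;
        err_ptr[off + done] = rc;  /* the entry that failed */
        off += done + 1;           /* resubmit everything after it */
        ret = rc;
    }
    return ret;
}
```

The number of hypercalls is then one per *failing* entry, not one per
remaining entry.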

> +
> +            goto out;
> +        }
> +
> +        nr -= batch;
> +        addr += range;
> +        err_ptr += batch;
> +    }
> +
> +    err = 0;
> +out:
> +
> +    xen_flush_tlb_all();

Probably not that important anymore since we would now do far fewer TLB
flushes, but this TLB flush is only needed if the PTEs being updated
were already present -- if they're all clear then the TLB flush can be
omitted entirely.
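
As a rough userspace model of that idea (the function name and the PTE
encoding here are made up for illustration, not the kernel's): track
whether any slot being overwritten previously held a present PTE, and
only flush when one did:

```c
#include <stdbool.h>
#include <stddef.h>

#define PTE_PRESENT 0x1UL  /* illustrative present bit */

/* Install new entries over 'n' PTE slots; returns true iff a TLB flush
 * is required, i.e. at least one slot previously held a present PTE
 * (only present PTEs can have been cached in the TLB). */
static bool remap_ptes(unsigned long *ptes, const unsigned long *newvals,
                       size_t n)
{
    bool need_flush = false;

    for (size_t i = 0; i < n; i++) {
        if (ptes[i] & PTE_PRESENT)
            need_flush = true;
        ptes[i] = newvals[i] | PTE_PRESENT;
    }
    return need_flush;
}
```

For the common privcmd case of mapping into a freshly created VMA, all
the slots start out clear and the flush would be skipped.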

> +
> +    return err;
> +}
> +EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_list);
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 8adb9cc..b39a7b7 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -151,6 +151,41 @@ static int traverse_pages(unsigned nelem, size_t size,
>      return ret;
>  }
> 
> +/*
> + * Similar to traverse_pages, but use each page as a "block" of
> + * data to be processed as one unit.
> + */
> +static int traverse_pages_block(unsigned nelem, size_t size,
> +                struct list_head *pos,
> +                int (*fn)(void *data, int nr, void *state),
> +                void *state)
> +{
> +    void *pagedata;
> +    unsigned pageidx;
> +    int ret = 0;
> +
> +    BUG_ON(size > PAGE_SIZE);
> +
> +    pageidx = PAGE_SIZE;
> +    pagedata = NULL;    /* hush, gcc */

What was gcc upset about?  I don't see anything it could get confused about.

> @@ -260,17 +295,17 @@ struct mmap_batch_state {
>      xen_pfn_t __user *user_mfn;
>  };
> 
> -static int mmap_batch_fn(void *data, void *state)
> +static int mmap_batch_fn(void *data, int nr, void *state)
>  {
>      xen_pfn_t *mfnp = data;
> +
>      struct mmap_batch_state *st = state;
>      int ret;
> 
> -    ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK,
> *mfnp, 1,
> -                     st->vma->vm_page_prot, st->domain);
> +    BUG_ON(nr < 0);

Is this BUG_ON() useful?

David

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
