Re: [Xen-devel] [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet
On May 18, 2014 10:57:56 PM EDT, Bob Liu <lliubbo@xxxxxxxxx> wrote:
>Because of page scrubbing, it is very slow to destroy a domain with a
>large amount of memory. It took around 10 minutes to destroy a guest
>with nearly 1 TB of memory.
>
>[root@ca-test111 ~]# time xm des 5
>real 10m51.582s
>user 0m0.115s
>sys 0m0.039s
>[root@ca-test111 ~]#
>
>Using perf we can see what happened. Thanks to Boris for his help and
>for providing this useful tool for Xen.
>[root@x4-4 bob]# perf report
> 22.32% xl [xen.syms] [k] page_get_owner_and_reference
> 20.82% xl [xen.syms] [k] relinquish_memory
> 20.63% xl [xen.syms] [k] put_page
> 17.10% xl [xen.syms] [k] scrub_one_page
> 4.74% xl [xen.syms] [k] unmap_domain_page
> 2.24% xl [xen.syms] [k] get_page
> 1.49% xl [xen.syms] [k] free_heap_pages
> 1.06% xl [xen.syms] [k] _spin_lock
> 0.78% xl [xen.syms] [k] __put_page_type
> 0.75% xl [xen.syms] [k] map_domain_page
> 0.57% xl [xen.syms] [k] free_page_type
> 0.52% xl [xen.syms] [k] is_iomem_page
> 0.42% xl [xen.syms] [k] free_domheap_pages
> 0.31% xl [xen.syms] [k] put_page_from_l1e
> 0.27% xl [xen.syms] [k] check_lock
> 0.27% xl [xen.syms] [k] __mfn_valid
>
>This patch tries to delay scrub_one_page() to a tasklet which will be
>scheduled on all online physical CPUs, so that 'xl/xm destroy xxx'
>returns much faster.
Thank you for digging into this. However, tasklets do not run in parallel;
that is, a given tasklet is only executed on one CPU.
>
>Tested on a guest with 30 GB of memory.
>Before this patch:
>[root@x4-4 bob]# time xl des PV-30G
>
>real 0m16.014s
>user 0m0.010s
>sys 0m13.976s
>[root@x4-4 bob]#
>
>After:
>[root@x4-4 bob]# time xl des PV-30G
>
>real 0m3.581s
>user 0m0.003s
>sys 0m1.554s
>[root@x4-4 bob]#
>
>The destroy time reduced from 16s to 3s.
Right, by moving the scrubbing from this function to a tasklet.
>
>Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx>
>---
> xen/common/page_alloc.c | 39 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 38 insertions(+), 1 deletion(-)
>
>diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>index 601319c..2ca59a1 100644
>--- a/xen/common/page_alloc.c
>+++ b/xen/common/page_alloc.c
>@@ -79,6 +79,10 @@ PAGE_LIST_HEAD(page_offlined_list);
> /* Broken page list, protected by heap_lock. */
> PAGE_LIST_HEAD(page_broken_list);
>
>+PAGE_LIST_HEAD(page_scrub_list);
>+static DEFINE_SPINLOCK(scrub_list_spinlock);
>+static struct tasklet scrub_page_tasklet;
>+
> /*************************
> * BOOT-TIME ALLOCATOR
> */
>@@ -1417,6 +1421,25 @@ void free_xenheap_pages(void *v, unsigned int
>order)
> #endif
>
>
>+static void scrub_free_pages(unsigned long unuse)
>+{
>+ struct page_info *pg;
>+
>+ for ( ; ; )
>+ {
>+ while ( page_list_empty(&page_scrub_list) )
>+ cpu_relax();
>+
>+ spin_lock(&scrub_list_spinlock);
>+ pg = page_list_remove_head(&page_scrub_list);
>+ spin_unlock(&scrub_list_spinlock);
>+ if (pg)
>+ {
>+ scrub_one_page(pg);
>+ free_heap_pages(pg, 0);
>+ }
>+ }
I fear that means you have added a work item that can run for a very long
time and cause security issues (DoS to guests). The VMEXIT code, for
example, checks whether a softirq needs to run and will run any tasklets.
That means this scrubbing could now run in another guest's context and
delay that guest significantly.
A couple of ideas:
- have per-CPU tasklets, one for each online CPU; they can all try to do
  some batched work and, if anything is left over, reschedule themselves.
- if a worker detects that it is not running within the idle domain
  context, it should reschedule itself for later.
- perhaps also look at having a per-CPU scrubbing list, and then feed
  those from a per-node list?
Thanks!
>+}
>
> /*************************
> * DOMAIN-HEAP SUB-ALLOCATOR
>@@ -1425,6 +1448,7 @@ void free_xenheap_pages(void *v, unsigned int
>order)
> void init_domheap_pages(paddr_t ps, paddr_t pe)
> {
> unsigned long smfn, emfn;
>+ unsigned int cpu;
>
> ASSERT(!in_irq());
>
>@@ -1435,6 +1459,9 @@ void init_domheap_pages(paddr_t ps, paddr_t pe)
> return;
>
> init_heap_pages(mfn_to_page(smfn), emfn - smfn);
>+ tasklet_init(&scrub_page_tasklet, scrub_free_pages, 0);
>+ for_each_online_cpu(cpu)
>+ tasklet_schedule_on_cpu(&scrub_page_tasklet, cpu);
> }
>
>
>@@ -1564,8 +1591,17 @@ void free_domheap_pages(struct page_info *pg,
>unsigned int order)
> * domain has died we assume responsibility for erasure.
> */
> if ( unlikely(d->is_dying) )
>+ {
>+ /*
>+ * Add page to page_scrub_list to speed up domain destroy,
>those
>+ * pages will be zeroed later by scrub_page_tasklet.
>+ */
>+ spin_lock(&scrub_list_spinlock);
> for ( i = 0; i < (1 << order); i++ )
>- scrub_one_page(&pg[i]);
>+ page_list_add_tail(&pg[i], &page_scrub_list);
>+ spin_unlock(&scrub_list_spinlock);
>+ goto out;
>+ }
>
> free_heap_pages(pg, order);
> }
>@@ -1583,6 +1619,7 @@ void free_domheap_pages(struct page_info *pg,
>unsigned int order)
> drop_dom_ref = 0;
> }
>
>+out:
> if ( drop_dom_ref )
> put_domain(d);
> }
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel