Re: [Xen-devel] [RFC PATCH v2] xen: free_domheap_pages: delay page scrub to idle loop
On Tue, May 20, 2014 at 10:15:31AM +0800, Bob Liu wrote:
> Because of page scrubbing, it is very slow to destroy a domain with a
> large amount of memory.  It took around 10 minutes to destroy a guest
> with nearly 1 TB of memory:
>
> [root@ca-test111 ~]# time xm des 5
> real    10m51.582s
> user    0m0.115s
> sys     0m0.039s
> [root@ca-test111 ~]#
>
> There are two parts to improving this situation:
>
> 1. Delay the page scrub in free_domheap_pages(), so that 'xl destroy xxx'
>    can return earlier.
>
> 2. The actual scrub time does not improve much from that alone, so we
>    should also consider running the scrub job on all idle CPUs in
>    parallel.  An obvious solution is to add pages to a global list in
>    free_domheap_pages(), and then whenever a CPU enters idle_loop() it
>    tries to isolate a page and scrub/free it.  Unfortunately this
>    solution did not work as expected in my testing: introducing a
>    global list also means we need a lock to protect that list, and the
>    cost of that lock is too heavy!

You can introduce a per-cpu list, which does not need a global lock.  The
problem is with insertion of items into it - that would require an IPI,
which would then need to take the global lock and populate the local CPU
list.  Interestingly, this is what I had been working on to convert the
tasklets into per-cpu tasklets.

> So I use a per-cpu scrub page list in this patch.  The tradeoff is that
> we may not use all idle CPUs - it depends on which CPU
> free_domheap_pages() runs on.
>
> Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx>
> ---
>  xen/arch/x86/domain.c   |  1 +
>  xen/common/page_alloc.c | 32 +++++++++++++++++++++++++++++++-
>  xen/include/xen/mm.h    |  1 +
>  3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 6fddd4c..f3f1260 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -119,6 +119,7 @@ static void idle_loop(void)
>          (*pm_idle)();

I would actually do it _right_ before we call pm_idle.
As in right before we go into C-states, instead of after we have been
woken up - because being woken up means either:

 a) the timer expired, so a guest needs to be woken up - and we do not
    want to use its timeslice to scrub some other guest's memory; or

 b) an interrupt needs to be processed (do_IRQ gets called, does its
    work, sets up the right softirq bits and then exits, which resumes
    the idle loop).

>          do_tasklet();
>          do_softirq();
> +        scrub_free_pages();
>      }
>  }
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 601319c..b2a0fc5 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -79,6 +79,8 @@ PAGE_LIST_HEAD(page_offlined_list);
>  /* Broken page list, protected by heap_lock. */
>  PAGE_LIST_HEAD(page_broken_list);
>
> +DEFINE_PER_CPU(struct page_list_head, page_scrub_list);
> +
>  /*************************
>   * BOOT-TIME ALLOCATOR
>   */
> @@ -633,6 +635,9 @@ static struct page_info *alloc_heap_pages(
>              goto found;
>      } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
>
> +    if ( scrub_free_pages() )
> +        continue;
> +
>      if ( memflags & MEMF_exact_node )
>          goto not_found;
>
> @@ -1417,6 +1422,23 @@ void free_xenheap_pages(void *v, unsigned int order)
>  #endif
>
>
> +unsigned long scrub_free_pages(void)
> +{
> +    struct page_info *pg;
> +    unsigned long nr_scrubbed = 0;
> +
> +    /* Scrub around 400M of memory every time. */

Could you mention why 400M?

> +    while ( nr_scrubbed < 100000 )
> +    {
> +        pg = page_list_remove_head( &this_cpu(page_scrub_list) );
> +        if ( !pg )
> +            break;
> +        scrub_one_page(pg);
> +        free_heap_pages(pg, 0);
> +        nr_scrubbed++;
> +    }
> +    return nr_scrubbed;
> +}
>
>  /*************************
>   * DOMAIN-HEAP SUB-ALLOCATOR
> @@ -1564,8 +1586,15 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>           * domain has died we assume responsibility for erasure.
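To make the per-cpu list and the scrub budget concrete, here is a minimal
user-space C model of the approach in the hunks above.  This is a sketch
only: `pcpu_scrub_list`, `SCRUB_BUDGET`, `queue_for_scrub` and the
fixed-size `struct page` are illustrative stand-ins for Xen's per-cpu
`page_list_head`, the 100000-page cap, `page_list_add_tail` and
`struct page_info`, not the real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS      4
#define SCRUB_BUDGET 4          /* stand-in for the patch's 100000-page cap */

struct page {
    struct page *next;
    unsigned char data[16];     /* pretend page contents */
};

/* One private list head per CPU: no global lock is needed as long as
 * each CPU only ever touches its own list. */
static struct page *pcpu_scrub_list[NR_CPUS];

/* Free path: just queue the page on the local CPU's list and return. */
static void queue_for_scrub(unsigned int cpu, struct page *pg)
{
    pg->next = pcpu_scrub_list[cpu];
    pcpu_scrub_list[cpu] = pg;
}

/* Idle path: scrub at most SCRUB_BUDGET pages from the local list and
 * report how many were done, mirroring scrub_free_pages(). */
static unsigned long scrub_free_pages(unsigned int cpu)
{
    unsigned long nr_scrubbed = 0;

    while ( nr_scrubbed < SCRUB_BUDGET )
    {
        struct page *pg = pcpu_scrub_list[cpu];

        if ( !pg )
            break;
        pcpu_scrub_list[cpu] = pg->next;
        memset(pg->data, 0, sizeof(pg->data));   /* the "scrub" */
        nr_scrubbed++;
    }
    return nr_scrubbed;
}
```

The tradeoff discussed in the patch description is visible here: pages
queued on one CPU can only ever be scrubbed by that CPU's idle loop.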
>           */
>          if ( unlikely(d->is_dying) )
> +        {
> +            /*
> +             * Add pages to page_scrub_list to speed up domain destroy;
> +             * those pages will be zeroed later by scrub_free_pages().
> +             */
>              for ( i = 0; i < (1 << order); i++ )
> -                scrub_one_page(&pg[i]);
> +                page_list_add_tail( &pg[i], &this_cpu(page_scrub_list) );
> +            goto out;
> +        }
>
>          free_heap_pages(pg, order);
>      }
> @@ -1583,6 +1612,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>          drop_dom_ref = 0;
>      }
>
> + out:
>      if ( drop_dom_ref )
>          put_domain(d);
>  }
> diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
> index b183189..3560335 100644
> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -355,6 +355,7 @@ static inline unsigned int get_order_from_pages(unsigned long nr_pages)
>  }
>
>  void scrub_one_page(struct page_info *);
> +unsigned long scrub_free_pages(void);
>
>  int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,
>                                domid_t foreign_domid,
> --
> 1.7.10.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
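One subtlety in the free_domheap_pages() hunk above is the new `goto out`:
for a dying domain the pages must skip the immediate free_heap_pages()
call, yet control must still reach the put_domain() at the end of the
function.  A hedged user-space sketch of just that control flow follows;
`is_dying`, `refs` and the counters are simplified stand-ins, not Xen's
real types or reference-counting rules.

```c
#include <assert.h>
#include <stdbool.h>

struct domain {
    bool is_dying;
    int  refs;               /* stand-in for the real reference count */
};

static int pages_deferred;   /* pages queued for later scrubbing */
static int pages_freed;      /* pages scrubbed and freed right away */

static void free_domheap_pages_model(struct domain *d, int nr_pages)
{
    if ( d->is_dying )
    {
        /* Defer: queue the pages and skip the immediate free... */
        pages_deferred += nr_pages;
        goto out;
    }

    /* Live domain: scrub and free synchronously. */
    pages_freed += nr_pages;

 out:
    /* ...but always drop the reference the pages held on the domain. */
    d->refs--;
}
```

Forgetting the `goto out` (or placing the label after put_domain()) would
leak a domain reference on the dying path, which is exactly what the
patch's `out:` label avoids.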