Re: [Xen-devel] [PATCH v2 3/3] xen: use idle vcpus to scrub pages
>>> On 30.06.14 at 15:39, <lliubbo@xxxxxxxxx> wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -116,6 +116,7 @@ static void idle_loop(void)
>      {
>          if ( cpu_is_offline(smp_processor_id()) )
>              play_dead();
> +        scrub_free_pages();
>          (*pm_idle)();

So is it really appropriate to call pm_idle() if scrub_free_pages()
didn't return immediately? I.e. I would suppose the function ought to
return a boolean indicator of whether any meaningful amount of
processing was done, in which case we may want to skip entering any
kind of C state (even if it would end up being just C1), or doing any
of the processing associated with this possible entering.
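For illustration, a minimal sketch of the shape being asked for here.
This is hypothetical, not part of the submitted patch; it assumes
scrub_free_pages() is changed to return whether it did a meaningful
amount of work:

    /* Hypothetical sketch: scrub_free_pages() now reports whether it
     * did a meaningful amount of scrubbing during this idle slot. */
    static void idle_loop(void)
    {
        for ( ; ; )
        {
            if ( cpu_is_offline(smp_processor_id()) )
                play_dead();
            /* Skip C-state entry (and the processing associated with
             * it) when the idle slot was spent doing real work. */
            if ( !scrub_free_pages() )
                (*pm_idle)();
            do_softirq();
        }
    }
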
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index ab293c8..6ab1d1d 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -86,6 +86,12 @@ PAGE_LIST_HEAD(page_offlined_list);
>  /* Broken page list, protected by heap_lock. */
>  PAGE_LIST_HEAD(page_broken_list);
>
> +/* A rough flag to indicate whether a node have need_scrub pages */
> +static bool_t node_need_scrub[MAX_NUMNODES];
> +static DEFINE_PER_CPU(bool_t, is_scrubbing);
> +static DEFINE_PER_CPU(struct page_list_head, scrub_list_cpu);
> +static DEFINE_PER_CPU(struct page_list_head, free_list_cpu);
> +
>  /*************************
>   * BOOT-TIME ALLOCATOR
>   */
> @@ -948,6 +954,7 @@ static void free_heap_pages(
>      {
>          if ( !tainted )
>          {
> +            node_need_scrub[node] = 1;
>              for ( i = 0; i < (1 << order); i++ )
>                  pg[i].count_info |= PGC_need_scrub;
>          }

Iirc it was more than this single place where you set PGC_need_scrub,
and hence where you'd now need to set the other flag too.

> @@ -1525,7 +1532,130 @@ void __init scrub_heap_pages(void)
>      setup_low_mem_virq();
>  }
>
> +#define SCRUB_BATCH_ORDER 12

Please make this adjustable via command line, so that eventual latency
issues can be worked around.

> +static void __scrub_free_pages(unsigned int node, unsigned int cpu)
> +{
> +    struct page_info *pg, *tmp;
> +    unsigned int i;
> +    int order;
> +    struct page_list_head *local_scrub_list = &this_cpu(scrub_list_cpu);
> +    struct page_list_head *local_free_list = &this_cpu(free_list_cpu);
> +
> +    /* Scrub percpu list */
> +    while ( !page_list_empty(local_scrub_list) )
> +    {
> +        pg = page_list_remove_head(local_scrub_list);
> +        order = PFN_ORDER(pg);
> +        ASSERT( pg && order <= SCRUB_BATCH_ORDER );
> +        for ( i = 0; i < (1 << order); i++ )
> +        {
> +            ASSERT( test_bit(_PGC_need_scrub, &pg[i].count_info) );
> +            scrub_one_page(&pg[i]);
> +        }
> +        page_list_add_tail(pg, local_free_list);
> +        if ( softirq_pending(cpu) )
> +            return;

Hard tabs. Please, also with further violations below in mind, check
your code before submitting.

> +    }
> +
> +    /* free percpu free list */
> +    if ( !page_list_empty(local_free_list) )
> +    {
> +        spin_lock(&heap_lock);
> +        page_list_for_each_safe( pg, tmp, local_free_list )
> +        {
> +            order = PFN_ORDER(pg);
> +            page_list_del(pg, local_free_list);
> +            for ( i = 0; i < (1 << order); i++ )
> +            {
> +                pg[i].count_info |= PGC_state_free;
> +                pg[i].count_info &= ~PGC_need_scrub;

This needs to happen earlier - the scrub flag should be cleared right
after scrubbing, and the free flag should imo be set when the page
gets freed. That's for two reasons:
1) Hypervisor allocations don't need scrubbed pages, i.e. they can
allocate memory regardless of the scrub flag's state.
2) You still detain the memory on the local lists from allocation. On
a many-node system, the 16MB per node can certainly sum up (which is
not to say that I don't view the 16MB on a single node as already
problematic).

> +            }
> +            merge_free_trunks(pg, order, node, page_to_zone(pg), 0);
> +        }
> +        spin_unlock(&heap_lock);
> +    }
> +}
> +
> +void scrub_free_pages(void)
> +{
> +    int order;
> +    struct page_info *pg, *tmp;
> +    unsigned int i, zone, nr_delisted = 0;
> +    unsigned int cpu = smp_processor_id();
> +    unsigned int node = cpu_to_node(cpu);
> +    struct page_list_head *local_scrub_list = &this_cpu(scrub_list_cpu);
> +
> +    /* Return if our sibling already started scrubbing */
> +    for_each_cpu( i, per_cpu(cpu_sibling_mask,cpu) )
> +        if ( per_cpu(is_scrubbing, i) )
> +            return;
> +    this_cpu(is_scrubbing) = 1;

If you really want to enforce exclusiveness, you need a single
canonical flag per core (could e.g. be per_cpu(is_scrubbing,
cpumask_first(per_cpu(cpu_sibling_mask, cpu))), set via
test_and_set_bool()).

> +
> +    while ( !softirq_pending(cpu) )
> +    {
> +        if ( !node_need_scrub[node] )
> +        {
> +            /* Free local per cpu list before we exit */
> +            __scrub_free_pages(node, cpu);
> +            goto out;
> +        }

This seems unnecessary: Just have the if() below be

    if ( node_need_scrub[node] && page_list_empty(local_scrub_list) )

along with __scrub_free_pages() returning whether it exited because of
softirq_pending(cpu) (which at once eliminates the need for the check
at the loop header above, allowing this to become a nice
do { ... } while ( !__scrub_free_pages() )).

> +
> +        /* Delist a batch of pages from global scrub list */
> +        if ( page_list_empty(local_scrub_list) )
> +        {
> +            spin_lock(&heap_lock);
> +            for ( zone = 0; zone < NR_ZONES; zone++ )
> +            {
> +                for ( order = MAX_ORDER; order >= 0; order-- )
> +                {
> +                    page_list_for_each_safe( pg, tmp, &heap(node, zone, order) )
> +                    {
> +                        if ( !test_bit(_PGC_need_scrub, &(pg->count_info)) )

Please avoid the inner parentheses here.

> +                            continue;
> +
> +                        page_list_del( pg, &heap(node, zone, order) );
> +                        if ( order > SCRUB_BATCH_ORDER)

Coding style.

> +                        {
> +                            /* putback extra pages */
> +                            i = order;
> +                            while ( i != SCRUB_BATCH_ORDER )

This would better be a do/while construct - on the first iteration
you already know the condition is true.

> +                            {
> +                                PFN_ORDER(pg) = --i;
> +                                page_list_add_tail(pg, &heap(node, zone, i));
> +                                pg += 1 << i;

I realize there are cases where this is also not being done correctly
in existing code, but please use 1UL here and in all similar cases.

> +                            }
> +                            PFN_ORDER(pg) = SCRUB_BATCH_ORDER;
> +                        }
> +
> +                        for ( i = 0; i < (1 << PFN_ORDER(pg)); i++ )

Can't you just use "order" here (and a few lines down)?

> +                        {
> +                            ASSERT( test_bit(_PGC_need_scrub, &pg[i].count_info) );
> +                            ASSERT( !test_bit(_PGC_broken, &pg[i].count_info) );
> +                            mark_page_offline(&pg[i], 0);

mark_page_offline() ???

Jan

> +                        }
> +                        page_list_add_tail(pg, local_scrub_list);
> +                        nr_delisted += ( 1 << PFN_ORDER(pg) );
> +                        if ( nr_delisted >= (1 << SCRUB_BATCH_ORDER) )
> +                        {
> +                            nr_delisted = 0;
> +                            spin_unlock(&heap_lock);
> +                            goto start_scrub;
> +                        }
> +                    }
> +                }
> +            }
> +
> +            node_need_scrub[node] = 0;
> +            spin_unlock(&heap_lock);
> +        }
> +
> + start_scrub:
> +        __scrub_free_pages(node, cpu);
> +    }
> +
> + out:
> +    this_cpu(is_scrubbing) = 0;
> +}
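Taken together, the structural suggestions above might look roughly
like this. This is a hypothetical outline only, not the submitted
patch and not code from the review: the scrubbing and batch-refill
bodies are elided, as is the "meaningful work" return value discussed
at the top of the review.

    /* Hypothetical sketch of the suggested restructuring. */
    static bool_t __scrub_free_pages(unsigned int node, unsigned int cpu)
    {
        /* ... scrub the local per-cpu list, free what has been
         * scrubbed ...  Report whether the caller should stop, i.e.
         * a softirq became pending (a full version would also report
         * when no work is left on this node). */
        return softirq_pending(cpu);
    }

    void scrub_free_pages(void)
    {
        unsigned int cpu = smp_processor_id();
        unsigned int node = cpu_to_node(cpu);
        /* One canonical flag per core: the first sibling's instance. */
        unsigned int first = cpumask_first(per_cpu(cpu_sibling_mask, cpu));

        if ( test_and_set_bool(per_cpu(is_scrubbing, first)) )
            return;  /* another thread of this core already scrubs */

        do {
            if ( node_need_scrub[node] &&
                 page_list_empty(&this_cpu(scrub_list_cpu)) )
            {
                /* ... delist a batch from the global heap under
                 *     heap_lock, clearing node_need_scrub[node]
                 *     once the node is drained ... */
            }
        } while ( !__scrub_free_pages(node, cpu) );

        per_cpu(is_scrubbing, first) = 0;
    }

The single test_and_set_bool() on the first sibling's flag closes the
race the patch's per-sibling scan leaves open: two hyperthreads can
both observe no scrubber running and then both set their own flag.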
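Similarly, the request to make SCRUB_BATCH_ORDER adjustable could be
met with Xen's integer_param() boot-parameter hook, along the lines of
the existing "bootscrub" option in page_alloc.c. The option name and
variable below are invented for illustration:

    /* Hypothetical replacement for the compile-time constant; the
     * "scrub-batch-order" option name is made up, not an existing
     * Xen command line option. */
    static unsigned int __read_mostly scrub_batch_order = 12;
    integer_param("scrub-batch-order", scrub_batch_order);

Call sites would then compare against scrub_batch_order instead of the
macro, and an administrator could pass e.g. scrub-batch-order=8 on the
hypervisor command line to trade scrubbing throughput for latency.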