Re: [Xen-devel] [PATCH 4/4] xen: use idle vcpus to scrub pages
>>> On 17.06.14 at 13:49, <lliubbo@xxxxxxxxx> wrote:
> In case of heavy lock contention, use a percpu list.
> - Delist a batch of pages to a percpu list from "scrub" free page list.
> - Scrub pages on this percpu list.
> - Add those clean pages to normal "head" free page list, merge with other

Did you mean "heap" here?

> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -587,12 +587,14 @@ int domain_kill(struct domain *d)
>          d->tmem_client = NULL;
>          /* fallthrough */
>      case DOMDYING_dying:
> +        enable_idle_scrub = 0;
>          rc = domain_relinquish_resources(d);
>          if ( rc != 0 )
>          {
>              BUG_ON(rc != -EAGAIN);
>              break;
>          }
> +        enable_idle_scrub = 1;
>          for_each_vcpu ( d, v )
>              unmap_vcpu_info(v);
>          d->is_dying = DOMDYING_dead;

???

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -79,6 +79,9 @@ PAGE_LIST_HEAD(page_offlined_list);
>  /* Broken page list, protected by heap_lock. */
>  PAGE_LIST_HEAD(page_broken_list);
>  
> +volatile bool_t enable_idle_scrub;
> +DEFINE_PER_CPU(struct page_list_head, scrub_list_cpu);

I'm pretty sure I pointed out to you before that variables used only
in a single file should be static.

> @@ -1387,7 +1390,86 @@ void __init scrub_heap_pages(void)
>      setup_low_mem_virq();
>  }
>  
> +#define SCRUB_BATCH 1024
> +void scrub_free_pages(void)
> +{
> +    struct page_info *pg;
> +    unsigned int i, j, node_empty = 0, nr_delisted = 0;
> +    int order;
> +    unsigned int cpu = smp_processor_id();
> +    unsigned int node = cpu_to_node(cpu);
> +    struct page_list_head *temp_list = &this_cpu(scrub_list_cpu);
> +
> +    if ( !enable_idle_scrub )
> +        return;
> +
> +    do
> +    {
> +        if ( page_list_empty(temp_list) )
> +        {
> +            /* Delist a batch of pages from global scrub list */
> +            spin_lock(&heap_lock);
> +            for ( j = 0; j < NR_ZONES; j++ )
> +            {
> +                for ( order = MAX_ORDER; order >= 0; order-- )
> +                {
> +                    if ( (pg = page_list_remove_head(&scrub(node, j, order))) )
> +                    {
> +                        for ( i = 0; i < (1 << order); i++)
> +                            mark_page_offline(&pg[i], 0);

A page previously having caused #MC now suddenly became healthy
again?

> +
> +                        page_list_add_tail(pg, temp_list);
> +                        nr_delisted += (1 << order);
> +                        if ( nr_delisted > SCRUB_BATCH )
> +                        {
> +                            nr_delisted = 0;
> +                            spin_unlock(&heap_lock);
> +                            goto start_scrub;
> +                        }
> +                    }
> +                }
> +            }
> +
> +            node_empty = 1;
> +            spin_unlock(&heap_lock);
> +        }
> +        else
> +        {
> +start_scrub:

Labels need to be indented by at least one space.

> +            /* Scrub percpu list */
> +            while ( !page_list_empty(temp_list) )
> +            {
> +                pg = page_list_remove_head(temp_list);
> +                ASSERT(pg);
> +                order = PFN_ORDER(pg);
> +                for ( i = 0; i < (1 << order); i++ )
> +                {

Order here can again be almost arbitrarily high. You aren't allowed
to scrub e.g. 4Tb in one go.

> +                    ASSERT( test_bit(_PGC_need_scrub, &(pg[i].count_info)) );
> +                    scrub_one_page(&pg[i]);
> +                    pg[i].count_info &= ~(PGC_need_scrub);
> +                }
>
> +                /* Add pages to free heap list */
> +                spin_lock(&heap_lock);

Between the dropping of the lock above and the re-acquiring of it
here the pages "stolen" can lead to allocation failures despite there
being available memory. This needs to be avoided if at all possible.
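One way to close that window would be to not take dirty pages off the
allocator's lists at all: leave them on the regular free lists, tagged
PGC_need_scrub, and have the allocator scrub on demand when it hands
out a chunk the idle scrubber hasn't reached yet. Purely for
illustration - untested, the helper name is made up, while the flag
and scrub_one_page() are the ones from your patch:

static void scrub_if_dirty(struct page_info *pg, unsigned int order)
{
    unsigned long i;

    /* Called (with the heap_lock held in this sketch) on the chunk
     * the allocator is about to return, so no page can ever become
     * unreachable to allocations just because it is still dirty. */
    for ( i = 0; i < (1UL << order); i++ )
        if ( test_and_clear_bit(_PGC_need_scrub, &pg[i].count_info) )
            scrub_one_page(&pg[i]);
}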
> +                for ( i = 0; i < (1 << order); i++ )
> +                {
> +                    ASSERT ( !test_bit(_PGC_need_scrub, &(pg[i].count_info)) );
> +                    pg[i].count_info |= PGC_state_free;
> +                }
> +                ASSERT (node == phys_to_nid(page_to_maddr(pg)));
> +                merge_free_trunks(pg, order, 0);
> +                spin_unlock(&heap_lock);
> +
> +                if ( softirq_pending(cpu) )
> +                    return;
> +            }
> +        }
> +
> +        /* Scrub list of this node is empty */
> +        if ( node_empty )
> +            return;
> +    } while ( !softirq_pending(cpu) );
> +}

And all the logic needs to be made NUMA-aware, i.e. a CPU should
prefer to scrub local memory, and only resort to scrubbing distant
memory if no CPU on the correct node is idle. Plus (as we have seen
with Konrad's boot time scrubbing changes) you should avoid having
two hyperthreads within the same core doing scrubbing at the same
time.
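Just to illustrate the direction - an untested sketch, where
everything other than cpu_to_node(), per_cpu(), for_each_cpu(), and
cpu_sibling_mask is made up for the example:

static bool_t should_scrub_here(unsigned int cpu)
{
    unsigned int sibling, node = cpu_to_node(cpu);

    /* Let only one thread per core scrub - hyperthread siblings share
     * the core's load/store bandwidth anyway. */
    for_each_cpu ( sibling, per_cpu(cpu_sibling_mask, cpu) )
        if ( sibling < cpu && cpu_scrubbing(sibling) )  /* hypothetical */
            return 0;

    /* Prefer local memory ... */
    if ( node_has_scrub_work(node) )                    /* hypothetical */
        return 1;

    /* ... and help a remote node only if none of that node's own CPUs
     * is idle. */
    return remote_node_needs_help(node);                /* hypothetical */
}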
In the end I'm getting the impression that this all wasn't properly
thought through yet. Convoys on the lock inside the idle processing
should e.g. also be avoided (or reduced to a minimum) - there's no
point preventing the entering of C states if all you're otherwise
going to do is spin on a lock.
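For instance (again only a sketch), the batch-delisting path in
scrub_free_pages() could use spin_trylock(), so an idle CPU that
loses the race simply goes back to sleep:

    /* Assumed to sit at the top of the batch-delisting path. */
    if ( !spin_trylock(&heap_lock) )
        return; /* lock contended - enter a C state instead of spinning */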
Jan