Re: [Xen-devel] [PATCH 4/4] xen: use idle vcpus to scrub pages
On 06/17/2014 09:01 PM, Jan Beulich wrote:
>>>> On 17.06.14 at 13:49, <lliubbo@xxxxxxxxx> wrote:
>> In case of heavy lock contention, use a percpu list.
>> - Delist a batch of pages to a percpu list from "scrub" free page list.
>> - Scrub pages on this percpu list.
>> - Add those clean pages to normal "head" free page list, merge with other
>
> Did you mean "heap" here?
>

Yes, sorry!

>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -587,12 +587,14 @@ int domain_kill(struct domain *d)
>>          d->tmem_client = NULL;
>>          /* fallthrough */
>>      case DOMDYING_dying:
>> +        enable_idle_scrub = 0;
>>          rc = domain_relinquish_resources(d);
>>          if ( rc != 0 )
>>          {
>>              BUG_ON(rc != -EAGAIN);
>>              break;
>>          }
>> +        enable_idle_scrub = 1;
>>          for_each_vcpu ( d, v )
>>              unmap_vcpu_info(v);
>>          d->is_dying = DOMDYING_dead;
>
> ???

enable_idle_scrub is a rough way to disable idle vcpu scrubbing, so that
domain_relinquish_resources() can get "&heap_lock" faster and make xl
destroy return quickly.

>
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -79,6 +79,9 @@ PAGE_LIST_HEAD(page_offlined_list);
>>  /* Broken page list, protected by heap_lock. */
>>  PAGE_LIST_HEAD(page_broken_list);
>>
>> +volatile bool_t enable_idle_scrub;
>> +DEFINE_PER_CPU(struct page_list_head, scrub_list_cpu);
>
> I'm pretty sure I pointed out to you before that variables used only in
> a single file should be static.
>

Sorry, again.

>> @@ -1387,7 +1390,86 @@ void __init scrub_heap_pages(void)
>>      setup_low_mem_virq();
>>  }
>>
>> +#define SCRUB_BATCH 1024
>> +void scrub_free_pages(void)
>> +{
>> +    struct page_info *pg;
>> +    unsigned int i, j, node_empty = 0, nr_delisted = 0;
>> +    int order;
>> +    unsigned int cpu = smp_processor_id();
>> +    unsigned int node = cpu_to_node(cpu);
>> +    struct page_list_head *temp_list = &this_cpu(scrub_list_cpu);
>> +
>> +    if ( !enable_idle_scrub )
>> +        return;
>> +
>> +    do
>> +    {
>> +        if ( page_list_empty(temp_list) )
>> +        {
>> +            /* Delist a batch of pages from global scrub list */
>> +            spin_lock(&heap_lock);
>> +            for ( j = 0; j < NR_ZONES; j++ )
>> +            {
>> +                for ( order = MAX_ORDER; order >= 0; order-- )
>> +                {
>> +                    if ( (pg = page_list_remove_head(&scrub(node, j, order))) )
>> +                    {
>> +                        for ( i = 0; i < (1 << order); i++)
>> +                            mark_page_offline(&pg[i], 0);
>
> A page previously having caused #MC now suddenly became healthy
> again?
>

Will be fixed.

>> +
>> +                        page_list_add_tail(pg, temp_list);
>> +                        nr_delisted += (1 << order);
>> +                        if ( nr_delisted > SCRUB_BATCH )
>> +                        {
>> +                            nr_delisted = 0;
>> +                            spin_unlock(&heap_lock);
>> +                            goto start_scrub;
>> +                        }
>> +                    }
>> +                }
>> +            }
>> +
>> +            node_empty = 1;
>> +            spin_unlock(&heap_lock);
>> +        }
>> +        else
>> +        {
>> +start_scrub:
>
> Labels need to be indented by at least one space.
>
>> +            /* Scrub percpu list */
>> +            while ( !page_list_empty(temp_list) )
>> +            {
>> +                pg = page_list_remove_head(temp_list);
>> +                ASSERT(pg);
>> +                order = PFN_ORDER(pg);
>> +                for ( i = 0; i < (1 << order); i++ )
>> +                {
>
> Order here can again be almost arbitrarily high. You aren't allowed
> to scrub e.g. 4Tb in one go.
>

I can add a softirq_pending(cpu) check after each scrub_one_page.
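Roughly along the lines of the sketch below (untested, just to show the
idea): a chunk interrupted half way is put back on the percpu list and
re-walked on the next invocation, with the need_scrub flag making the
re-walk cheap.

    while ( !page_list_empty(temp_list) )
    {
        pg = page_list_remove_head(temp_list);
        order = PFN_ORDER(pg);

        for ( i = 0; i < (1 << order); i++ )
        {
            /* Pages finished on an earlier pass have the flag cleared. */
            if ( test_bit(_PGC_need_scrub, &(pg[i].count_info)) )
            {
                scrub_one_page(&pg[i]);
                pg[i].count_info &= ~(PGC_need_scrub);
            }

            /* Never let a high-order chunk keep softirqs waiting. */
            if ( softirq_pending(cpu) )
            {
                page_list_add(pg, temp_list);
                return;
            }
        }

        /* ... take heap_lock and merge the clean chunk as before ... */
    }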
>> +                    ASSERT( test_bit(_PGC_need_scrub, &(pg[i].count_info)) );
>> +                    scrub_one_page(&pg[i]);
>> +                    pg[i].count_info &= ~(PGC_need_scrub);
>> +                }
>>
>> +                /* Add pages to free heap list */
>> +                spin_lock(&heap_lock);
>
> Between the dropping of the lock above and the re-acquiring of it
> here the pages "stolen" can lead to allocation failures despite there
> being available memory. This needs to be avoided if at all possible.
>

#define SCRUB_BATCH 1024 is used to limit the number of pages on the
percpu list. But yes, in the worst case (order = 20) 4Tb will be
invisible. I don't have a good solution right now.

>> +                for ( i = 0; i < (1 << order); i++ )
>> +                {
>> +                    ASSERT ( !test_bit(_PGC_need_scrub, &(pg[i].count_info)) );
>> +                    pg[i].count_info |= PGC_state_free;
>> +                }
>> +                ASSERT (node == phys_to_nid(page_to_maddr(pg)));
>> +                merge_free_trunks(pg, order, 0);
>> +                spin_unlock(&heap_lock);
>> +
>> +                if ( softirq_pending(cpu) )
>> +                    return;
>> +            }
>> +        }
>> +
>> +        /* Scrub list of this node is empty */
>> +        if ( node_empty )
>> +            return;
>> +    } while ( !softirq_pending(cpu) );
>> +}
>
> And all the logic needs to be made NUMA-aware, i.e. a CPU should
> prefer to scrub local memory, and only resort to scrubbing distant
> memory if no CPU on the correct node is idle. Plus (as we have seen
> with Konrad's boot time scrubbing changes) you should avoid having
> two hyperthreads within the same core doing scrubbing at the same
> time.
>

Okay, will be taken into account.
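For the hyperthread part, one direction could be a global cpumask of
CPUs currently scrubbing, checked against the core's sibling map before
entering the scrub loop. Only a rough idea so far (assuming
cpu_sibling_mask is usable from here), not something I have tested:

    static cpumask_t scrubbing_cpus;  /* CPUs inside scrub_free_pages() */
    unsigned int sibling;

    /* Let at most one thread per core do the scrubbing. */
    for_each_cpu ( sibling, per_cpu(cpu_sibling_mask, cpu) )
        if ( sibling != cpu && cpumask_test_cpu(sibling, &scrubbing_cpus) )
            return;

    /*
     * Not atomic with the check above, so two siblings may still
     * overlap briefly in the worst case, which is harmless here.
     */
    cpumask_set_cpu(cpu, &scrubbing_cpus);

    /* ... existing batched scrub loop ... */

    cpumask_clear_cpu(cpu, &scrubbing_cpus);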
> In the end I'm getting the impression that this all wasn't properly
> thought through yet. Convoys on the lock inside the idle processing
> should e.g. also be avoided (or reduced to a minimum) - there's no
> point preventing the entering of C states if all you're otherwise going
> to do is spin on a lock.
>

I already use a percpu list to reduce contention on the spin lock, and
node_empty can be extended to avoid the case you mentioned.

Thank you very much for all your review.

-Bob

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel