Re: [Xen-devel] [PATCH] Xen: Spread boot time page scrubbing across all available CPU's (v6)
On 13/06/14 19:05, Konrad Rzeszutek Wilk wrote:
> From: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
>
> The page scrubbing is done in 128MB chunks in lockstep across all the
> non-SMT CPUs. This allows the boot CPU to hold the heap_lock whilst
> each chunk is being scrubbed, and to release the heap_lock when the
> CPUs have finished scrubbing their individual chunks. As a result the
> heap_lock is not held continuously, and pending softirqs are serviced
> periodically across the CPUs.
>
> The page scrub memory chunks are allocated to the CPUs in a NUMA-aware
> fashion to reduce socket interconnect overhead and improve performance.
> Specifically, in the first phase we scrub at the same time on all the
> NUMA nodes that have CPUs - we also weed out the SMT threads so that we
> only use cores (that gives a 50% boost). The second phase is for NUMA
> nodes that have no CPUs - for those we use the closest NUMA node's CPUs
> (non-SMT again) to do the job.
>
> This patch reduces the boot page scrub time on a 128GB 64-core AMD
> Opteron 6386 machine from 49 seconds to 3 seconds.
> On an IvyBridge-EX 8-socket box with 1.5TB it cuts it down from 15
> minutes to 63 seconds.
>
> Signed-off-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Reviewed-by: Tim Deegan <tim@xxxxxxx>

Functionally, Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

A few minor nits...

> ---
>
> v2:
>  - Reduced default chunk size to 128MB
>  - Added code to scrub NUMA nodes with no active CPU linked to them
>  - Be robust to the boot CPU not being linked to a NUMA node
>
> v3:
>  - Don't use SMT threads
>  - Take care of the remainder if the number of CPUs (or memory) is odd
>  - Restructure the worker thread
>  - s/u64/unsigned long/
>
> v4:
>  - Don't use all CPUs on non-CPU NUMA nodes, just the closest one
>  - Syntax and docs updates
>  - Compile on ARM
>  - Fix bug when a NUMA node has 0 pages
>
> v5:
>  - Properly figure out the best NUMA node
>  - Fix comments to be proper style
>
> v6:
>  - Missing page shift on default values, optimize cpumask usage
>  - Fix case of a node having one page and being below first_valid_mfn
>  - Add Ack
> ---
>  docs/misc/xen-command-line.markdown |  10 ++
>  xen/common/page_alloc.c             | 211 ++++++++++++++++++++++++++++++++---
>  xen/include/asm-arm/numa.h          |   1 +
>  3 files changed, 204 insertions(+), 18 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 25829fe..509f462 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -198,6 +198,16 @@ Scrub free RAM during boot. This is a safety feature to prevent
>  accidentally leaking sensitive VM data into other VMs if Xen crashes
>  and reboots.
>
> +### bootscrub_chunk

The '_' needs escaping with a backslash.

> +> `= <size>`
> +
> +> Default: `134217728`

Due to the implicit 'k' suffix on sizes, this is not a useful hint as
to how to change the default.  Furthermore, `128M` is substantially
clearer.
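To make the suffix pitfall concrete, below is a minimal user-space
sketch of the behaviour being described. It is not Xen's actual
parse_size_and_unit(); the parser internals are not quoted here, so
treat the implicit-'k' fallback as an assumption taken from the note
above.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical size parser with an implicit 'k' suffix, as the
     * review comment describes: a bare number is scaled to KiB.
     * Consequently "134217728" parses as 128GiB, while the documented
     * default is really 128MiB, which "128M" expresses unambiguously. */
    static unsigned long long parse_size(const char *s)
    {
        char *end;
        unsigned long long val = strtoull(s, &end, 0);

        switch ( *end )
        {
        case 'G': case 'g': return val << 30;
        case 'M': case 'm': return val << 20;
        case 'K': case 'k': return val << 10;
        default:            return val << 10; /* implicit 'k' */
        }
    }

    int main(void)
    {
        printf("134217728 -> %llu bytes\n", parse_size("134217728"));
        printf("128M      -> %llu bytes\n", parse_size("128M"));
        return 0;
    }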
> +
> +Maximum RAM block size chunks to be scrubbed whilst holding the page heap lock
> +and not running softirqs. Reduce this if softirqs are not being run frequently
> +enough. Setting this to a high value may cause boot failure, particularly if
> +the NMI watchdog is also enabled.
> +
>  ### cachesize
>  > `= <size>`
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> ...
> - * Scrub all unallocated pages in all heap zones. This function is more
> - * convoluted than appears necessary because we do not want to continuously
> - * hold the lock while scrubbing very large memory areas.
> + * Scrub all unallocated pages in all heap zones. This function uses all
> + * online CPUs to scrub the memory in parallel.
>   */
>  void __init scrub_heap_pages(void)
>  {
> -    unsigned long mfn;
> -    struct page_info *pg;
> +    cpumask_t node_cpus, all_worker_cpus;
> +    unsigned int i, j;
> +    unsigned long offset, max_per_cpu_sz = 0;
> +    unsigned long start, end;
> +    unsigned long rem = 0;
> +    int last_distance, best_node;
> +    int cpus;
>
>      if ( !opt_bootscrub )
>          return;
>
> -    printk("Scrubbing Free RAM: ");
> +    cpumask_clear(&all_worker_cpus);
> +    /* Scrub block size. */
> +    chunk_size = opt_bootscrub_chunk >> PAGE_SHIFT;
> +    if ( chunk_size == 0 )
> +        chunk_size = MB(128) >> PAGE_SHIFT;
> +
> +    /* Round #0 - figure out amounts and which CPUs to use. */
> +    for_each_online_node ( i )
> +    {
> +        if ( !node_spanned_pages(i) )
> +            continue;
> +        /* Calculate node memory start and end address. */
> +        start = max(node_start_pfn(i), first_valid_mfn);
> +        end = min(node_start_pfn(i) + node_spanned_pages(i), max_page);
> +        /* Just in case the node has 1 page and starts below first_valid_mfn. */
> +        end = max(end, start);
> +        /* CPUs that are online and on this node (if none, that is OK). */
> +        cpus = find_non_smt(i, &node_cpus);
> +        cpumask_or(&all_worker_cpus, &all_worker_cpus, &node_cpus);
> +        if ( cpus <= 0 )
> +        {
> +            /* No CPUs on this node. Round #2 will take care of it. */
> +            rem = 0;
> +            region[i].per_cpu_sz = (end - start);
> +        }
> +        else
> +        {
> +            rem = (end - start) % cpumask_weight(&node_cpus);
> +            region[i].per_cpu_sz = (end - start) / cpumask_weight(&node_cpus);

The 'cpus' variable still holds cpumask_weight(&node_cpus), and is
rather more efficient to use.

> +            if ( region[i].per_cpu_sz > max_per_cpu_sz )
> +                max_per_cpu_sz = region[i].per_cpu_sz;
> +        }
> +        region[i].start = start;
> +        region[i].rem = rem;
> +        cpumask_copy(&region[i].cpus, &node_cpus);
> +    }
> +
> +    printk("Scrubbing Free RAM on %u nodes using %u CPUs\n", num_online_nodes(),
> +           cpumask_weight(&all_worker_cpus));

Both of these are signed quantities, rather than unsigned.
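The round-0 arithmetic above is easy to check in isolation. The
following stand-alone sketch mirrors the per_cpu_sz/rem split for one
node; how the real worker loop consumes region[i].rem is not in the
hunk quoted here, so giving the slack pages to the last worker below is
only an illustrative assumption, with made-up PFN and CPU values.

    #include <stdio.h>

    int main(void)
    {
        /* Arbitrary example values: one node's PFN range and its
         * count of online non-SMT CPUs (cf. find_non_smt() above). */
        unsigned long start = 0x100000, end = 0x4832a1;
        unsigned int cpus = 6;

        /* The same split as round #0: an even share per CPU, with the
         * division remainder tracked separately in 'rem'. */
        unsigned long per_cpu_sz = (end - start) / cpus;
        unsigned long rem = (end - start) % cpus;
        unsigned int j;

        for ( j = 0; j < cpus; j++ )
        {
            unsigned long s = start + (unsigned long)j * per_cpu_sz;
            unsigned long e = s + per_cpu_sz + (j == cpus - 1 ? rem : 0);

            printf("cpu %u scrubs pfns [%#lx, %#lx)\n", j, s, e);
        }
        return 0;
    }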
~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel