Re: [Xen-devel] [PATCH v4 4/8] mm: Scrub memory from idle loop
>>> On 19.05.17 at 17:50, <boris.ostrovsky@xxxxxxxxxx> wrote:
> Instead of scrubbing pages during guest destruction (from
> free_heap_pages()) do this opportunistically, from the idle loop.
This is too brief for my taste. In particular the re-ordering ...
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -118,8 +118,9 @@ static void idle_loop(void)
> {
> if ( cpu_is_offline(smp_processor_id()) )
> play_dead();
> - (*pm_idle)();
> do_tasklet();
> + if ( cpu_is_haltable(smp_processor_id()) && !scrub_free_pages() )
> + (*pm_idle)();
> do_softirq();
... here (and its correctness / safety) needs an explanation. Not
processing tasklets right after idle wakeup is not an obviously
correct change. Nor is it immediately clear why this call needs to
be pulled ahead for your purposes.
> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -18,12 +18,14 @@
> #include <asm/hardirq.h>
> #include <asm/setup.h>
>
> -static struct vcpu *__read_mostly override;
> +static DEFINE_PER_CPU(struct vcpu *, override);
>
> static inline struct vcpu *mapcache_current_vcpu(void)
> {
> + struct vcpu *v, *this_vcpu = this_cpu(override);
> +
> /* In the common case we use the mapcache of the running VCPU. */
> - struct vcpu *v = override ?: current;
> + v = this_vcpu ?: current;
What's wrong with
struct vcpu *v = this_cpu(override) ?: current;
? And this, btw, is another change which should have an explanation
in the commit message.
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1010,15 +1010,79 @@ static int reserve_offlined_page(struct page_info *head)
> return count;
> }
>
> -static void scrub_free_pages(unsigned int node)
> +static nodemask_t node_scrubbing;
> +
> +/*
> + * If get_node is true this will return closest node that needs to be scrubbed,
> + * with appropriate bit in node_scrubbing set.
> + * If get_node is not set, this will return *a* node that needs to be scrubbed.
> + * node_scrubbing bitmask will no be updated.
> + * If no node needs scrubbing then NUMA_NO_NODE is returned.
> + */
> +static unsigned int node_to_scrub(bool get_node)
> {
> - struct page_info *pg;
> - unsigned int zone;
> + nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
> + nodeid_t closest = NUMA_NO_NODE;
> + u8 dist, shortest = 0xff;
>
> - ASSERT(spin_is_locked(&heap_lock));
> + if ( node == NUMA_NO_NODE )
> + node = 0;
>
> - if ( !node_need_scrub[node] )
> - return;
> + if ( node_need_scrub[node] &&
> + (!get_node || !node_test_and_set(node, node_scrubbing)) )
> + return node;
> +
> + /*
> + * See if there are memory-only nodes that need scrubbing and choose
> + * the closest one.
> + */
> + local_node = node;
> + while ( 1 )
As some compilers / compiler versions warn about such constructs
("condition is always true"), I generally recommend using "for ( ; ; )"
instead.
> + {
> + do {
> + node = cycle_node(node, node_online_map);
> + } while ( !cpumask_empty(&node_to_cpumask(node)) &&
> + (node != local_node) );
> +
> + if ( node == local_node )
> + break;
> +
> + if ( node_need_scrub[node] )
> + {
> + if ( !get_node )
> + return node;
> +
> + dist = __node_distance(local_node, node);
> + if ( dist < shortest || closest == NUMA_NO_NODE )
> + {
> + if ( !node_test_and_set(node, node_scrubbing) )
> + {
> + if ( closest != NUMA_NO_NODE )
> + node_clear(closest, node_scrubbing);
You call this function with no locks held, yet you temporarily set a
bit in node_scrubbing which you then may clear again here. That'll
prevent another idle CPU from scrubbing this node afaict, which,
while not a bug, is not optimal. Therefore I think for this second
part of the function you actually want to acquire the heap lock.
> + shortest = dist;
> + closest = node;
> + }
> + }
Also note that two if()s like the above, when the inner one also
has no else, can and should be combined into one.
> @@ -1035,22 +1099,46 @@ static void scrub_free_pages(unsigned int node)
>
> for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
> {
> + cnt++;
> if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
> {
> scrub_one_page(&pg[i]);
> pg[i].count_info &= ~PGC_need_scrub;
> node_need_scrub[node]--;
> + cnt += 100; /* scrubbed pages add heavier weight. */
> }
While it doesn't matter much, I would guess that your intention
really was to either do the "cnt++" in an "else" to this "if()", or
use 99 instead of 100 above.
> - }
>
> - page_list_del(pg, &heap(node, zone, order));
> - page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
> + /*
> + * Scrub a few (8) pages before becoming eligible for
> + * preemtion. But also count non-scrubbing loop iteration
"preemption" and "iterations"
> + * so that we don't get stuck here with an almost clean
> + * heap.
> + */
> + if ( softirq_pending(cpu) && cnt > 800 )
Please switch both sides of the &&.
> + {
> + preempt = true;
> + break;
> + }
> + }
>
> - if ( node_need_scrub[node] == 0 )
> - return;
> + if ( i == (1U << order) )
> + {
> + page_list_del(pg, &heap(node, zone, order));
> + page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
> + }
> + else
> + pg->u.free.first_dirty = i + 1;
This seems wrong if you set "preempt" after scrubbing the last page
of a buddy - in that case the increment of i in the loop header is being
skipped yet the entire buddy is now clean, so the if() condition here
wants to be "i >= (1U << order) - 1" afaict. In any event it needs to
be impossible for first_dirty to obtain a value of 1U << order here.
> + if ( preempt || (node_need_scrub[node] == 0) )
> + goto out;
> }
> } while ( order-- != 0 );
> }
> +
> + out:
> + spin_unlock(&heap_lock);
> + node_clear(node, node_scrubbing);
With the memory-less node comment above it may be necessary
to switch these two around.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel