Re: [Xen-devel] [PATCH v4 4/8] mm: Scrub memory from idle loop
>>> On 19.05.17 at 17:50, <boris.ostrovsky@xxxxxxxxxx> wrote:
> Instead of scrubbing pages during guest destruction (from
> free_heap_pages()) do this opportunistically, from the idle loop.
This is too brief for my taste. In particular the re-ordering ...
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -118,8 +118,9 @@ static void idle_loop(void)
> {
> if ( cpu_is_offline(smp_processor_id()) )
> play_dead();
> - (*pm_idle)();
> do_tasklet();
> + if ( cpu_is_haltable(smp_processor_id()) && !scrub_free_pages() )
> + (*pm_idle)();
> do_softirq();
... here (and its correctness / safety) needs an explanation. Not
processing tasklets right after idle wakeup is not an obviously
correct change. Nor is it immediately clear why this call needs to
be pulled ahead for your purposes.
> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -18,12 +18,14 @@
> #include <asm/hardirq.h>
> #include <asm/setup.h>
>
> -static struct vcpu *__read_mostly override;
> +static DEFINE_PER_CPU(struct vcpu *, override);
>
> static inline struct vcpu *mapcache_current_vcpu(void)
> {
> + struct vcpu *v, *this_vcpu = this_cpu(override);
> +
> /* In the common case we use the mapcache of the running VCPU. */
> - struct vcpu *v = override ?: current;
> + v = this_vcpu ?: current;
What's wrong with
struct vcpu *v = this_cpu(override) ?: current;
? And this, btw, is another change which should have an explanation
in the commit message.
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1010,15 +1010,79 @@ static int reserve_offlined_page(struct page_info *head)
> return count;
> }
>
> -static void scrub_free_pages(unsigned int node)
> +static nodemask_t node_scrubbing;
> +
> +/*
> + * If get_node is true this will return closest node that needs to be scrubbed,
> + * with appropriate bit in node_scrubbing set.
> + * If get_node is not set, this will return *a* node that needs to be scrubbed.
> + * node_scrubbing bitmask will no be updated.
> + * If no node needs scrubbing then NUMA_NO_NODE is returned.
> + */
> +static unsigned int node_to_scrub(bool get_node)
> {
> - struct page_info *pg;
> - unsigned int zone;
> + nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
> + nodeid_t closest = NUMA_NO_NODE;
> + u8 dist, shortest = 0xff;
>
> - ASSERT(spin_is_locked(&heap_lock));
> + if ( node == NUMA_NO_NODE )
> + node = 0;
>
> - if ( !node_need_scrub[node] )
> - return;
> + if ( node_need_scrub[node] &&
> + (!get_node || !node_test_and_set(node, node_scrubbing)) )
> + return node;
> +
> + /*
> + * See if there are memory-only nodes that need scrubbing and choose
> + * the closest one.
> + */
> + local_node = node;
> + while ( 1 )
As some compilers / compiler versions warn about such constructs
("condition is always true"), I generally recommend using "for ( ; ; )"
instead.
> + {
> + do {
> + node = cycle_node(node, node_online_map);
> + } while ( !cpumask_empty(&node_to_cpumask(node)) &&
> + (node != local_node) );
> +
> + if ( node == local_node )
> + break;
> +
> + if ( node_need_scrub[node] )
> + {
> + if ( !get_node )
> + return node;
> +
> + dist = __node_distance(local_node, node);
> + if ( dist < shortest || closest == NUMA_NO_NODE )
> + {
> + if ( !node_test_and_set(node, node_scrubbing) )
> + {
> + if ( closest != NUMA_NO_NODE )
> + node_clear(closest, node_scrubbing);
You call this function with no locks held, yet you temporarily set a
bit in node_scrubbing which you then may clear again here. That'll
prevent another idle CPU from scrubbing this node afaict, which,
while not a bug, is not optimal. Therefore I think for this second
part of the function you actually want to acquire the heap lock.
> + shortest = dist;
> + closest = node;
> + }
> + }
Also note that two if()s like the above, when the inner one also
has no else, can and should be combined into one.
> @@ -1035,22 +1099,46 @@ static void scrub_free_pages(unsigned int node)
>
> for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
> {
> + cnt++;
> if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
> {
> scrub_one_page(&pg[i]);
> pg[i].count_info &= ~PGC_need_scrub;
> node_need_scrub[node]--;
> + cnt += 100; /* scrubbed pages add heavier weight. */
> }
While it doesn't matter much, I would guess that your intention
really was to either do the "cnt++" in an "else" to this "if()", or
use 99 instead of 100 above.
> - }
>
> - page_list_del(pg, &heap(node, zone, order));
> - page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
> + /*
> + * Scrub a few (8) pages before becoming eligible for
> + * preemtion. But also count non-scrubbing loop iteration
"preemption" and "iterations"
> + * so that we don't get stuck here with an almost clean
> + * heap.
> + */
> + if ( softirq_pending(cpu) && cnt > 800 )
Please switch both sides of the &&.
> + {
> + preempt = true;
> + break;
> + }
> + }
>
> - if ( node_need_scrub[node] == 0 )
> - return;
> + if ( i == (1U << order) )
> + {
> + page_list_del(pg, &heap(node, zone, order));
> + page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
> + }
> + else
> + pg->u.free.first_dirty = i + 1;
This seems wrong if you set "preempt" after scrubbing the last page
of a buddy - in that case the increment of i in the loop header is being
skipped yet the entire buddy is now clean, so the if() condition here
wants to be "i >= (1U << order) - 1" afaict. In any event it needs to
be impossible for first_dirty to obtain a value of 1U << order here.
> + if ( preempt || (node_need_scrub[node] == 0) )
> + goto out;
> }
> } while ( order-- != 0 );
> }
> +
> + out:
> + spin_unlock(&heap_lock);
> + node_clear(node, node_scrubbing);
With the memory-less node comment above it may be necessary
to switch these two around.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel