Re: [Xen-devel] [PATCH 4/4] xen: use idle vcpus to scrub pages
On 06/17/2014 09:01 PM, Jan Beulich wrote:
>>>> On 17.06.14 at 13:49, <lliubbo@xxxxxxxxx> wrote:
>> In case of heavy lock contention, use a percpu list.
>> - Delist a batch of pages to a percpu list from "scrub" free page list.
>> - Scrub pages on this percpu list.
>> - Add those clean pages to normal "head" free page list, merge with other
>
> Did you mean "heap" here?
>
Yes, sorry!
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -587,12 +587,14 @@ int domain_kill(struct domain *d)
>>          d->tmem_client = NULL;
>>          /* fallthrough */
>>      case DOMDYING_dying:
>> +        enable_idle_scrub = 0;
>>          rc = domain_relinquish_resources(d);
>>          if ( rc != 0 )
>>          {
>>              BUG_ON(rc != -EAGAIN);
>>              break;
>>          }
>> +        enable_idle_scrub = 1;
>>          for_each_vcpu ( d, v )
>>              unmap_vcpu_info(v);
>>          d->is_dying = DOMDYING_dead;
>
> ???
enable_idle_scrub is a rough way to temporarily disable idle vcpu scrubbing,
so that domain_relinquish_resources() can acquire the heap_lock sooner and
"xl destroy" returns quickly.
>
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -79,6 +79,9 @@ PAGE_LIST_HEAD(page_offlined_list);
>> /* Broken page list, protected by heap_lock. */
>> PAGE_LIST_HEAD(page_broken_list);
>>
>> +volatile bool_t enable_idle_scrub;
>> +DEFINE_PER_CPU(struct page_list_head, scrub_list_cpu);
>
> I'm pretty sure I pointed out to you before that variables used only in
> a single file should be static.
>
Sorry, again..
>> @@ -1387,7 +1390,86 @@ void __init scrub_heap_pages(void)
>> setup_low_mem_virq();
>> }
>>
>> +#define SCRUB_BATCH 1024
>> +void scrub_free_pages(void)
>> +{
>> +    struct page_info *pg;
>> +    unsigned int i, j, node_empty = 0, nr_delisted = 0;
>> +    int order;
>> +    unsigned int cpu = smp_processor_id();
>> +    unsigned int node = cpu_to_node(cpu);
>> +    struct page_list_head *temp_list = &this_cpu(scrub_list_cpu);
>> +
>> +    if ( !enable_idle_scrub )
>> +        return;
>> +
>> +    do
>> +    {
>> +        if ( page_list_empty(temp_list) )
>> +        {
>> +            /* Delist a batch of pages from global scrub list */
>> +            spin_lock(&heap_lock);
>> +            for ( j = 0; j < NR_ZONES; j++ )
>> +            {
>> +                for ( order = MAX_ORDER; order >= 0; order-- )
>> +                {
>> +                    if ( (pg = page_list_remove_head(&scrub(node, j, order))) )
>> +                    {
>> +                        for ( i = 0; i < (1 << order); i++)
>> +                            mark_page_offline(&pg[i], 0);
>
> A page previously having caused #MC now suddenly became healthy
> again?
>
Will be fixed.
>> +
>> +                        page_list_add_tail(pg, temp_list);
>> +                        nr_delisted += (1 << order);
>> +                        if ( nr_delisted > SCRUB_BATCH )
>> +                        {
>> +                            nr_delisted = 0;
>> +                            spin_unlock(&heap_lock);
>> +                            goto start_scrub;
>> +                        }
>> +                    }
>> +                }
>> +            }
>> +
>> +            node_empty = 1;
>> +            spin_unlock(&heap_lock);
>> +        }
>> +        else
>> +        {
>> +start_scrub:
>
> Labels need to be indented by at least one space.
>
>> +            /* Scrub percpu list */
>> +            while ( !page_list_empty(temp_list) )
>> +            {
>> +                pg = page_list_remove_head(temp_list);
>> +                ASSERT(pg);
>> +                order = PFN_ORDER(pg);
>> +                for ( i = 0; i < (1 << order); i++ )
>> +                {
>
> Order here can again be almost arbitrarily high. You aren't allowed
> to scrub e.g. 4Tb in one go.
>
I can add a softirq_pending(cpu) check after each scrub_one_page.
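Roughly like this (just a sketch; scrub_chunk() is a made-up helper, not code
from this patch, and a real version would also have to re-queue the unscrubbed
tail of the chunk):

/*
 * Sketch only: shows where the softirq_pending() check would sit so that
 * one huge chunk cannot delay pending softirqs.
 */
static unsigned int scrub_chunk(struct page_info *pg, unsigned int order,
                                unsigned int cpu)
{
    unsigned int i;

    for ( i = 0; i < (1U << order); i++ )
    {
        scrub_one_page(&pg[i]);
        pg[i].count_info &= ~PGC_need_scrub;

        /* Stop between pages if a softirq is pending. */
        if ( softirq_pending(cpu) )
            return i + 1; /* pages scrubbed so far */
    }

    return 1U << order;
}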
>> +                    ASSERT( test_bit(_PGC_need_scrub, &(pg[i].count_info)) );
>> +                    scrub_one_page(&pg[i]);
>> +                    pg[i].count_info &= ~(PGC_need_scrub);
>> +                }
>>
>> +                /* Add pages to free heap list */
>> +                spin_lock(&heap_lock);
>
> Between the dropping of the lock above and the re-acquiring of it
> here the pages "stolen" can lead to allocation failures despite there
> being available memory. This needs to be avoided if at all possible.
>
#define SCRUB_BATCH 1024 is used to limit the number of pages on the percpu
list. But yes, in the worst case (order = 20) 4Tb would be invisible to the
allocator. I don't have a good solution for that right now.
>> +                for ( i = 0; i < (1 << order); i++ )
>> +                {
>> +                    ASSERT ( !test_bit(_PGC_need_scrub, &(pg[i].count_info)) );
>> +                    pg[i].count_info |= PGC_state_free;
>> +                }
>> +                ASSERT (node == phys_to_nid(page_to_maddr(pg)));
>> +                merge_free_trunks(pg, order, 0);
>> +                spin_unlock(&heap_lock);
>> +
>> +                if ( softirq_pending(cpu) )
>> +                    return;
>> +            }
>> +        }
>> +
>> +        /* Scrub list of this node is empty */
>> +        if ( node_empty )
>> +            return;
>> +    } while ( !softirq_pending(cpu) );
>> +}
>
> And all the logic needs to be made NUMA-aware, i.e. a CPU should
> prefer to scrub local memory, and only resort to scrubbing distant
> memory if no CPU on the correct node is idle. Plus (as we have seen
> with Konrad's boot time scrubbing changes) you should avoid having
> two hyperthreads within the same core doing scrubbing at the same
> time.
>
Okay, will be taken into account.
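Something along these lines, perhaps (sketch only; cpu_should_scrub() and
node_need_scrub() are made-up names, and the fallback to scrubbing remote
nodes when no CPU there is idle is not shown):

/*
 * Sketch only: one possible shape of a "local node first, one thread per
 * core" policy. node_need_scrub() is a hypothetical predicate for "this
 * node has pages queued for scrubbing".
 */
static bool_t cpu_should_scrub(unsigned int cpu)
{
    unsigned int node = cpu_to_node(cpu);

    /* Let only the first hyperthread of each core take part, so two
     * siblings in the same core don't scrub at the same time. */
    if ( cpumask_first(per_cpu(cpu_sibling_mask, cpu)) != cpu )
        return 0;

    /* Prefer local memory; extending this to distant nodes would need
     * extra logic not shown here. */
    return node_need_scrub(node);
}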
> In the end I'm getting the impression that this all wasn't properly
> thought through yet. Convoys on the lock inside the idle processing
> should e.g. also be avoided (or reduced to a minimum) - there's no
> point preventing the entering of C states if all you're otherwise going
> to do is spin on a lock.
>
I already use a percpu list to reduce contention on the lock. node_empty can
be extended to avoid the case you mentioned.
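One idea (sketch only; node_need_scrub[] is a hypothetical per-node counter
maintained by the allocator, it is not part of this patch):

/*
 * Sketch only: a per-node count of dirty pages, updated by the allocator
 * when it queues pages for scrubbing, would let idle vcpus bail out
 * without ever touching heap_lock when there is no work to do.
 */
static atomic_t node_need_scrub[MAX_NUMNODES];

void scrub_free_pages(void)
{
    unsigned int cpu = smp_processor_id();
    unsigned int node = cpu_to_node(cpu);

    if ( !enable_idle_scrub )
        return;

    /* Lock-free check: nothing to scrub -> go back to the idle loop. */
    if ( !atomic_read(&node_need_scrub[node]) )
        return;

    /* ... only now take heap_lock and delist a batch as above ... */
}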
Thank you very much for your review.
-Bob
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel