
Re: [Xen-devel] [PATCH] x86/PV: fix unintended dependency of m2p-strict mode on migration-v2



>>> On 12.01.16 at 12:55, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 12/01/16 10:08, Jan Beulich wrote:
>> This went unnoticed until a backport of this to an older Xen got used,
>> causing migration of guests enabling this VM assist to fail, because
>> page table pinning there precedes vCPU context loading, and hence L4
>> tables get initialized for the wrong mode. Fix this by post-processing
>> L4 tables when setting the intended VM assist flags for the guest.
>>
>> Note that this leaves in place a dependency on vCPU 0 getting its guest
>> context restored first, but afaict the logic here is not the only thing
>> depending on that.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1067,8 +1067,48 @@ int arch_set_info_guest(
>>          goto out;
>>  
>>      if ( v->vcpu_id == 0 )
>> +    {
>>          d->vm_assist = c(vm_assist);
>>  
>> +        /*
>> +         * In the restore case we need to deal with L4 pages which got
>> +         * initialized with m2p_strict still clear (and which hence lack the
>> +         * correct initial RO_MPT_VIRT_{START,END} L4 entry).
>> +         */
>> +        if ( d != current->domain && VM_ASSIST(d, m2p_strict) &&
>> +             is_pv_domain(d) && !is_pv_32bit_domain(d) &&
>> +             atomic_read(&d->arch.pv_domain.nr_l4_pages) )
>> +        {
>> +            bool_t done = 0;
>> +
>> +            spin_lock_recursive(&d->page_alloc_lock);
>> +
>> +            for ( i = 0; ; )
>> +            {
>> +                struct page_info *page = page_list_remove_head(&d->page_list);
>> +
>> +                if ( page_lock(page) )
>> +                {
>> +                    if ( (page->u.inuse.type_info & PGT_type_mask) ==
>> +                         PGT_l4_page_table )
>> +                        done = !fill_ro_mpt(page_to_mfn(page));
>> +
>> +                    page_unlock(page);
>> +                }
>> +
>> +                page_list_add_tail(page, &d->page_list);
>> +
>> +                if ( done || (!(++i & 0xff) && hypercall_preempt_check()) )
>> +                    break;
>> +            }
>> +
>> +            spin_unlock_recursive(&d->page_alloc_lock);
>> +
>> +            if ( !done )
>> +                return -ERESTART;
> 
> This is a long loop.  It is preemptible, but will incur a time delay
> proportional to the size of the domain during the VM downtime. 
> 
> Could you defer the loop until after %cr3 has been set up, and only
> enter the loop if the kernel l4 table is missing the RO mappings?  That
> way, domains migrated with migration v2 will skip the loop entirely.

Well, first of all this would only remain the result as long as
you or someone else doesn't re-think and possibly move pinning
ahead of context load again.

Deferring until after CR3 has been set up is - afaict - not an
option, as it would defeat the purpose of m2p-strict mode as much
as doing the fixup e.g. in the #PF handler would. This mode being
enabled needs to strictly mean "L4s start with the slot filled,
and uses in user mode clear it", as documented.

There's a much simpler way we could avoid the loop being
entered: checking the previous setting of the flag. However, I
intentionally did not go that route in this initial version, as I
didn't want to add more special casing than needed, and also
wanted to make sure the new code isn't effectively dead.
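
For reference, a minimal sketch of that alternative (not what the
patch does; "was_strict" is made up here, everything else matches
the patch context):

    if ( v->vcpu_id == 0 )
    {
        bool_t was_strict = VM_ASSIST(d, m2p_strict);

        d->vm_assist = c(vm_assist);

        /* Only walk the page list when restore actually turns the flag on. */
        if ( !was_strict && VM_ASSIST(d, m2p_strict) &&
             d != current->domain && is_pv_domain(d) &&
             !is_pv_32bit_domain(d) &&
             atomic_read(&d->arch.pv_domain.nr_l4_pages) )
        {
            /* ... same preemptible L4 fixup loop as in the patch ... */
        }
    }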

>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -1463,13 +1463,20 @@ void init_guest_l4_table(l4_pgentry_t l4
>>          l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
>>  }
>>  
>> -void fill_ro_mpt(unsigned long mfn)
>> +bool_t fill_ro_mpt(unsigned long mfn)
>>  {
>>      l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
>> +    bool_t ret = 0;
>>  
>> -    l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
>> -        idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
>> +    if ( !l4e_get_intpte(l4tab[l4_table_offset(RO_MPT_VIRT_START)]) )
>> +    {
>> +        l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
>> +            idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
>> +        ret = 1;
> 
> This is a behavioural change.  Previously, the old value was clobbered.
> 
> It appears that you are now using this return value to indicate when the
> entire page list has been walked, but it relies on the slots being
> zero, which is a fragile assumption.

There are only two values possible in this slot - zero or the one
referring to the _shared across domains_ sub-tree for the r/o
MPT. I.e. the change of behavior is only an apparent one, and
I don't see this being fragile either.
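
Expressed as an invariant (illustration only, not part of the
patch):

    ASSERT(!l4e_get_intpte(l4tab[l4_table_offset(RO_MPT_VIRT_START)]) ||
           l4e_get_intpte(l4tab[l4_table_offset(RO_MPT_VIRT_START)]) ==
           l4e_get_intpte(idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]));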

Jan
