
Re: [Xen-devel] [PATCH] x86/PV: fix unintended dependency of m2p-strict mode on migration-v2



Ping? (I'd really like to get this resolved, so we don't need to
indefinitely run with non-upstream behavior in our distros.)

Thanks, Jan

>>> On 13.01.16 at 17:15, <JBeulich@xxxxxxxx> wrote:
>>>> On 13.01.16 at 17:00, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 13/01/16 15:36, Jan Beulich wrote:
>>>>>> On 13.01.16 at 16:25, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> On 12/01/16 15:19, Jan Beulich wrote:
>>>>>>>> On 12.01.16 at 12:55, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>> On 12/01/16 10:08, Jan Beulich wrote:
>>>>>>> This went unnoticed until a backport of this to an older Xen got used,
>>>>>>> causing migration of guests enabling this VM assist to fail, because
>>>>>>> page table pinning there precedes vCPU context loading, and hence L4
>>>>>>> tables get initialized for the wrong mode. Fix this by post-processing
>>>>>>> L4 tables when setting the intended VM assist flags for the guest.
>>>>>>>
>>>>>>> Note that this leaves in place a dependency on vCPU 0 getting its guest
>>>>>>> context restored first, but afaict the logic here is not the only thing
>>>>>>> depending on that.
>>>>>>>
>>>>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>>>>>
>>>>>>> --- a/xen/arch/x86/domain.c
>>>>>>> +++ b/xen/arch/x86/domain.c
>>>>>>> @@ -1067,8 +1067,48 @@ int arch_set_info_guest(
>>>>>>>          goto out;
>>>>>>>  
>>>>>>>      if ( v->vcpu_id == 0 )
>>>>>>> +    {
>>>>>>>          d->vm_assist = c(vm_assist);
>>>>>>>  
>>>>>>> +        /*
>>>>>>> +         * In the restore case we need to deal with L4 pages which got
>>>>>>> +         * initialized with m2p_strict still clear (and which hence lack the
>>>>>>> +         * correct initial RO_MPT_VIRT_{START,END} L4 entry).
>>>>>>> +         */
>>>>>>> +        if ( d != current->domain && VM_ASSIST(d, m2p_strict) &&
>>>>>>> +             is_pv_domain(d) && !is_pv_32bit_domain(d) &&
>>>>>>> +             atomic_read(&d->arch.pv_domain.nr_l4_pages) )
>>>>>>> +        {
>>>>>>> +            bool_t done = 0;
>>>>>>> +
>>>>>>> +            spin_lock_recursive(&d->page_alloc_lock);
>>>>>>> +
>>>>>>> +            for ( i = 0; ; )
>>>>>>> +            {
>>>>>>> +                struct page_info *page = page_list_remove_head(&d->page_list);
>>>>>>> +
>>>>>>> +                if ( page_lock(page) )
>>>>>>> +                {
>>>>>>> +                    if ( (page->u.inuse.type_info & PGT_type_mask) ==
>>>>>>> +                         PGT_l4_page_table )
>>>>>>> +                        done = !fill_ro_mpt(page_to_mfn(page));
>>>>>>> +
>>>>>>> +                    page_unlock(page);
>>>>>>> +                }
>>>>>>> +
>>>>>>> +                page_list_add_tail(page, &d->page_list);
>>>>>>> +
>>>>>>> +                if ( done || (!(++i & 0xff) && hypercall_preempt_check()) )
>>>>>>> +                    break;
>>>>>>> +            }
>>>>>>> +
>>>>>>> +            spin_unlock_recursive(&d->page_alloc_lock);
>>>>>>> +
>>>>>>> +            if ( !done )
>>>>>>> +                return -ERESTART;
>>>>>> This is a long loop.  It is preemptible, but will incur a time delay
>>>>>> proportional to the size of the domain during the VM downtime. 
>>>>>>
>>>>>> Could you defer the loop until after %cr3 has been set up, and only
>>>>>> enter the loop if the kernel l4 table is missing the RO mappings?  That
>>>>>> way, domains migrated with migration v2 will skip the loop entirely.
>>>>> Well, first of all this would be the result only as long as you or
>>>>> someone else don't re-think and possibly move pinning ahead of
>>>>> context load again.
>>>> A second set_context() will unconditionally hit the loop though.
>>> Right - another argument against making any change to what is
>>> in the patch right now.
>> 
>> If there are any L4 pages, the current code will unconditionally search
>> the pagelist on every entry to the function, even when it has already
>> fixed up the strictness.
>> 
>> A toolstack can enter this function multiple times for the same vcpu,
>> by resetting the vcpu state in between.  How much do we care about this
>> usage?
> 
> If we cared at all, we'd need to insert another similar piece of
> code in the reset path (moving L4s back to m2p-relaxed mode).
> 
> Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx 
> http://lists.xen.org/xen-devel 
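
For readers following the loop discussed above, here is a minimal standalone sketch of the same pattern: rotate the page list head to tail, fix up each L4 page, stop once an already-fixed L4 page comes round again, and bail out for a restart every 256 pages if preemption is requested. Everything in it is a hypothetical stand-in and not Xen code: `struct page`, `struct domain`, `fill_slot()`, `fixup_l4_pages()` and the return value -1 (in place of -ERESTART) are invented for illustration, and `fill_slot()` is only assumed to behave like fill_ro_mpt() in returning true when it actually populated the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES   1000
#define BATCH_MASK 0xff   /* poll for preemption every 256 pages, as in the patch */

/* Hypothetical stand-in for struct page_info. */
struct page {
    bool is_l4;       /* models the PGT_l4_page_table type check */
    bool ro_filled;   /* models the RO_MPT L4 entry being present */
};

/* Hypothetical stand-in for d->page_list: a ring whose head rotates. */
struct domain {
    struct page pages[NR_PAGES];
    size_t head;
};

/* Assumed semantics: returns true only if the entry had to be populated. */
static bool fill_slot(struct page *pg)
{
    if (pg->ro_filled)
        return false;
    pg->ro_filled = true;
    return true;
}

/*
 * One preemptible pass. Returns 0 once an already-fixed L4 page is seen
 * again (the list has wrapped), or -1 (standing in for -ERESTART) when
 * the preemption check fires first. The caller must ensure at least one
 * L4 page exists, mirroring the nr_l4_pages guard in the patch.
 */
static int fixup_l4_pages(struct domain *d, bool (*preempt_check)(void))
{
    bool done = false;
    unsigned int i = 0;

    for ( ; ; )
    {
        struct page *pg = &d->pages[d->head];    /* "remove head" */

        if (pg->is_l4)
            done = !fill_slot(pg);

        d->head = (d->head + 1) % NR_PAGES;      /* "add tail" */

        if (done || (!(++i & BATCH_MASK) && preempt_check()))
            break;
    }

    return done ? 0 : -1;
}

/* Worst case for the sketch: preemption is requested at every poll. */
static bool always_preempt(void)
{
    return true;
}

/* Re-invoke until finished, as a hypercall continuation would. */
static unsigned int run_to_completion(struct domain *d)
{
    unsigned int calls = 0;
    int rc;

    do {
        rc = fixup_l4_pages(d, always_preempt);
        ++calls;
        assert(calls <= NR_PAGES);   /* guard against a domain with no L4 pages */
    } while (rc != 0);

    return calls;
}
```

Because the rotation position persists across restarts, no separate cursor is needed. The sketch also shows the cost raised in the thread: calling `run_to_completion()` again on an already-fixed domain still walks the list until it reaches the next L4 page.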


