
Re: [Xen-devel] [PATCH 2/6] x86/domctl: Fix TOCTOU race with the use of XEN_DOMCTL_getvcpuextstate



On 12/09/16 12:17, Jan Beulich wrote:
>>>> On 12.09.16 at 11:51, <andrew.cooper3@xxxxxxxxxx> wrote:
>> A toolstack must call XEN_DOMCTL_getvcpuextstate twice; first to find the size
>> of the buffer to use, and a second time to get the actual content.
>>
>> The reported size was based on v->arch.xcr0_accum, but a guest which extends
>> its xcr0_accum between the two hypercalls will cause the toolstack to fail the
>> evc->size != size check, as the provided buffer is now too small.  This causes
>> a hard error during the final phase of migration.
>>
>> Instead, return a size based on xfeature_mask, which is the maximum
>> size Xen will ever permit.  The hypercall must now tolerate a
>> toolstack-provided buffer which is overly large (for the case where a guest
>> isn't using all available xsave states), and should write back how much data
>> was actually written into the buffer.
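
As a rough illustration, the toolstack-side flow being described is the
following sketch (do_size_query()/do_data_query() are hypothetical wrappers
around XEN_DOMCTL_getvcpuextstate, shown only to make the sequence concrete):

    uint64_t xfeature_mask, size;

    /* 1) Size query: Xen reports the maximum size it might ever hand
     *    back, i.e. PV_XSAVE_SIZE(xfeature_mask). */
    if ( do_size_query(domid, vcpu, &xfeature_mask, &size) )
        return -1;

    void *buf = malloc(size);               /* worst-case sized buffer */

    /* 2) Data query: the buffer may be larger than this vcpu currently
     *    needs; on success Xen writes back how many bytes it used. */
    if ( do_data_query(domid, vcpu, xfeature_mask, &size, buf) )
        return -1;

    /* 'size' now holds the number of bytes actually written. */
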
> To be honest, I'm of two minds here. Part of me thinks this is an
> okay change. However, in particular ...
>
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -1054,19 +1054,25 @@ long arch_do_domctl(
>>              unsigned int size;
>>  
>>              ret = 0;
>> -            vcpu_pause(v);
>>  
>> -            size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>>              if ( (!evc->size && !evc->xfeature_mask) ||
>>                   guest_handle_is_null(evc->buffer) )
>>              {
>> +                /*
>> +                 * A query for the size of buffer to use.  Must return the
>> +                 * maximum size we ever might hand back to userspace, bearing
>> +                 * in mind that the vcpu might increase its xcr0_accum between
>> +                 * this query for size, and the following query for data.
>> +                 */
>>                  evc->xfeature_mask = xfeature_mask;
>> -                evc->size = size;
>> -                vcpu_unpause(v);
>> +                evc->size = PV_XSAVE_SIZE(xfeature_mask);
>>                  goto vcpuextstate_out;
>>              }
>>  
>> -            if ( evc->size != size || evc->xfeature_mask != xfeature_mask )
>> +            vcpu_pause(v);
>> +            size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>> +
>> +            if ( evc->size < size || evc->xfeature_mask != xfeature_mask )
> ... the relaxation from != to < looks somewhat fragile to me, going
> forward.

What is fragile about it?  It is a very common pattern in general, and
we use it elsewhere.
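
For concreteness, the usual shape of that pattern (illustrative only, not
the exact domctl code):

    /* A caller-supplied buffer merely has to be big enough; the number
     * of bytes actually written is reported back to the caller. */
    static int copy_state_out(void *dst, size_t dst_size,
                              const void *src, size_t src_size,
                              size_t *written)
    {
        if ( dst_size < src_size )
            return -EINVAL;                 /* genuinely too small */
        memcpy(dst, src, src_size);
        *written = src_size;                /* may be less than dst_size */
        return 0;
    }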

> Did you consider dealing with the issue in the tool stack?

Yes, and rejected doing so.

> It can't be that hard to repeat the size query in case data retrieval fails.

In principle this would be true, but only if the "buffer too small" case
were distinguishable from all the other cases which currently return -EINVAL.
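That is, something along these lines (illustrative sketch; get_size() and
get_data() are hypothetical wrappers, and a distinguishable error such as
-ENOBUFS, or the written-back size, is exactly what the retry condition
would need):

    for ( ;; )
    {
        rc = get_data(domid, vcpu, buf, &size);
        if ( rc != -ENOBUFS )               /* success, or a real error */
            break;
        if ( get_size(domid, vcpu, &size) ) /* re-query the (grown) size */
            break;
        buf = realloc(buf, size);
    }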

> Such retry logic would be well bounded in terms of iterations it can
> potentially take. In fact ...
>
>> @@ -1103,6 +1109,10 @@ long arch_do_domctl(
>>             }
>>  
>>              vcpu_unpause(v);
>> +
>> +            /* Specify how much data we actually wrote into the buffer. */
>> +            if ( !ret )
>> +                evc->size = size;
> ... if this got written on the earlier error path, there wouldn't even
> be a need to retry the size query: Data retrieval could be retried
> with the new size right after enlarging the buffer.

True, but userspace hypercalls are expensive for the domain in question,
and take a global spinlock in Xen which is expensive for the system as a
whole.  Looking forward to PVH control domains, any hypercall will be
far more expensive (a vmexit/entry vs syscall/sysret).

As such, I am not happy introducing unnecessary hypercalls, as they are
entirely detrimental to dom0 and Xen.

~Andrew
