
Re: [Xen-devel] [PATCH 2/6] x86/domctl: Fix TOCTOU race with the use of XEN_DOMCTL_getvcpuextstate



>>> On 12.09.16 at 14:02, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 12/09/16 12:17, Jan Beulich wrote:
>>>>> On 12.09.16 at 11:51, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> A toolstack must call XEN_DOMCTL_getvcpuextstate twice; first to find the size
>>> of the buffer to use, and a second time to get the actual content.
>>>
>>> The reported size was based on v->arch.xcr0_accum, but a guest which extends
>>> its xcr0_accum between the two hypercalls will cause the toolstack to fail the
>>> evc->size != size check, as the provided buffer is now too small.  This causes
>>> a hard error during the final phase of migration.
>>>
>>> Instead, return a size based on xfeature_mask, which is the maximum
>>> size Xen will ever permit.  The hypercall must now tolerate a
>>> toolstack-provided buffer which is overly large (for the case where a guest
>>> isn't using all available xsave states), and should write back how much data
>>> was actually written into the buffer.
>> To be honest, I'm of two minds here. Part of me thinks this is an
>> okay change. However, in particular ...
>>
>>> --- a/xen/arch/x86/domctl.c
>>> +++ b/xen/arch/x86/domctl.c
>>> @@ -1054,19 +1054,25 @@ long arch_do_domctl(
>>>              unsigned int size;
>>>  
>>>              ret = 0;
>>> -            vcpu_pause(v);
>>>  
>>> -            size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>>>              if ( (!evc->size && !evc->xfeature_mask) ||
>>>                   guest_handle_is_null(evc->buffer) )
>>>              {
>>> +                /*
>>> +                 * A query for the size of buffer to use.  Must return the
>>> +                 * maximum size we ever might hand back to userspace, bearing
>>> +                 * in mind that the vcpu might increase its xcr0_accum between
>>> +                 * this query for size, and the following query for data.
>>> +                 */
>>>                  evc->xfeature_mask = xfeature_mask;
>>> -                evc->size = size;
>>> -                vcpu_unpause(v);
>>> +                evc->size = PV_XSAVE_SIZE(xfeature_mask);
>>>                  goto vcpuextstate_out;
>>>              }
>>>  
>>> -            if ( evc->size != size || evc->xfeature_mask != xfeature_mask )
>>> +            vcpu_pause(v);
>>> +            size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>>> +
>>> +            if ( evc->size < size || evc->xfeature_mask != xfeature_mask )
>> ... the relaxation from != to < looks somewhat fragile to me, going
>> forward.
> 
> What is fragile about it?  It is a very common pattern in general, and
> we use it elsewhere.

If we get handed too large a buffer, what meaning do you want to
assign to the extra bytes? Them being meaningless is just one
possible interpretation.

>> Did you consider dealing with the issue in the tool stack?
> 
> Yes, and rejected doing so.
> 
>> It can't be that hard to repeat the size query in case data retrieval fails.
> 
> In principle, this is true if the size case was distinguishable from all
> the other cases which currently return -EINVAL.

It easily is, as long as the caller knows what size it used on the most
recent earlier call (since it would then notice it having grown).

>>> @@ -1103,6 +1109,10 @@ long arch_do_domctl(
>>>             }
>>>  
>>>              vcpu_unpause(v);
>>> +
>>> +            /* Specify how much data we actually wrote into the buffer. */
>>> +            if ( !ret )
>>> +                evc->size = size;
>> ... if this got written on the earlier error path, there wouldn't even
>> be a need to retry the size query: Data retrieval could be retried
>> with the new size right after enlarging the buffer.
> 
> True, but userspace hypercalls are expensive for the domain in question,
> and take a global spinlock in Xen which is expensive for the system as a
> whole.  Looking forward to PVH control domains, any hypercall will be
> far more expensive (a vmexit/entry vs syscall/sysret).
> 
> As such, I am not happy introducing unnecessary hypercalls, as they are
> entirely detrimental to dom0 and Xen.

Let's be realistic: How often do you expect xcr0_accum to change
in practice between the size query and the data retrieval? I'm
perfectly fine fixing the theoretical issue, but I don't think there's
a performance aspect to be considered when deciding how to deal
with it.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
