
Re: [Xen-devel] [PATCH 2/6] x86/domctl: Fix TOCTOU race with the use of XEN_DOMCTL_getvcpuextstate



On 12/09/16 13:33, Jan Beulich wrote:
>>>> On 12.09.16 at 14:02, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 12/09/16 12:17, Jan Beulich wrote:
>>>>>> On 12.09.16 at 11:51, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> A toolstack must call XEN_DOMCTL_getvcpuextstate twice; first to find the
>>>> size of the buffer to use, and a second time to get the actual content.
>>>>
>>>> The reported size was based on v->arch.xcr0_accum, but a guest which
>>>> extends its xcr0_accum between the two hypercalls will cause the toolstack
>>>> to fail the evc->size != size check, as the provided buffer is now too
>>>> small.  This causes a hard error during the final phase of migration.
>>>>
>>>> Instead, return a size based on xfeature_mask, which is the maximum
>>>> size Xen will ever permit.  The hypercall must now tolerate a
>>>> toolstack-provided buffer which is overly large (for the case where a guest
>>>> isn't using all available xsave states), and should write back how much
>>>> data was actually written into the buffer.
>>> To be honest, I'm of two minds here. Part of me thinks this is an
>>> okay change. However, in particular ...
>>>
>>>> --- a/xen/arch/x86/domctl.c
>>>> +++ b/xen/arch/x86/domctl.c
>>>> @@ -1054,19 +1054,25 @@ long arch_do_domctl(
>>>>              unsigned int size;
>>>>  
>>>>              ret = 0;
>>>> -            vcpu_pause(v);
>>>>  
>>>> -            size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>>>>              if ( (!evc->size && !evc->xfeature_mask) ||
>>>>                   guest_handle_is_null(evc->buffer) )
>>>>              {
>>>> +                /*
>>>> +                 * A query for the size of buffer to use.  Must return the
>>>> +                 * maximum size we ever might hand back to userspace, bearing
>>>> +                 * in mind that the vcpu might increase its xcr0_accum between
>>>> +                 * this query for size, and the following query for data.
>>>> +                 */
>>>>                  evc->xfeature_mask = xfeature_mask;
>>>> -                evc->size = size;
>>>> -                vcpu_unpause(v);
>>>> +                evc->size = PV_XSAVE_SIZE(xfeature_mask);
>>>>                  goto vcpuextstate_out;
>>>>              }
>>>>  
>>>> -            if ( evc->size != size || evc->xfeature_mask != xfeature_mask )
>>>> +            vcpu_pause(v);
>>>> +            size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>>>> +
>>>> +            if ( evc->size < size || evc->xfeature_mask != xfeature_mask )
>>> ... the relaxation from != to < looks somewhat fragile to me, going
>>> forward.
>> What is fragile about it?  It is a very common pattern in general, and
>> we use it elsewhere.
> If we get handed too large a buffer, what meaning do you want to
> assign to the extra bytes? Them being meaningless is just one
> possible interpretation.

I am confused, and I think you are too.  There is no meaning to the
bytes; this is getvcpuextstate, not setvcpuextstate.

All it means is that Xen doesn't need to use all the buffer space
provided by the toolstack to write the current state into.


It is exactly the same concept as read(fd, buf, 4096) returning 1024. 
Only some of the potential buffer was actually filled by the call.
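
As a concrete illustration, this is roughly the flow a toolstack would use
with the patch applied.  It is a minimal sketch only: xch and do_domctl()
stand in for the usual libxc plumbing, and the hypercall-buffer bouncing
and error handling are elided.

    struct xen_domctl domctl = {
        .cmd    = XEN_DOMCTL_getvcpuextstate,
        .domain = domid,
        .u.vcpuextstate.vcpu = vcpu_id,
    };
    struct xen_domctl_vcpuextstate *evc = &domctl.u.vcpuextstate;

    /* Call 1: size query (size == 0, xfeature_mask == 0, NULL buffer).
     * With this patch, Xen reports PV_XSAVE_SIZE(xfeature_mask), the
     * largest amount of data it could ever hand back for this vcpu. */
    do_domctl(xch, &domctl);

    uint64_t *buf = calloc(1, evc->size);      /* worst-case allocation */

    /* Call 2: data retrieval into the (possibly over-sized) buffer. */
    set_xen_guest_handle(evc->buffer, buf);    /* bounce handling elided */
    do_domctl(xch, &domctl);

    /* evc->size now reports how many bytes Xen actually wrote, which may
     * be less than the allocation - the same convention as a short read(). */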

>
>>> Did you consider dealing with the issue in the tool stack?
>> Yes, and rejected doing so.
>>
>>> It can't be that hard to repeat the size query in case data retrieval fails.
>> In principle, this is true if the size case was distinguishable from all
>> the other cases which currently return -EINVAL.
> It easily is, as long as the caller knows what size it used on the most
> recent earlier call (since it would then notice it having grown).
>
>>>> @@ -1103,6 +1109,10 @@ long arch_do_domctl(
>>>>             }
>>>>  
>>>>              vcpu_unpause(v);
>>>> +
>>>> +            /* Specify how much data we actually wrote into the buffer. */
>>>> +            if ( !ret )
>>>> +                evc->size = size;
>>> ... if this got written on the earlier error path, there wouldn't even
>>> be a need to retry the size query: Data retrieval could be retried
>>> with the new size right after enlarging the buffer.
>> True, but userspace hypercalls are expensive for the domain in question,
>> and take a global spinlock in Xen which is expensive for the system as a
>> whole.  Looking forward to PVH control domains, any hypercall will be
>> far more expensive (a vmexit/entry vs syscall/sysret).
>>
>> As such, I am not happy introducing unnecessary hypercalls, as they are
>> entirely detrimental to dom0 and Xen.
> Let's be realistic: How often do you expect xcr0_accum to change
> in practice between the size query and the data retrieval?

The chances are very, very slim, but not zero.

> I'm perfectly fine fixing the theoretical issue, but I don't think there's
> a performance aspect to be considered when deciding how to deal
> with it.

This doesn't mean we should fix it in an inefficient way.
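
For comparison, the toolstack-side retry being suggested would look roughly
like this (purely illustrative: query_extstate_size() and get_extstate_data()
are hypothetical wrappers around the size-query and data-retrieval forms of
XEN_DOMCTL_getvcpuextstate):

    void *buf = NULL;
    uint64_t size;

    do {
        size = query_extstate_size(xch, domid, vcpu_id);  /* extra hypercall */
        free(buf);
        buf = malloc(size);
        /* A real implementation would also need to distinguish "buffer too
         * small" from other failures before deciding to loop. */
    } while ( get_extstate_data(xch, domid, vcpu_id, buf, size) != 0 );

Each retry costs at least one further round trip into Xen, which is the
inefficiency being objected to.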

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

