
Re: [Xen-devel] [PATCH 2/6] x86/domctl: Fix TOCTOU race with the use of XEN_DOMCTL_getvcpuextstate



On 12/09/16 13:33, Jan Beulich wrote:
>>>> On 12.09.16 at 14:02, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 12/09/16 12:17, Jan Beulich wrote:
>>>>>> On 12.09.16 at 11:51, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> A toolstack must call XEN_DOMCTL_getvcpuextstate twice; first to find the
>>>> size of the buffer to use, and a second time to get the actual content.
>>>>
>>>> The reported size was based on v->arch.xcr0_accum, but a guest which
>>>> extends its xcr0_accum between the two hypercalls will cause the toolstack
>>>> to fail the evc->size != size check, as the provided buffer is now too
>>>> small.  This causes a hard error during the final phase of migration.
>>>>
>>>> Instead, return a size based on xfeature_mask, which is the maximum
>>>> size Xen will ever permit.  The hypercall must now tolerate a
>>>> toolstack-provided buffer which is overly large (for the case where a guest
>>>> isn't using all available xsave states), and should write back how much
>>>> data was actually written into the buffer.
>>> To be honest, I'm of two minds here. Part of me thinks this is an
>>> okay change. However, in particular ...
>>>
>>>> --- a/xen/arch/x86/domctl.c
>>>> +++ b/xen/arch/x86/domctl.c
>>>> @@ -1054,19 +1054,25 @@ long arch_do_domctl(
>>>>              unsigned int size;
>>>>  
>>>>              ret = 0;
>>>> -            vcpu_pause(v);
>>>>  
>>>> -            size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>>>>              if ( (!evc->size && !evc->xfeature_mask) ||
>>>>                   guest_handle_is_null(evc->buffer) )
>>>>              {
>>>> +                /*
>>>> +                 * A query for the size of buffer to use.  Must return the
>>>> +                 * maximum size we ever might hand back to userspace, bearing
>>>> +                 * in mind that the vcpu might increase its xcr0_accum between
>>>> +                 * this query for size, and the following query for data.
>>>> +                 */
>>>>                  evc->xfeature_mask = xfeature_mask;
>>>> -                evc->size = size;
>>>> -                vcpu_unpause(v);
>>>> +                evc->size = PV_XSAVE_SIZE(xfeature_mask);
>>>>                  goto vcpuextstate_out;
>>>>              }
>>>>  
>>>> -            if ( evc->size != size || evc->xfeature_mask != xfeature_mask )
>>>> +            vcpu_pause(v);
>>>> +            size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>>>> +
>>>> +            if ( evc->size < size || evc->xfeature_mask != xfeature_mask )
>>> ... the relaxation from != to < looks somewhat fragile to me, going
>>> forward.
>> What is fragile about it?  It is a very common pattern in general, and
>> we use it elsewhere.
> If we get handed too large a buffer, what meaning do you want to
> assign to the extra bytes? Them being meaningless is just one
> possible interpretation.

I am confused, and I think you are too.  There is no meaning to the
bytes; this is getvcpuextstate, not setvcpuextstate.

All it means is that Xen doesn't need to use all the buffer space
provided by the toolstack to write the current state into.


It is exactly the same concept as read(fd, buf, 4096) returning 1024. 
Only some of the potential buffer was actually filled by the call.
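
As a concrete illustration, this is roughly the flow a toolstack would use
with the patch applied.  It is a minimal sketch only: xch and do_domctl()
stand in for the usual libxc plumbing, and the hypercall-buffer bouncing
and error handling are elided.

    struct xen_domctl domctl = {
        .cmd    = XEN_DOMCTL_getvcpuextstate,
        .domain = domid,
        .u.vcpuextstate.vcpu = vcpu_id,
    };
    struct xen_domctl_vcpuextstate *evc = &domctl.u.vcpuextstate;

    /* Call 1: size query (size == 0, xfeature_mask == 0, NULL buffer).
     * With this patch, Xen reports PV_XSAVE_SIZE(xfeature_mask), the
     * largest amount of data it could ever hand back for this vcpu. */
    do_domctl(xch, &domctl);

    uint64_t *buf = calloc(1, evc->size);      /* worst-case allocation */

    /* Call 2: data retrieval into the (possibly over-sized) buffer. */
    set_xen_guest_handle(evc->buffer, buf);    /* bounce handling elided */
    do_domctl(xch, &domctl);

    /* evc->size now reports how many bytes Xen actually wrote, which may
     * be less than the allocation - the same convention as a short read(). */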

>
>>> Did you consider dealing with the issue in the tool stack?
>> Yes, and rejected doing so.
>>
>>> It can't be that hard to repeat the size query in case data retrieval fails.
>> In principle, this is true if the size case was distinguishable from all
>> the other cases which currently return -EINVAL.
> It easily is, as long as the caller knows what size it used on the most
> recent earlier call (since it would then notice it having grown).
>
>>>> @@ -1103,6 +1109,10 @@ long arch_do_domctl(
>>>>             }
>>>>  
>>>>              vcpu_unpause(v);
>>>> +
>>>> +            /* Specify how much data we actually wrote into the buffer. */
>>>> +            if ( !ret )
>>>> +                evc->size = size;
>>> ... if this got written on the earlier error path, there wouldn't even
>>> be a need to retry the size query: Data retrieval could be retried
>>> with the new size right after enlarging the buffer.
>> True, but userspace hypercalls are expensive for the domain in question,
>> and take a global spinlock in Xen which is expensive for the system as a
>> whole.  Looking forward to PVH control domains, any hypercall will be
>> far more expensive (a vmexit/entry vs syscall/sysret).
>>
>> As such, I am not happy introducing unnecessary hypercalls, as they are
>> entirely detrimental to dom0 and Xen.
> Let's be realistic: How often do you expect xcr0_accum to change
> in practice between the size query and the data retrieval?

The chances are very, very slim, but not zero.

> I'm perfectly fine fixing the theoretical issue, but I don't think there's
> a performance aspect to be considered when deciding how to deal
> with it.

This doesn't mean we should fix it in an inefficient way.
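
For comparison, the toolstack-side retry being suggested would look roughly
like this (purely illustrative: query_extstate_size() and get_extstate_data()
are hypothetical wrappers around the size-query and data-retrieval forms of
XEN_DOMCTL_getvcpuextstate):

    void *buf = NULL;
    uint64_t size;

    do {
        size = query_extstate_size(xch, domid, vcpu_id);  /* extra hypercall */
        free(buf);
        buf = malloc(size);
        /* A real implementation would also need to distinguish "buffer too
         * small" from other failures before deciding to loop. */
    } while ( get_extstate_data(xch, domid, vcpu_id, buf, size) != 0 );

Each retry costs at least one further round trip into Xen, which is the
inefficiency being objected to.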

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

