
Re: [PING] Re: [PATCH] xen/arm: optee: Allocate anonymous domheap pages



On Thu, 7 Oct 2021, Volodymyr Babchuk wrote:
> > On Thu, 7 Oct 2021, Volodymyr Babchuk wrote:
> >> Hi Stefano,
> >> 
> >> Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
> >> 
> >> > On Wed, 6 Oct 2021, Oleksandr wrote:
> >> >> Hello all
> >> >> 
> >> >> Gentle reminder.
> >> >  
> >> > Many thanks for the ping, this patch fell off my radar.
> >> >
> >> >
> >> >  
> >> >> On 23.09.21 23:57, Volodymyr Babchuk wrote:
> >> >> > Hi Stefano,
> >> >> > 
> >> >> > Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
> >> >> > 
> >> >> > > On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
> >> >> > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
> >> >> > > > 
> >> >> > > > Allocate anonymous domheap pages as there is no strict need to
> >> >> > > > account them to a particular domain.
> >> >> > > > 
> >> >> > > > Since XSA-383 "xen/arm: Restrict the amount of memory that
> >> >> > > > dom0less domU and dom0 can allocate", dom0 cannot allocate
> >> >> > > > memory outside of the pre-allocated region. This means that if
> >> >> > > > we try to allocate a non-anonymous page accounted to dom0, we
> >> >> > > > will hit an over-allocation issue when assigning that page to
> >> >> > > > the domain. An anonymous page, in turn, is not assigned to any
> >> >> > > > domain.
> >> >> > > > 
> >> >> > > > CC: Julien Grall <jgrall@xxxxxxxxxx>
> >> >> > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
> >> >> > > > Acked-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
> >> >> > > Only one question, which is more architectural: given that these
> >> >> > > pages are "unlimited", could the guest somehow exploit the
> >> >> > > interface to force Xen to allocate a very high number of
> >> >> > > anonymous pages?
> >> >> > >
> >> >> > > E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to
> >> >> > > force Xen to exhaust all memory pages?
> >> >> > Generally, the OP-TEE mediator tracks all allocated resources and
> >> >> > imposes limits on them.
> >> >> >
> >> >> > The OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it
> >> >> > is issued not by the domain, but by OP-TEE itself. As OP-TEE is a
> >> >> > more trusted part of the system, we allow it to request as many
> >> >> > buffers as it wants. Also, we know that OP-TEE asks for only one
> >> >> > such buffer per standard call, and the number of simultaneous
> >> >> > calls is limited by the number of OP-TEE threads, which is quite
> >> >> > low: typically only two.
> >> >
> >> > So let me repeat it differently to see if I understood correctly:
> >> >
> >> > - OPTEE_SMC_RPC_FUNC_ALLOC is only called by OP-TEE, not by the domain
> >> > - OP-TEE is trusted and only calls it twice anyway
> >> 
> >> Correct.
> >> 
> >> > I am OK with this argument, but do we have a check to make sure a domU
> >> > cannot issue OPTEE_SMC_RPC_FUNC_ALLOC?
> >> 
> >> A domU can't issue any RPC, because all RPCs are issued from the
> >> OP-TEE side. This is the nature of RPC: OP-TEE requests some service
> >> from the Normal World. But of course, the Normal World can perform
> >> certain actions that will make OP-TEE issue an RPC. I discuss this in
> >> depth below.
> >> 
> >> >
> >> > Looking at the patch, there are other two places, in addition to
> >> > OPTEE_SMC_RPC_FUNC_ALLOC, where the anonymous memory pages can be
> >> > allocated:
> >> >
> >> > 1) copy_std_request
> >> > 2) translate_noncontig
> >> >
> >> > We need to prove that neither 1) nor 2) can result in a domU
> >> > exhausting Xen memory.
> >> >
> >> > In the case of 1), it looks like the memory is freed before returning to
> >> > the DomU, right? If so, it should be no problem?
> >> 
> >> Yes, the mediator makes a shadow copy of every request buffer to hide
> >> the translated addresses from the guest. The number of requests is
> >> limited by the number of OP-TEE threads.
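
For the archive, that shadow-copy pattern is roughly the sketch below,
condensed from copy_std_request() in xen/arch/arm/tee/optee.c. Error
handling and the unmap are omitted and the helper name is made up, so
treat it as an illustration only:

    static int shadow_guest_request(struct domain *d, paddr_t guest_arg_ipa,
                                    struct optee_std_call *call)
    {
        /* Anonymous domheap page: owner is NULL, not accounted to the guest */
        call->xen_arg_pg = alloc_domheap_page(NULL, 0);
        if ( !call->xen_arg_pg )
            return -ENOMEM;

        call->xen_arg = __map_domain_page_global(call->xen_arg_pg);

        /* Copy the guest's request so Xen can translate addresses privately */
        return access_guest_memory_by_ipa(d, guest_arg_ipa, call->xen_arg,
                                          OPTEE_MSG_NONCONTIG_PAGE_SIZE,
                                          false /* read from guest */);
    }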
> >> 
> >> > In the case of 2), it looks like the memory could outlive the call
> >> > where it is allocated. Is there any kind of protection against
> >> > issuing something like OPTEE_MSG_ATTR_TYPE_TMEM_INOUT in a loop? Is
> >> > it OP-TEE itself that would refuse the attempt? Thus, the idea is
> >> > that do_call_with_arg will return an error and we'll just free the
> >> > memory there?
> >> 
> >> Well, translate_noncontig() calls allocate_optee_shm_buf(), which
> >> counts all allocated buffers. So you can't call it more than
> >> MAX_SHM_BUFFER_COUNT times without de-allocating the previous memory.
> >> But, thanks to your question, I have found a bug there: memory is not
> >> freed if allocate_optee_shm_buf() fails. I'll prepare a patch later
> >> today.
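
(For the record, the unwind would presumably look something like the
sketch below; the identifiers follow the discussion above, and this is
not the actual fix:)

    optee_shm_buf = allocate_optee_shm_buf(ctx, cookie, page_count,
                                           pg_list, order);
    if ( IS_ERR(optee_shm_buf) )
    {
        /* This free was missing, leaking the translated page list */
        free_domheap_pages(pg_list, order);
        return PTR_ERR(optee_shm_buf);
    }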
> >> 
> >> > I cannot see a check for errors returned by do_call_with_arg, nor
> >> > memory being freed as a result. Sorry, I am not super familiar with
> >> > the code; I am just trying to make sure we are not offering DomUs an
> >> > easy way to crash the system.
> >> 
> >> I tried to eliminate all possibilities for a guest to crash the
> >> system. Of course, this does not mean that there are none left.
> >>
> >> And yes, the code is a bit hard to understand, because calls to
> >> OP-TEE are stateful and you need to account for that state. From the
> >> NW and SW sides this looks quite fine, because the state is handled
> >> naturally. But the mediator sits in the middle, so its implementation
> >> is a bit messy.
> >>
> >> I'll try to explain what is going on, so it will be easier to
> >> understand the logic in the mediator.
> >> 
> >> There are two types of OP-TEE calls: fast calls and standard calls.
> >> A fast call is simple: issue an SMC and get the result. It does not
> >> allocate a thread context in OP-TEE and is non-preemptible. So yes,
> >> it should be fast. It is used for simple things like "get OP-TEE
> >> version" or "exchange capabilities". Such calls are easy to handle in
> >> the mediator: just forward the call, check the result, and return it
> >> back to the guest.
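
That forwarding step is essentially the sketch below. It is a
simplification, not the mediator's actual code, and OPTEE_CLIENT_ID is
an assumption on my side:

    static void forward_fast_call(struct cpu_user_regs *regs)
    {
        struct arm_smccc_res resp;

        /* Forward the guest's SMC to OP-TEE mostly unchanged... */
        arm_smccc_smc(get_user_reg(regs, 0), get_user_reg(regs, 1),
                      get_user_reg(regs, 2), get_user_reg(regs, 3),
                      get_user_reg(regs, 4), get_user_reg(regs, 5),
                      get_user_reg(regs, 6),
                      /* ...except a7, which carries the client ID */
                      OPTEE_CLIENT_ID(current->domain),
                      &resp);

        /* Return OP-TEE's answer to the guest */
        set_user_reg(regs, 0, resp.a0);
        set_user_reg(regs, 1, resp.a1);
        set_user_reg(regs, 2, resp.a2);
        set_user_reg(regs, 3, resp.a3);
    }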
> >> 
> >> Standard calls are stateful: OP-TEE allocates a thread for each
> >> call. The call can be preempted either by an IRQ or by an RPC. For
> >> consistency, an IRQ return is also considered a special type of RPC.
> >> So, in general, one standard call can consist of a series of SMCs:
> >> 
> >> --> SMC with request
> >> <-- RPC return (like IRQ)
> >> --> SMC "resume call"
> >> <-- RPC return (like "read disk")
> >> --> SMC "resume call"
> >> <-- RPC return (like "send network packet")
> >> --> SMC "resume call"
> >> ...
> >> <-- Final return
> >> 
> >> There are many types of RPCs: "handle IRQ", additional shared buffer
> >> allocation/de-allocation, RPMB access, disk access, network access,
> >> synchronization primitives (when an OP-TEE thread gets blocked on a
> >> mutex), etc.
> >> 
> >> Two more things make all this worse. First, the Normal World can
> >> register a shared buffer with OP-TEE, and such a buffer can live
> >> indefinitely. Second, the Normal World decides when to resume a call.
> >> For example, the calling process can be preempted and then resumed
> >> seconds later. A misbehaving guest can decide not to resume the call
> >> at all.
> >> 
> >> As I said, I tried to take all these things into account. There are
> >> basically 3 types of objects that can lead to memory allocation on
> >> the Xen side:
> >> 
> >> 1. Standard call context. Besides the memory for struct
> >> optee_std_call itself, it allocates a page for a shadow buffer, where
> >> argument addresses are translated by Xen. The number of these objects
> >> is limited by the number of OP-TEE threads:
> >> 
> >>     count = atomic_add_unless(&ctx->call_count, 1, max_optee_threads);
> >>     if ( count == max_optee_threads )
> >>         return ERR_PTR(-ENOSPC);
> >> 
> >> 2. Shared buffer. This is a buffer shared by the guest with OP-TEE.
> >> It can be either a temporary buffer, shared for the duration of one
> >> standard call, or a registered shared buffer, which remains active
> >> until it is de-registered. This is where translate_noncontig() comes
> >> into play. The number of these buffers is limited in
> >> allocate_optee_shm_buf():
> >> 
> >>     count = atomic_add_unless(&ctx->optee_shm_buf_count, 1,
> >>                               MAX_SHM_BUFFER_COUNT);
> >>     if ( count == MAX_SHM_BUFFER_COUNT )
> >>         return ERR_PTR(-ENOMEM);
> >> 
> >> 3. Shared RPC buffer. This is a very special kind of buffer.
> >> Basically, OP-TEE needs some shared memory in which to pass RPC call
> >> parameters, so it requests a buffer from the Normal World. There is
> >> no hard limit on this from the mediator side because, as I said
> >> earlier, OP-TEE itself limits the number of these buffers. There is
> >> no case in which more than one buffer will be allocated per OP-TEE
> >> thread. This type of buffer is used only to process RPC requests
> >> themselves. OP-TEE can request more buffers via RPC, but those fall
> >> under item 2: the NW uses a separate request to register a buffer and
> >> then returns its handle in the preempted call.
> >> 
> >> 
> >> Apart from those limits, there is a limit on the total number of
> >> pages shared between a guest and OP-TEE: MAX_TOTAL_SMH_BUF_PG. This
> >> limit covers the case when a guest tries to allocate a few really BIG
> >> buffers.
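
The corresponding check is roughly the sketch below; if memory serves,
the real code in allocate_optee_shm_buf() does this with an atomic
cmpxchg loop rather than a plain read-modify-write, so this is
illustration only:

    if ( ctx->optee_shm_buf_pages + pages_cnt > MAX_TOTAL_SMH_BUF_PG )
        return ERR_PTR(-ENOMEM);
    ctx->optee_shm_buf_pages += pages_cnt;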
> >> 
> >> 
> >> > It looks like they could be called from one of the OP-TEE
> >> > operations that a domU can request. Is there a limit for them?
> >> 
> >> Yes, there are limits, as I described above.
> >> 
> >> Also, bear in mind that the resources available to OP-TEE are
> >> themselves quite limited. So, in case of some breach in the mediator,
> >> OP-TEE will give up first. This, of course, is not an excuse to have
> >> bugs in the mediator...
> >
> > OK, thanks for the explanation. The reason for my questions is that
> > if the allocations are using the memory of the DomU, then at worst
> > the DomU can run out of memory. But if the allocations are using
> > anonymous memory, then the whole platform might run out of memory. We
> > have issued XSAs for things like that in the past.
> >
> > This is why I am worried about this patch: if we apply it we really
> > become reliant on these limits being implemented correctly. A bug can
> > have much more severe consequences.
> 
> Agree.
> 
> > As you are the maintainer for this code, and this code is not security
> > supported, I'll leave it up to you (also see the other email about
> > moving optee to "supported, not security supported").
> 
> Yes, I've seen this email. I just didn't have time to write a follow-up.
> 
> > However, maybe a different solution would be to increase max_pages
> > for a domain when OP-TEE is enabled? Maybe just by a few pages (as
> > many as needed by the OP-TEE mediator)? Because if we did that, we
> > wouldn't risk exposing DoS attack vectors with every bug in the
> > mediator's limit checks.
> >
> > The patch below adds a slack of 10 pages.
> 
> Well, I didn't know that such an option was available. If this is a
> valid approach and there are no objections from other maintainers, I'd
> rather use it.

I think it is a valid approach, and it is "more secure" than the other
patch. I suggest that you send a patch for it so that people can voice
their objections, if any.


> My only comment there is about the number of pages. The maximal number
> of domheap pages used per request is 6: one for the request itself, one
> for the RPC buffer, and at most 4 for the request arguments. I checked
> the OP-TEE configuration; it looks like some platforms allow up to 16
> threads. This yields 6 * 16 = 96 pages in total. If this is acceptable,
> I'd set TEE_SLACK to 96.

OK. Please add a good in-code comment explaining how you got to 96.
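
Something along these lines, based on your numbers above (just a
sketch, adjust as you see fit):

    /*
     * Per standard call the mediator can allocate up to 6 anonymous
     * domheap pages: 1 for the shadow copy of the request, 1 for the RPC
     * buffer and up to 4 for non-contiguous request arguments. Some
     * OP-TEE platform configurations allow up to 16 threads (i.e.
     * concurrent standard calls), hence 6 * 16 = 96 pages of slack.
     */
    #define TEE_SLACK 96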


> > diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> > index 3964a8a5cd..a3105f1a9a 100644
> > --- a/xen/arch/arm/tee/tee.c
> > +++ b/xen/arch/arm/tee/tee.c
> > @@ -38,8 +38,11 @@ bool tee_handle_call(struct cpu_user_regs *regs)
> >      return cur_mediator->ops->handle_call(regs);
> >  }
> >  
> > +#define TEE_SLACK (10)
> >  int tee_domain_init(struct domain *d, uint16_t tee_type)
> >  {
> > +    int ret;
> > +
> >      if ( tee_type == XEN_DOMCTL_CONFIG_TEE_NONE )
> >          return 0;
> >  
> > @@ -49,7 +52,17 @@ int tee_domain_init(struct domain *d, uint16_t tee_type)
> >      if ( cur_mediator->tee_type != tee_type )
> >          return -EINVAL;
> >  
> > -    return cur_mediator->ops->domain_init(d);
> > +    ret = cur_mediator->ops->domain_init(d);
> > +    if ( ret < 0 )
> > +        return ret;
> > +
> > +    /*
> > +     * Increase max_pages for domains with a TEE; the extra pages are
> > +     * used by the mediator.
> > +     */
> > +    d->max_pages += TEE_SLACK;
> > +
> > +    return 0;
> >  }
> >  
> >  int tee_relinquish_resources(struct domain *d)




 

