Re: [PING] Re: [PATCH] xen/arm: optee: Allocate anonymous domheap pages
On Fri, 8 Oct 2021, Julien Grall wrote:
> On 07/10/2021 23:14, Stefano Stabellini wrote:
> > On Thu, 7 Oct 2021, Volodymyr Babchuk wrote:
> > > Hi Stefano,
> > >
> > > Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
> > >
> > > > On Wed, 6 Oct 2021, Oleksandr wrote:
> > > > > Hello all
> > > > >
> > > > > Gentle reminder.
> > > >
> > > > Many thanks for the ping, this patch fell off my radar.
> > > >
> > > > > On 23.09.21 23:57, Volodymyr Babchuk wrote:
> > > > > > Hi Stefano,
> > > > > >
> > > > > > Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
> > > > > >
> > > > > > > On Mon, 6 Sep 2021, Oleksandr Tyshchenko wrote:
> > > > > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
> > > > > > > >
> > > > > > > > Allocate anonymous domheap pages as there is no strict need
> > > > > > > > to account them to a particular domain.
> > > > > > > >
> > > > > > > > Since XSA-383 "xen/arm: Restrict the amount of memory that
> > > > > > > > dom0less domU and dom0 can allocate" the dom0 cannot allocate
> > > > > > > > memory outside of the pre-allocated region. This means that if
> > > > > > > > we try to allocate a non-anonymous page to be accounted to
> > > > > > > > dom0, we will get an over-allocation issue when assigning that
> > > > > > > > page to the domain. The anonymous page, in turn, is not
> > > > > > > > assigned to any domain.
> > > > > > > >
> > > > > > > > CC: Julien Grall <jgrall@xxxxxxxxxx>
> > > > > > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
> > > > > > > > Acked-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
> > > > > > >
> > > > > > > Only one question, which is more architectural: given that
> > > > > > > these pages are "unlimited", could the guest exploit the
> > > > > > > interface somehow to force Xen to allocate a very high number
> > > > > > > of anonymous pages?
> > > > > > >
> > > > > > > E.g. could a domain call OPTEE_SMC_RPC_FUNC_ALLOC in a loop to
> > > > > > > force Xen to exhaust all memory pages?
> > > > > >
> > > > > > Generally, the OP-TEE mediator tracks all allocated resources
> > > > > > and imposes limits on them.
> > > > > >
> > > > > > The OPTEE_SMC_RPC_FUNC_ALLOC case is a bit different, because it
> > > > > > is issued not by the domain, but by OP-TEE itself. As OP-TEE is a
> > > > > > more trusted piece of the system, we allow it to request as many
> > > > > > buffers as it wants. Also, we know that OP-TEE asks for only one
> > > > > > such buffer per standard call, and the number of simultaneous
> > > > > > calls is limited by the number of OP-TEE threads, which is quite
> > > > > > low: typically only two.
> > > >
> > > > So let me repeat it differently to see if I understood correctly:
> > > >
> > > > - OPTEE_SMC_RPC_FUNC_ALLOC is only called by OP-TEE, not by the domain
> > > > - OP-TEE is trusted and only calls it twice anyway
> > >
> > > Correct.
> > >
> > > > I am OK with this argument, but do we have a check to make sure a
> > > > domU cannot issue OPTEE_SMC_RPC_FUNC_ALLOC?
> > >
> > > A domU can't issue any RPC, because all RPCs are issued from the
> > > OP-TEE side. This is the nature of RPC - OP-TEE requests some service
> > > from the Normal World. But of course, the Normal World can perform
> > > certain actions that will make OP-TEE issue an RPC. I discuss this in
> > > depth below.
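To make the change under discussion concrete: the patch essentially
switches the mediator's page allocations from the accounted form of the
domheap allocator to the anonymous one, along these lines (an
illustrative hunk, not the verbatim patch):

    -    page = alloc_domheap_pages(current->domain, 0, 0); /* accounted */
    +    page = alloc_domheap_pages(NULL, 0, 0); /* anonymous */

With the accounted form, assigning the page to the domain fails with an
over-allocation error once the XSA-383 cap is reached; the anonymous
form bypasses the per-domain accounting entirely, which is what the
rest of this thread is about.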
> > > > Looking at the patch, there are two other places, in addition to
> > > > OPTEE_SMC_RPC_FUNC_ALLOC, where the anonymous memory pages can be
> > > > allocated:
> > > >
> > > > 1) copy_std_request
> > > > 2) translate_noncontig
> > > >
> > > > We need to prove that neither 1) nor 2) can result in a domU
> > > > exhausting Xen memory.
> > > >
> > > > In the case of 1), it looks like the memory is freed before
> > > > returning to the DomU, right? If so, it should be no problem?
> > >
> > > Yes, the mediator makes a shadow copy of every request buffer to hide
> > > the translated addresses from the guest. The number of requests is
> > > limited by the number of OP-TEE threads.
> > >
> > > > In the case of 2), it looks like the memory could outlive the call
> > > > where it is allocated. Is there any kind of protection against
> > > > issuing something like OPTEE_MSG_ATTR_TYPE_TMEM_INOUT in a loop? Is
> > > > it OP-TEE itself that would refuse the attempt? Thus, the idea is
> > > > that do_call_with_arg will return an error and we'll just free the
> > > > memory there?
> > >
> > > Well, translate_noncontig() calls allocate_optee_shm_buf(), which
> > > counts all allocated buffers, so you can't call it more than
> > > MAX_SHM_BUFFER_COUNT times without de-allocating the previous memory.
> > > But, thanks to your question, I have found a bug there: memory is not
> > > freed if allocate_optee_shm_buf() fails. I'll prepare a patch later
> > > today.
> > >
> > > > I cannot see a check for errors returned by do_call_with_arg, nor
> > > > memory freed because of that. Sorry, I am not super familiar with
> > > > the code; I am just trying to make sure we are not offering DomUs an
> > > > easy way to crash the system.
> > >
> > > I tried to eliminate all possibilities for a guest to crash the
> > > system. Of course, this does not mean that there are none of them.
> > >
> > > And yes, the code is a bit hard to understand, because calls to
> > > OP-TEE are stateful and you need to account for that state. From the
> > > NW and SW sides this looks quite fine, because the state is handled
> > > naturally. But the mediator sits in the middle, so its implementation
> > > is a bit messy.
> > >
> > > I'll try to explain what is going on, so it will be easier to
> > > understand the logic in the mediator.
> > >
> > > There are two types of OP-TEE calls: fast calls and standard calls.
> > > A fast call is simple: issue the SMC and get the result. It does not
> > > allocate a thread context in OP-TEE and is non-preemptive. So yes, it
> > > should be fast. It is used for simple things like "get OP-TEE version"
> > > or "exchange capabilities". Such calls are easy to handle in the
> > > mediator: just forward the call, check the result, and return it back
> > > to the guest.
> > >
> > > Standard calls are stateful: OP-TEE allocates a thread for each call.
> > > Such a call can be preempted either by an IRQ or by an RPC. For
> > > consistency, an IRQ return is also considered a special type of RPC.
> > > So, in general, one standard call can consist of a series of SMCs:
> > >
> > > --> SMC with request
> > > <-- RPC return (like IRQ)
> > > --> SMC "resume call"
> > > <-- RPC return (like "read disk")
> > > --> SMC "resume call"
> > > <-- RPC return (like "send network packet")
> > > --> SMC "resume call"
> > > ...
> > > <-- Final return
> > >
> > > There are many types of RPCs: "handle IRQ", additional shared buffer
> > > allocation/de-allocation, RPMB access, disk access, network access,
> > > synchronization primitives (when an OP-TEE thread gets blocked on a
> > > mutex), etc.
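Put differently, the Normal World side of one standard call is
essentially a resume loop. A rough sketch in C (smc() and handle_rpc()
are invented names for illustration; the OPTEE_SMC_* identifiers are
the real ones from optee_smc.h):

    uint32_t ret = smc(OPTEE_SMC_CALL_WITH_ARG, arg_cookie);

    while ( OPTEE_SMC_RETURN_IS_RPC(ret) )
    {
        /* Service the RPC: IRQ, shared buffer alloc/free, disk, network... */
        handle_rpc(ret);

        /* Resume the preempted OP-TEE thread. The Normal World decides
         * when - or whether - this resume ever happens. */
        ret = smc(OPTEE_SMC_CALL_RETURN_FROM_RPC, arg_cookie);
    }
    /* Final return: the OP-TEE thread servicing the call is released. */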
> > > Two more things that make all this worse: the Normal World can
> > > register a shared buffer with OP-TEE. Such a buffer can live
> > > indefinitely long. Also, the Normal World decides when to resume a
> > > call. For example, the calling process can be preempted and then
> > > resumed seconds later. A misbehaving guest can decide not to resume
> > > a call at all.
> > >
> > > As I said, I tried to take all these things into account. There are
> > > basically 3 types of objects that can lead to memory allocation on
> > > the Xen side:
> > >
> > > 1. Standard call context. Besides memory space for struct
> > > optee_std_call itself, it allocates a page for a shadow buffer, where
> > > argument addresses are translated by Xen. The number of these objects
> > > is limited by the number of OP-TEE threads:
> > >
> > >     count = atomic_add_unless(&ctx->call_count, 1, max_optee_threads);
> > >     if ( count == max_optee_threads )
> > >         return ERR_PTR(-ENOSPC);
> > >
> > > 2. Shared buffer. This is a buffer shared by a guest with OP-TEE. It
> > > can be either a temporary buffer, which is shared for the duration of
> > > one standard call, or a registered shared buffer, which remains
> > > active until it is de-registered. This is where translate_noncontig()
> > > comes into play. The number of these buffers is limited in
> > > allocate_optee_shm_buf():
> > >
> > >     count = atomic_add_unless(&ctx->optee_shm_buf_count, 1,
> > >                               MAX_SHM_BUFFER_COUNT);
> > >     if ( count == MAX_SHM_BUFFER_COUNT )
> > >         return ERR_PTR(-ENOMEM);
> > >
> > > 3. Shared RPC buffer. This is a very special kind of buffer.
> > > Basically, OP-TEE needs some shared memory to provide RPC call
> > > parameters, so it requests a buffer from the Normal World. There is
> > > no hard limit on this from the mediator side because, as I said
> > > earlier, OP-TEE itself limits the number of these buffers. There are
> > > no cases where more than one buffer is allocated per OP-TEE thread.
> > > This type of buffer is used only to process RPC requests themselves.
> > > OP-TEE can request more buffers via RPC, but they will fall under
> > > p.2: the NW uses a separate request to register a buffer and then
> > > returns its handle in the preempted call.
> > >
> > > Apart from those two limits, there is a limit on the total number of
> > > pages shared between a guest and OP-TEE: MAX_TOTAL_SMH_BUF_PG. This
> > > limit is for the case when a guest tries to allocate a few really
> > > BIG buffers.
> > >
> > > > It looks like they could be called from one of the OPTEE operations
> > > > that a domU could request? Is there a limit for them?
> > >
> > > Yes, there are limits, as I described above.
> > >
> > > Also, bear in mind that the resources available to OP-TEE are also
> > > quite limited. So, in case of some breach in the mediator, OP-TEE
> > > will give up first. This of course is not an excuse to have bugs in
> > > the mediator...
> >
> > OK, thanks for the explanation. The reason for my questions is that if
> > the allocations are using the memory of the DomU, then at worst the
> > DomU can run out of memory.
>
> Not really. The worst outcome is still a DoS of the host, because we
> don't pre-allocate memory or even check that the total allocation will
> not exhaust the memory.
>
> The only difference is I would argue this would be a misconfiguration
> of the system.
>
> > But if the allocations are using anonymous memory, then the whole
> > platform might run out of memory. We have issued XSAs for things like
> > that in the past.
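To summarize the guardrails described above: the mediator relies on two
per-guest object counters plus a global page cap. Condensed into one
place (an illustrative sketch; the page-count field name is an
assumption, not the verbatim mediator code):

    /* Per-guest object counts, enforced with atomic_add_unless() as
     * quoted above: call contexts are capped at max_optee_threads and
     * shared buffers at MAX_SHM_BUFFER_COUNT. On top of that sits the
     * cap on the total number of shared pages: */
    if ( ctx->total_shm_pages + new_pages > MAX_TOTAL_SMH_BUF_PG )
        return ERR_PTR(-ENOMEM);
    ctx->total_shm_pages += new_pages;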
> > This is why I am worried about this patch: if we apply it, we really
> > become reliant on these limits being implemented correctly. A bug can
> > have much more severe consequences.
>
> This is not a problem specific to OP-TEE. Any anonymous allocation
> (xmalloc, ...) done in Xen on behalf of the guest has, in theory, the
> same problem (see more below).
>
> > As you are the maintainer for this code, and this code is not
> > security supported, I'll leave it up to you (also see the other email
> > about moving optee to "supported, not security supported").
> >
> > However, maybe a different solution would be to increase max_pages
> > for a domain when optee is enabled? Maybe just by a few pages (as many
> > as needed by the optee mediator)? Because if we did that, we wouldn't
> > risk exposing DoS attack vectors for every bug in the mediator's limit
> > checks.
>
> I think we need to differentiate two sorts of allocation:
>  1) Memory used by Xen on behalf of the guest
>  2) Memory used by the guest itself
>
> d->max_pages is only meant to refer to the latter (in fact, a guest
> can balloon memory up to d->max_pages). In this case, we are discussing
> the former, and therefore I think it should be accounted differently,
> as the memory is not exposed to the guest.

Yeah, I was thinking the same thing yesterday, but I didn't suggest it
because we don't have a way to specify it today. I thought that, rather
than using anonymous memory, it would still be preferable from a
security standpoint to use d->max_pages, although I was aware of issues
such as the guest being able to balloon up to d->max_pages.

BTW, do you know off the top of your head of other things that can
cause an anonymous xmalloc (one that lives longer than a single trap or
hypercall) in Xen on ARM today? I'll try to keep an eye on them.

> Today, Xen doesn't have this facility. I know this has been discussed
> a few times in the past but, AFAIK, a patch series never materialized
> for it.
>
> However, to me, this sounds more like hardening work for the whole of
> Xen rather than OP-TEE itself. So I think the patch provided by
> Oleksandr is probably the best way to go for this release.

I am OK with that, especially as OP-TEE is not security supported (I
recently suggested upgrading its support status to "supported, not
security supported"). I wrote a summary of the options at the bottom of
the email.

> > The below adds a slack of 10 pages.
> >
> > diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> > index 3964a8a5cd..a3105f1a9a 100644
> > --- a/xen/arch/arm/tee/tee.c
> > +++ b/xen/arch/arm/tee/tee.c
> > @@ -38,8 +38,11 @@ bool tee_handle_call(struct cpu_user_regs *regs)
> >      return cur_mediator->ops->handle_call(regs);
> >  }
> >
> > +#define TEE_SLACK (10)
> >  int tee_domain_init(struct domain *d, uint16_t tee_type)
> >  {
> > +    int ret;
> > +
> >      if ( tee_type == XEN_DOMCTL_CONFIG_TEE_NONE )
> >          return 0;
> >
> > @@ -49,7 +52,15 @@ int tee_domain_init(struct domain *d, uint16_t tee_type)
> >      if ( cur_mediator->tee_type != tee_type )
> >          return -EINVAL;
> >
> > -    return cur_mediator->ops->domain_init(d);
> > +    ret = cur_mediator->ops->domain_init(d);
> > +    if ( ret < 0 )
> > +        return ret;
> > +
> > +    /*
> > +     * Increase maxmem for domains with TEE; the extra pages are used
> > +     * by the mediator.
> > +     */
> > +    d->max_pages += TEE_SLACK;
>
> Regardless of what I wrote above, this change would be incorrect
> because the TEE is initialized when the domain is created.
> However, d->max_pages is set afterwards via DOMCTL_max_mem, so the
> value will get overridden.
>
> However, I don't think the OP-TEE code should modify d->max_pages.
> Instead, this should be accounted for by the toolstack (or by
> domain_build for dom0/domUs created by Xen).

Good point, and I can see that libxl already has a slack_memkb concept
for PV x86, used to increase the memmap limit.

The best solution would be to introduce a generic framework for
accounting memory that Xen allocates on behalf of the guest. But of
course we don't want to ask Oleksandr to do that now, one week from the
release and in response to a simple 3-line patch.

With the best option not being available, we have to pick one of the
following:

1) the current patch, which uses anonymous memory
2) slack, but done right (the toolstack and domain_build apply the
   slack to d->max_pages)

My preference is 2) for security reasons, but it is a bit more work.
Taking into account that Julien, Volodymyr, and Bertrand all think that
1) is acceptable as is, I will not insist. Option 1) is OK.
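For the record, option 2) would amount to something along these lines
on the libxl side, mirroring the existing slack handling (a
hypothetical sketch: the constant and helper below are made up, not an
actual patch; b_info->tee and LIBXL_TEE_TYPE_NONE are the existing
libxl TEE type field and value):

    /* Hypothetical slack for memory Xen allocates on behalf of a guest
     * with a TEE mediator enabled: 10 pages, expressed in KiB. */
    #define LIBXL_TEE_SLACK_MEMKB (10 * 4)

    static uint64_t domain_maxmem_kb(const libxl_domain_build_info *b_info)
    {
        uint64_t max_memkb = b_info->max_memkb;

        /* Apply the extra slack only when a TEE mediator is configured. */
        if (b_info->tee != LIBXL_TEE_TYPE_NONE)
            max_memkb += LIBXL_TEE_SLACK_MEMKB;

        return max_memkb;
    }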