Re: [Xen-devel] [PATCH v3 06/11] optee: add std call handling
Hi Julien,

Julien Grall writes:
> Hi Volodymyr,
>
> On 17/01/2019 15:21, Volodymyr Babchuk wrote:
>> Julien Grall writes:
>>
>>> Hi Volodymyr,
>>>
>>> On 18/12/2018 21:11, Volodymyr Babchuk wrote:
>>>> From: Volodymyr Babchuk <vlad.babchuk@xxxxxxxxx>
>>>>
>>>> The main way to communicate with OP-TEE is to issue a standard SMCCC
>>>> call. "Standard" is an SMCCC term meaning that the call can be
>>>> interrupted and OP-TEE can return control to NW before completing
>>>> the call.
>>>>
>>>> In contrast with fast calls, where arguments and return values
>>>> are passed in registers, standard calls use shared memory. The register
>>>> pair a1,a2 holds the 64-bit PA of the command buffer, where all arguments
>>>> are stored and which is used to return data. OP-TEE internally copies
>>>> the contents of this buffer into its own secure memory before accessing
>>>> and validating any data in the command buffer. This is done to make sure
>>>> that NW will not change the contents of the validated parameters.
>>>>
>>>> The mediator needs to do the same for a number of reasons:
>>>>
>>>> 1. To make sure that the guest will not change data after validation.
>>>> 2. To translate IPAs to PAs in the command buffer (this is not done
>>>>    in this patch).
>>>> 3. To hide the translated addresses from the guest, so it will not be
>>>>    able to do IPA->PA translation by misusing the mediator.
>>>>
>>>> During a standard call OP-TEE can issue multiple "RPC returns", asking
>>>> NW to do some work for OP-TEE. NW then issues the special call
>>>> OPTEE_SMC_CALL_RETURN_FROM_RPC to resume handling of the original call.
>>>> Thus, the mediator needs to maintain context for the original standard
>>>> call across multiple SMCCC calls.
>>>>
>>>> A standard call is considered complete when the returned value is
>>>> not an RPC request.
>>>>
>>>> Signed-off-by: Volodymyr Babchuk <vlad.babchuk@xxxxxxxxx>
>>>> ---
>>>>
>>>> Changes from v2:
>>>>  - renamed struct domain_ctx to struct optee_domain
>>>>  - fixed coding style
>>>>  - Now I use access_guest_memory_by_ipa() instead of mappings
>>>>    to read the command buffer
>>>>  - Added tracking for in-flight calls, so a guest can't resume
>>>>    the same call from two CPUs simultaneously
>>>>
>>>>  xen/arch/arm/tee/optee.c     | 319 ++++++++++++++++++++++++++++++++++-
>>>>  xen/include/asm-arm/domain.h |   3 +
>>>>  2 files changed, 320 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
>>>> index 584241b03a..dc90e2ed8e 100644
>>>> --- a/xen/arch/arm/tee/optee.c
>>>> +++ b/xen/arch/arm/tee/optee.c
>>>> @@ -12,6 +12,8 @@
>>>>   */
>>>>  #include <xen/device_tree.h>
>>>> +#include <xen/domain_page.h>
>>>> +#include <xen/guest_access.h>
>>>>  #include <xen/sched.h>
>>>>  #include <asm/smccc.h>
>>>>  #include <asm/tee/tee.h>
>>>> @@ -22,11 +24,38 @@
>>>>  /* Client ID 0 is reserved for hypervisor itself */
>>>>  #define OPTEE_CLIENT_ID(domain) (domain->domain_id + 1)
>>>> +/*
>>>> + * Maximal number of concurrent standard calls from one guest. This
>>>> + * corresponds to OPTEE configuration option CFG_NUM_THREADS, because
>>>> + * OP-TEE spawns a thread for every standard call.
>>>
>>> Looking at OP-TEE, CFG_NUM_THREADS will vary depending on the
>>> platform. Is there any way to probe that number of threads from Xen?
>> Yes, this is a per-platform configuration.
>> The easiest way is to add a Fast SMC to OP-TEE that will report this
>> parameter.
>> Jens, what do you think about adding an additional call?
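To make the idea concrete, the probe on the Xen side could look roughly
like the sketch below. The function ID is a placeholder (OP-TEE would need
to allocate a real one); the rest only reuses the OPTEE_SMC_FAST_CALL_VAL()
and arm_smccc_smc() helpers that optee.c already uses:

    /* Placeholder -- OP-TEE would need to allocate a real function number */
    #define OPTEE_SMC_FUNCID_GET_THREAD_COUNT 0x42
    #define OPTEE_SMC_GET_THREAD_COUNT \
        OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_GET_THREAD_COUNT)

    static unsigned int probe_std_call_limit(void)
    {
        struct arm_smccc_res resp;

        arm_smccc_smc(OPTEE_SMC_GET_THREAD_COUNT, &resp);

        /* An OP-TEE that does not know this call returns an error in a0 */
        if ( resp.a0 != OPTEE_SMC_RETURN_OK || resp.a1 == 0 )
            return MAX_STD_CALLS;

        return resp.a1;
    }

The result would then replace the static MAX_STD_CALLS limit for that guest.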
>>
>>> In any case, I think we should update the comment to reflect that this
>>> seems to be the maximum CFG_NUM_THREADS supported by any upstream
>>> platform.
>>
>> Actually, the OP-TEE protocol has provision for handling a limited number
>> of threads correctly. OP-TEE can report that all threads are busy and the
>> client will wait for a free one. Xen can do the same, although this is not
>> implemented in this patch series. But I can add this.
>
> Could you expand on "wait"? Will it block in OP-TEE/Xen or does it
> return to the guest?
It returns to the guest with response code OPTEE_SMC_RETURN_ETHREAD_LIMIT.
The Linux driver blocks the calling application thread until one of the
calls to OP-TEE is finished. Then the driver wakes one of the blocked
threads, so it can perform the call.

>>
>> Basically this means that all can work correctly even with
>> MAX_STD_CALLS==1. It just will not be as efficient.
>
> Given the OS is not aware of the number of threads, the problem would
> be the same without Xen in between. Am I right?
Exactly.

> [...]
>
>>>> +
>>>> +    /*
>>>> +     * Command buffer should start at page boundary.
>>>> +     * This is OP-TEE ABI requirement.
>>>> +     */
>>>> +    if ( call->guest_arg_ipa & (OPTEE_MSG_NONCONTIG_PAGE_SIZE - 1) )
>>>> +        return false;
>>>> +
>>>> +    call->xen_arg = _xmalloc(OPTEE_MSG_NONCONTIG_PAGE_SIZE,
>>>> +                             OPTEE_MSG_NONCONTIG_PAGE_SIZE);
>>>> +    if ( !call->xen_arg )
>>>> +        return false;
>>>> +
>>>> +    BUILD_BUG_ON(OPTEE_MSG_NONCONTIG_PAGE_SIZE > PAGE_SIZE);
>>>
>>> As you use _xmalloc, you should not need this. This is only necessary
>>> if you use alloc_xenheap_page.
>>>
>> Yes, right. This is a leftover from the time when I mapped the guest page
>> into Xen. I'll move it down, to the place where I map that page.
>>
>>> I am wondering whether it is wise to allocate the memory from xenheap
>>> and not domheap. While on Arm64 (for now) xenheap and domheap are the
>>> same, on Arm32 they are different. The xenheap is at most 1GB, so
>>> pretty limited.
>> Honestly, I have no opinion there. What are the limitations of domheap in
>> this case?
>
> domheap pages may not always be mapped in Xen page-tables. So you have
> to call map_domain_page/unmap_domain_page at every use.
> In practice, on Arm64, those operations are today a NOP because the
> memory is always mapped. On Arm32, domheap is never mapped, so those
> operations will require modifying the page-tables.
> There would potentially be ways to optimize the Arm32 case. So I think
> this is not a big concern, as it would allow accounting the memory to
> the domain and taking advantage of potential new safety features around
> domheap.
>
> BTW, I am not asking you to implement the accounting :). You can still
> allocate domheap memory without a domain assigned. I am only giving
> the advantages of using domheap over xenheap :).
Actually, I considered using domheap in the beginning. But for some
reason I thought that domheap is limited by the total memory assigned to
a domain. Looks like I was mistaken. So yes, I'll use domheap for large
allocations.
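If I got the suggestion right, the allocation would change along these
lines (a sketch only: error handling and the actual copying are omitted,
and the page pointer would be kept in the call context instead of the
current xen_arg pointer; on Arm64 the map/unmap pair is a NOP today, as
you said):

    struct page_info *pg;
    struct optee_msg_arg *xen_arg;

    /* Allocate the per-call shadow buffer from domheap; no domain accounting */
    pg = alloc_domheap_page(NULL, 0);
    if ( !pg )
        return false;

    /*
     * A whole page is allocated now, so the check against PAGE_SIZE
     * becomes relevant again, unlike with _xmalloc().
     */
    BUILD_BUG_ON(OPTEE_MSG_NONCONTIG_PAGE_SIZE > PAGE_SIZE);

    /* Every access to the shadow buffer is bracketed by map/unmap */
    xen_arg = map_domain_page(page_to_mfn(pg));

    /* ... copy from / validate / copy back the command buffer here ... */

    unmap_domain_page(xen_arg);

    /* On call completion */
    free_domheap_page(pg);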
[...]

>>>> +{
>>>> +    struct optee_msg_arg *guest_arg;
>>>> +    struct page_info *page;
>>>> +    unsigned int i;
>>>> +    uint32_t attr;
>>>> +
>>>> +    /* copy_std_request() validated IPA for us */
>>>
>>> Not really, the guest is free to modify the stage-2 mapping on another
>>> vCPU while this is happening. I agree that the guest will shoot
>>> himself, but we at least need to not have weird behavior happening.
>>>
>>> In that case, I would check that the type is p2m_ram_rw, as you don't
>>> want to write to a read-only or foreign mapping.
>> How can I do this atomically? I.e. what if the guest mangles the p2m
>> right after the check?
>
> What you want to prevent is Xen writing to the wrong page. The guest
> should not play with pages that are shared with a higher exception
> level.
>
> get_page_from_gfn() takes a reference on the current page, which will
> be released by put_page(). In between, you are sure the page cannot
> disappear under your feet.
Ahhh, right. Thank you.

> Furthermore, AFAIK, there is no way for an Arm guest to modify the
> p2m type of a page once inserted. It can only remove the mapping or
> replace it with a newly allocated page. If the guest instructs Xen to
>  - remove the page: as you have a reference, that page will not disappear.
>  - replace the page with a new one: then the guest will not be able to
>    see the result. Tough luck, but it was not meant to do that :).
>
>>
>>> Also, as copy_std_request() and copy_std_request_back() may not be
>>> called in the same "thread", it would be useful if you specify a bit
>>> more the interaction.
>> I am not sure what you mean there.
>
> What I meant is that the following can happen:
>
> guest vCPU A          |  Xen
>                       |
> Initiate call 1       |
>                       |  copy_std_request()
>                       |  call OP-TEE
>                       |     -> call "preempted"
>                       |  return to guest
> Resume call 1         |
>                       |  resume call in OP-TEE
>                       |  copy_std_request_back()
>
Yes, this is right.

> AFAICT, the call could even resume from a different vCPU.
This is also true.

> It is not entirely trivial to understand this from just reading the
> code, and the comment "copy_std_request() validated IPA for us" leads
> one to think copy_std_request() was called right before. This may not
> be the case. So can you detail a bit more the interaction in the code?
Yes, there can be an issue. My previous approach pinned the guest page for
the call duration, so I was sure that the IPA was still valid. But now this
is no longer true. So yes, this comment is misleading. I'll fix both the
code and the comment.
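Concretely, the fix will follow the pattern you described, roughly like
this (a sketch; the helper name is mine, everything else is the existing
get_page_from_gfn()/put_page() interface):

    /*
     * Take a reference on the guest page backing the command buffer and
     * check it is ordinary read-write RAM. The reference keeps the page
     * alive even if the guest drops it from its p2m on another vCPU.
     * The caller must drop the reference with put_page() when done.
     */
    static struct page_info *get_guest_ram_page(struct domain *d, gfn_t gfn)
    {
        p2m_type_t t;
        struct page_info *page;

        page = get_page_from_gfn(d, gfn_x(gfn), &t, P2M_ALLOC);
        if ( !page )
            return NULL;

        if ( t != p2m_ram_rw )
        {
            put_page(page);
            return NULL;
        }

        return page;
    }

I'll also extend the comment to spell out that the call may be resumed from
another vCPU, so the IPA has to be revalidated on the way back.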
[...]

>>>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
>>>> index 175de44927..88b48697bd 100644
>>>> --- a/xen/include/asm-arm/domain.h
>>>> +++ b/xen/include/asm-arm/domain.h
>>>> @@ -97,6 +97,9 @@ struct arch_domain
>>>>      struct vpl011 vpl011;
>>>>  #endif
>>>> +#ifdef CONFIG_TEE
>>>> +    void *tee;
>>>> +#endif
>>>
>>> Did you look whether there are any holes in arch_domain that could be
>>> re-used?
>> I thought about that. But what are the chances of finding a 64-bit-wide,
>> 64-bit-aligned hole in a structure? If I remember the C standard correctly,
>> there is no reason for the compiler to leave such holes.
>
> It depends on the alignment requested for each structure. Have a look
> at pahole to see the number (and size) of holes we have in some
> structures ;).
Wow, a 108-byte hole in rcu_ctrlblk :) And yes, only two 8-byte holes in
the whole of Xen.
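(For anyone following along: that was just pahole run against the built
hypervisor image, which needs xen-syms to carry debug info, e.g.:

    pahole -C rcu_ctrlblk xen/xen-syms
    pahole -C arch_domain xen/xen-syms

It prints the structure layout with the holes and padding annotated.)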
--
Best regards,
Volodymyr Babchuk