Re: Proposal for Porting Xen to Armv8-R64 - DraftB
Hi Stefano,

On 2022/4/21 5:08, Stefano Stabellini wrote:

On Wed, 20 Apr 2022, Wei Chen wrote:

On Tue, 19 Apr 2022, Wei Chen wrote:

### 3.2. Xen Event Channel Support

In the current RFC patches we haven't enabled event channel support, but I think it is a good opportunity to discuss it in advance.

On Armv8-R, all VMs are natively direct-mapped, because there is no stage 2 MMU translation. The current event channel implementation depends on some pages shared between Xen and the guest: `shared_info` and the per-cpu `vcpu_info`.

For `shared_info`, in the current implementation, Xen allocates a page from the heap for `shared_info` to store initial metadata. When a guest tries to set up `shared_info`, it allocates a free gfn and uses a hypercall to set up the P2M mapping between that gfn and `shared_info`.

For a direct-mapped VM, this breaks the direct-mapping concept. And on an MPU-based system, like an Armv8-R system, this operation is very unfriendly: Xen needs to pop the `shared_info` page out of the Xen heap and insert it into the VM's P2M pages. If this page is in the middle of the Xen heap, Xen has to split the current heap and spend extra MPU regions. Also, on the P2M side, this page is unlikely to form a contiguous memory region with the existing P2M pages, so Xen is likely to need yet another MPU region to map it, which is obviously a waste of limited MPU regions. This kind of dynamic behavior is quite hard to imagine on an MPU system.

Yeah, it doesn't make any sense for MPU systems.

For `vcpu_info`, in the current implementation, Xen stores the `vcpu_info` metadata for all vCPUs in `shared_info`. When a guest tries to set up `vcpu_info`, it allocates memory for `vcpu_info` on the guest side, then uses a hypercall to copy the metadata from `shared_info` into the guest page. After that, both Xen's `vcpu_info` and the guest's `vcpu_info` point to the same page, allocated by the guest.

This implementation has several benefits:
1. No memory is wasted: no extra memory is allocated from the Xen heap.
2. There is no P2M remap.
This does not break direct mapping and is MPU-system friendly. So, on an Armv8-R system, we can keep the current implementation for the per-cpu `vcpu_info`.

So, our proposal is: can we reuse the `vcpu_info` implementation idea for `shared_info`? We still allocate one page for `d->shared_info` at domain construction to hold some initial metadata, but using alloc_domheap_pages instead of alloc_xenheap_pages and share_xen_page_with_guest. When the guest allocates a page for `shared_info` and uses a hypercall to set it up, we copy the initial data from `d->shared_info` into it, and after the copy we update `d->shared_info` to point to the guest-allocated `shared_info` page. In this case, we don't have to think about the fragmentation of the Xen heap and P2M, or about extra MPU regions.

Yes, I think that would work. Also I think it should be possible to get rid of the initial d->shared_info allocation in Xen, given that d->shared_info is for the benefit of the guest and the guest cannot access it until it makes the XENMAPSPACE_shared_info hypercall.

While we were working on the event channel PoC for Xen on Armv8-R, we found another issue after we dropped the d->shared_info allocation in Xen. Both shared_info and vcpu_info are allocated by the guest at runtime, which means their addresses are effectively random. On an MMU system this is OK, because Xen has a full view of system memory at runtime. But on an MPU system the situation becomes a little tricky: we have to set up extra MPU regions for remote domains' shared_info and vcpu_info while handling event channel hypercalls. That is because, in the current Xen hypercall model, a hypercall does not cause a vCPU context switch: when a hypercall traps to EL2, the vCPU's P2M view is kept. On an MMU system we have vttbr_el2 for the vCPU's P2M view and ttbr_el2 for Xen's view, so in EL2 Xen has full permission to access any memory it wants. But on an MPU system we only have one EL2 MPU, and before entering the guest, Xen sets up the vCPU's P2M view in the EL2 MPU.
In this case, when the system enters EL2 through a hypercall, the EL2 MPU still holds the current vCPU's P2M view plus Xen's essential memory (code, data, heap) access permissions, but it does not have the permissions EL2 needs to access another domain's memory. So an event channel hypercall that wants to update the pending bitmap in a remote domain's vcpu_info causes a data abort in EL2.

To solve this data abort, we have two candidate methods:

1. Map the remote domain's whole memory, or just the pages for shared_info + vcpu_info, in the EL2 MPU temporarily, so the hypercall can update the pending bits or perform other accesses. This method does not need an EL2 MPU context switch, but it has some disadvantages:
   1. We have to reserve MPU regions for hypercalls.
   2. Different hypercalls may require different MPU region reservations.
   3. We have to handle hypercalls one by one, both existing ones and future ones.

2. Switch to Xen's memory view in the EL2 MPU when trapping from EL1 to EL2. In this case, Xen has full memory access permissions to update the pending bits in EL2. This changes only the EL2 MPU context and does not require a vCPU context switch, because the trapping vCPU is used for the full flow of the hypercall. After the hypercall, before returning to EL1, the EL2 MPU switches to the scheduled vCPU's P2M view. This method needs an EL2 MPU context switch, but:
   1. We don't need to reserve MPU regions for Xen's memory view (it is set up during initialization).
   2. We don't need to handle page mappings at the hypercall level.
   3. It applies to other EL1-to-EL2 traps as well, like data aborts, IRQs, etc.

Both approach 1) and 2) are acceptable, and in fact I think we'll probably have to do a combination of both. We don't need to do a full MPU context switch every time we enter Xen; we can be flexible. Only when Xen needs to access another guest's memory, and that memory is not mappable using approach 1), would Xen do a full MPU context switch. Basically, try 1) first; if it is not possible, do 2).
This also solves the problem of "other hypercalls": we can always do 2) if we cannot do 1). So do we need to do 1) at all? It really depends on performance data. Not all hypercalls are made equal: some are very rare and it is fine if they are slow, while others are on the hot path. The event channel hypercalls are on the hot path, so they need to be fast. It makes sense to implement 1) just for the event channel hypercalls if the MPU context switch is slow. Data would help a lot here to make a good decision. Specifically, how much more expensive is an EL2 MPU context switch compared to adding/removing a single MPU region, in nanoseconds or CPU cycles?

We will measure that when we get a proper platform.

The other aspect is how many extra MPU regions we need for each guest to implement 1). Do we need one extra MPU region for each domU? If so, I don't think approach 1) is feasible unless we come up with a smart memory allocation scheme for shared_info and vcpu_info. For instance, if the shared_info and vcpu_info of all guests were part of the Xen data or heap region, or of one other special MPU region, then they would become immediately accessible without any need for extra mappings when switching to EL2.

Allocating shared_info and vcpu_info from the Xen data or heap region would cause memory fragmentation. We would have to split the Xen data or heap, populate the pages for shared_info and vcpu_info, and insert them into the guest P2M. Because the Armv8-R MPU doesn't allow memory regions to overlap, this would cost at least 2 extra MPU regions: one page cannot live in a Xen MPU region and a guest P2M MPU region at the same time. And we definitely don't want to make the entire Xen data and heap accessible to EL1. This approach also does not solve the 100% direct-mapping problem. A special MPU region might have the same issues,
unless we make this special MPU region accessible to both EL1 and EL2 at runtime (which is unsafe) and update the hypercall to use pages from this special region for shared_info and vcpu_info (every guest can see this region, so it is still 1:1 mapped).

For 1), the concern came from our current rough PoC, in which we used extra MPU regions to map the whole memory of the remote domain, which may consist of several memory blocks in the worst case. Thinking about it further, we can reduce the mapping granularity to a page. For example, when Xen wants to update shared_info or vcpu_info, it must know the address, so we can map just that one page temporarily. So I think reserving only 1 MPU region for runtime mappings is feasible on most platforms.

Actually I think it would be great if we can do that. It looks like the best way forward.

But the additional problem with this is that if a hypercall modifies multiple variables, Xen may need to do multiple mappings if they are not in the same page (or within a suitable MPU region range).

There are not that many hypercalls that require Xen to map multiple pages, and those might be OK if they are slow.

Ok, I will update it in Draft-C.