Re: Proposal for Porting Xen to Armv8-R64 - DraftB
Hi Stefano,

On 2022/4/21 5:08, Stefano Stabellini wrote:

On Wed, 20 Apr 2022, Wei Chen wrote:

On Tue, 19 Apr 2022, Wei Chen wrote:

### 3.2. Xen Event Channel Support

In the current RFC patches we haven't enabled event channel support, but I think it is a good opportunity to discuss it in advance.

On Armv8-R, all VMs are natively direct-mapped, because there is no stage 2 MMU translation. The current event channel implementation depends on some pages shared between Xen and the guest: `shared_info` and the per-cpu `vcpu_info`.

For `shared_info`, in the current implementation, Xen allocates a page from the heap for `shared_info` to store initial metadata. When a guest tries to set up `shared_info`, it allocates a free gfn and uses a hypercall to set up the P2M mapping between that gfn and `shared_info`.

For a direct-mapped VM, this breaks the direct-mapping concept. And on an MPU-based system, like an Armv8-R system, this operation is very unfriendly: Xen needs to pop the `shared_info` page out of the Xen heap and insert it into the VM's P2M pages. If this page is in the middle of the Xen heap, Xen has to split the current heap and spend extra MPU regions. Also, on the P2M side, this page is unlikely to form a contiguous memory region with the existing P2M pages, so Xen is likely to need yet another MPU region to map it, which is obviously a waste of limited MPU regions. This kind of dynamic behavior is quite hard to imagine on an MPU system.

Yeah, it doesn't make any sense for MPU systems.

For `vcpu_info`, in the current implementation, Xen stores the `vcpu_info` metadata for all vCPUs in `shared_info`. When a guest tries to set up `vcpu_info`, it allocates memory for `vcpu_info` on the guest side, then uses a hypercall to copy the metadata from `shared_info` into the guest page. After that, both Xen's `vcpu_info` and the guest's `vcpu_info` point to the same page, allocated by the guest.

This implementation has several benefits:
1. No memory is wasted: no extra memory is allocated from the Xen heap.
2. There is no P2M remap.
This does not break direct mapping and is MPU-system friendly. So, on an Armv8-R system, we can keep the current implementation for the per-cpu `vcpu_info`.

So, our proposal is: can we reuse the `vcpu_info` implementation idea for `shared_info`? We still allocate one page for `d->shared_info` at domain construction to hold some initial metadata, but using alloc_domheap_pages instead of alloc_xenheap_pages and share_xen_page_with_guest. When the guest allocates a page for `shared_info` and uses a hypercall to set it up, we copy the initial data from `d->shared_info` into it, and after the copy we update `d->shared_info` to point to the guest-allocated `shared_info` page. In this case, we don't have to think about the fragmentation of the Xen heap and P2M, or about extra MPU regions.

Yes, I think that would work. Also I think it should be possible to get rid of the initial d->shared_info allocation in Xen, given that d->shared_info is for the benefit of the guest and the guest cannot access it until it makes the XENMAPSPACE_shared_info hypercall.

While we were working on the event channel PoC for Xen on Armv8-R, we found another issue after we dropped the d->shared_info allocation in Xen. Both shared_info and vcpu_info are allocated by the guest at runtime, which means their addresses are effectively random. On an MMU system this is OK, because Xen has a full view of system memory at runtime. But on an MPU system the situation becomes a little tricky: we have to set up extra MPU regions for remote domains' shared_info and vcpu_info while handling event channel hypercalls. That is because, in the current Xen hypercall model, a hypercall does not cause a vCPU context switch: when a hypercall traps to EL2, the vCPU's P2M view is kept. On an MMU system we have vttbr_el2 for the vCPU's P2M view and ttbr_el2 for Xen's view, so in EL2 Xen has full permission to access any memory it wants. But on an MPU system we only have one EL2 MPU, and before entering the guest, Xen sets up the vCPU's P2M view in the EL2 MPU.
In this case, when the system enters EL2 through a hypercall, the EL2 MPU still holds the current vCPU's P2M view plus Xen's essential memory (code, data, heap) access permissions, but it does not have the permissions EL2 needs to access another domain's memory. So an event channel hypercall that wants to update the pending bitmap in a remote domain's vcpu_info causes a data abort in EL2.

To solve this data abort, we have two candidate methods:

1. Map the remote domain's whole memory, or just the pages for shared_info + vcpu_info, in the EL2 MPU temporarily, so the hypercall can update the pending bits or perform other accesses. This method does not need an EL2 MPU context switch, but it has some disadvantages:
   1. We have to reserve MPU regions for hypercalls.
   2. Different hypercalls may require different MPU region reservations.
   3. We have to handle hypercalls one by one, both existing ones and future ones.

2. Switch to Xen's memory view in the EL2 MPU when trapping from EL1 to EL2. In this case, Xen has full memory access permissions to update the pending bits in EL2. This changes only the EL2 MPU context and does not require a vCPU context switch, because the trapping vCPU is used for the full flow of the hypercall. After the hypercall, before returning to EL1, the EL2 MPU switches to the scheduled vCPU's P2M view. This method needs an EL2 MPU context switch, but:
   1. We don't need to reserve MPU regions for Xen's memory view (it is set up during initialization).
   2. We don't need to handle page mappings at the hypercall level.
   3. It applies to other EL1-to-EL2 traps as well, like data aborts, IRQs, etc.

Both approach 1) and 2) are acceptable, and in fact I think we'll probably have to do a combination of both. We don't need to do a full MPU context switch every time we enter Xen; we can be flexible. Only when Xen needs to access another guest's memory, and that memory is not mappable using approach 1), would Xen do a full MPU context switch. Basically, try 1) first; if it is not possible, do 2).
This also solves the problem of "other hypercalls": we can always do 2) if we cannot do 1). So do we need to do 1) at all? It really depends on performance data. Not all hypercalls are made equal: some are very rare and it is fine if they are slow, while others are on the hot path. The event channel hypercalls are on the hot path, so they need to be fast. It makes sense to implement 1) just for the event channel hypercalls if the MPU context switch is slow. Data would help a lot here to make a good decision. Specifically, how much more expensive is an EL2 MPU context switch compared to adding/removing a single MPU region, in nanoseconds or CPU cycles?

We will measure that when we get a proper platform.

The other aspect is how many extra MPU regions we need for each guest to implement 1). Do we need one extra MPU region for each domU? If so, I don't think approach 1) is feasible unless we come up with a smart memory allocation scheme for shared_info and vcpu_info. For instance, if the shared_info and vcpu_info of all guests were part of the Xen data or heap region, or of one other special MPU region, then they would become immediately accessible without any need for extra mappings when switching to EL2.

Allocating shared_info and vcpu_info from the Xen data or heap region would cause memory fragmentation. We would have to split the Xen data or heap, populate the pages for shared_info and vcpu_info, and insert them into the guest P2M. Because the Armv8-R MPU doesn't allow memory regions to overlap, this would cost at least 2 extra MPU regions: one page cannot live in a Xen MPU region and a guest P2M MPU region at the same time. And we definitely don't want to make the entire Xen data and heap accessible to EL1. This approach also does not solve the 100% direct-mapping problem. A special MPU region might have the same issues,
unless we make this special MPU region accessible to both EL1 and EL2 at runtime (which is unsafe) and update the hypercall to use pages from this special region for shared_info and vcpu_info (every guest can see this region, so it is still 1:1 mapped).

For 1), the concern came from our current rough PoC, in which we used extra MPU regions to map the whole memory of the remote domain, which may consist of several memory blocks in the worst case. Thinking about it further, we can reduce the mapping granularity to a page. For example, when Xen wants to update shared_info or vcpu_info, it must know the address, so we can map just that one page temporarily. So I think reserving only 1 MPU region for runtime mappings is feasible on most platforms.

Actually I think it would be great if we can do that. It looks like the best way forward.

But the additional problem with this is that if a hypercall modifies multiple variables, Xen may need to do multiple mappings if they are not in the same page (or within a suitable MPU region range).

There are not that many hypercalls that require Xen to map multiple pages, and those might be OK if they are slow.

Ok, I will update it in Draft-C.