
RE: Proposal for Porting Xen to Armv8-R64 - DraftA


  • To: Wei Chen <Wei.Chen@xxxxxxx>, Julien Grall <julien@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Penny Zheng <Penny.Zheng@xxxxxxx>
  • Date: Wed, 2 Mar 2022 07:21:39 +0000
  • Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>, nd <nd@xxxxxxx>
  • Delivery-date: Wed, 02 Mar 2022 07:22:04 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: Proposal for Porting Xen to Armv8-R64 - DraftA

Hi Julien,

> -----Original Message-----
> From: Wei Chen <Wei.Chen@xxxxxxx>
> Sent: Tuesday, March 1, 2022 3:52 PM
> To: Julien Grall <julien@xxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx; Stefano
> Stabellini <sstabellini@xxxxxxxxxx>
> Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>; Penny Zheng
> <Penny.Zheng@xxxxxxx>; Henry Wang <Henry.Wang@xxxxxxx>; nd
> <nd@xxxxxxx>
> Subject: RE: Proposal for Porting Xen to Armv8-R64 - DraftA
> 
> Hi Julien,
> 
> > -----Original Message-----
> > From: Julien Grall <julien@xxxxxxx>
> > Sent: 2022年2月26日 4:55
> > To: Wei Chen <Wei.Chen@xxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx;
> > Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>; Penny Zheng
> > <Penny.Zheng@xxxxxxx>; Henry Wang <Henry.Wang@xxxxxxx>; nd
> > <nd@xxxxxxx>
> > Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA
> >
> > Hi Wei,
> >
> > Thank you for sending the proposal. Please find some comments below.
> >
> > On 24/02/2022 06:01, Wei Chen wrote:
> > > # Proposal for Porting Xen to Armv8-R64
> > >
> > > This proposal will introduce the PoC work of porting Xen to
> > > Armv8-R64, which includes:
> > > - The changes of current Xen capability, like Xen build system, memory
> > >    management, domain management, vCPU context switch.
> > > - The expanded Xen capability, like static-allocation and direct-map.
> > >
> > > ***Notes:***
> > > 1. ***This proposal only covers the work of porting Xen to Armv8-R64***
> > >     ***single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R***
> > >     ***Trusted-Firmware (TF-R). This is an external dependency,***
> > >     ***so we think the discussion of Xen SMP support on Armv8-R64***
> > >     ***should be started when single-CPU support is complete.***
> >
> > I agree that we should first focus on single-CPU support.
> >
> 
> ack.
> 
> > > 2. ***This proposal will not touch xen-tools. In current stage,***
> > >     ***Xen on Armv8-R64 only support dom0less, all guests should***
> > >     ***be booted from device tree.***
> >
> > Makes sense. I actually expect some issues in the way xen-tools would
> > need to access memory of the domain that is being created.
> >
> 
> Yes, we also feel that changes to xen-tools could be a big job in the future
> (both xen common implementation and tools need changes).
> 
> > [...]
> >
> > > ### 1.2. Xen Challenges with PMSA Virtualization
> > >
> > > Xen is a PMSA-unaware Type-1 hypervisor; it will need modifications
> > > to run with an MPU and host multiple guest OSes.
> > >
> > > - No MMU at EL2:
> > >      - No EL2 Stage 1 address translation
> > >          - Xen provides a fixed ARM64 virtual memory layout as the
> > >            basis of EL2 stage 1 address translation, which is not
> > >            applicable on an MPU system, where there is no virtual
> > >            addressing. As a result, any operation involving a
> > >            transition from PA to VA, like ioremap, needs modification
> > >            on an MPU system.
> > >      - Xen's run-time addresses are the same as the link time addresses.
> > >          - Enabling PIC (position-independent code) on a real-time
> > >            target processor is probably very rare.
> >
> > Aside the assembly boot code and UEFI stub, Xen already runs at the
> > same address as it was linked.
> >
> 
> But the difference is that, based on the MMU, we can use the same link
> address for all platforms. But on an MPU system, we can't do it in the
> same way.
> 
> > >      - Xen will need to use the EL2 MPU memory region descriptors to
> > >        manage access permissions and attributes for accesses made by
> > >        VMs at EL1/0.
> > >          - Xen currently relies on MMU EL1 stage 2 table to manage these
> > >            accesses.
> > > - No MMU Stage 2 translation at EL1:
> > >      - A guest doesn't have an independent guest physical address space
> > >      - A guest can not reuse the current Intermediate Physical Address
> > >        memory layout
> > >      - A guest uses physical addresses to access memory and devices
> > >      - The MPU at EL2 manages EL1 stage 2 access permissions and
> > >        attributes
> > > - There are a limited number of MPU protection regions at both EL2
> > >   and EL1:
> > >      - Architecturally, the maximum number of protection regions is 256,
> > >        typical implementations have 32.
> > >      - By contrast, Xen does not need to consider the number of page
> > >        table entries in theory when using MMU.
> > > - The MPU protection regions at EL2 need to be shared between the
> > >   hypervisor and the guest stage 2.
> > >      - Requires careful consideration - may impact feature
> > >        'fullness' of both the hypervisor and the guest
> > >      - By contrast, when using MMU, Xen has standalone P2M table for
> > >        guest stage 2 accesses.
> >
> > [...]
> >
> > > - ***Define new system registers for compilers***:
> > >    Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4
> > >    specific system registers. As Armv8-R64 only have secure state, so
> > >    at least, `VSTCR_EL2` and `VSCTLR_EL2` will be used for Xen. And the
> > >    first GCC version that supports Armv8.4 is GCC 8.1. In addition to
> > >    these, PMSA of Armv8-R64 introduced lots of MPU related system
> > >    registers: `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`,
> > >    `PRENR_ELx` and `MPUIR_ELx`. But the first GCC version to support
> > >    these system registers is GCC 11. So we have two ways to make
> > >    compilers to work properly with these system registers.
> > >    1. Bump GCC version to GCC 11.
> > >       The pros of this method is that, we don't need to encode these
> > >       system registers in macros by ourselves. But the cons are that,
> > >       we have to update Makefiles to support GCC 11 for Armv8-R64.
> > >       1.1. Check the GCC version 11 for Armv8-R64.
> > >       1.2. Add march=armv8r to CFLAGS for Armv8-R64.
> > >       1.3. Solve the confliction of march=armv8r and mcpu=generic
> > >      These changes will affect common Makefiles, not only Arm Makefiles.
> > >      And GCC 11 is new, lots of toolchains and Distro haven't
> > >      supported it.
> >
> > I agree that forcing to use GCC11 is not a good idea. But I am not
> > sure to understand the problem with the -march=.... Ultimately,
> > shouldn't we aim to build Xen ARMv8-R with -march=armv8r?
> >
> 
> Actually, we had done that, but we reverted it from the RFC patch series.
> The reason has been listed above. But that is not the major reason. The
> main reason is that Armv8-R AArch64 supports the A64 ISA instruction set
> with some modifications: it redefines DMB and DSB, and adds DFB. But
> actually, the encodings of DMB and DSB are still the same as in A64, and
> DFB is an alias of DSB #12.
> 
> In this case, we don't think we need a new arch flag to generate new
> instructions for Armv8-R. And we have discussed with Arm kernel guys, they
> will not update the build system to build Linux that will be running on
> Armv8-R64 EL1 either.
> 
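
As a side note on the DFB point above, here is a minimal sketch of why no
new arch flag is needed to emit it. It assumes the standard A64 barrier
layout, where the DSB option sits in the 4-bit CRm field at bits [11:8].
The helper names (`a64_dsb`, `armv8r_dfb`) are made up for illustration and
are not from Xen or the PoC.

```c
#include <assert.h>
#include <stdint.h>

/* A64 DSB instruction word with the 4-bit CRm option field cleared. */
#define A64_DSB_BASE UINT32_C(0xD503309F)

/* Build the instruction word for "DSB #crm" (crm is the 4-bit option). */
uint32_t a64_dsb(unsigned int crm)
{
    assert(crm < 16);
    return A64_DSB_BASE | ((uint32_t)crm << 8);
}

/* DFB is described above as an alias of DSB #12. */
uint32_t armv8r_dfb(void)
{
    return a64_dsb(12);
}
```

With an assembler that does not know the DFB mnemonic, the resulting word
could in principle be emitted as raw data (for example through a `.inst`
directive), but that part is untested here.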
> 
> > [...]
> >
> > > ### **2.2. Changes of the initialization process**
> > >
> > > In general, we still expect Armv8-R64 and Armv8-A64 to have a
> > > consistent initialization process. In addition to some architecture
> > > differences, there is no more than reusable code that we will
> > > distinguish through CONFIG_ARM_MPU or CONFIG_ARM64_V8R. We want most
> > > of the initialization code to be reusable between Armv8-R64 and
> > > Armv8-A64.
> > >
> > > - We will reuse the original head.s and setup.c of Arm. But replace the
> > >    MMU and page table operations in these files with configuration
> > >    operations for MPU and MPU regions.
> > >
> > > - We provide a boot-time MPU configuration. This MPU configuration will
> > >    support Xen to finish its initialization. And this boot-time MPU
> > >    configuration will record the memory regions that will be parsed from
> > >    device tree.
> > >
> > >    In the end of Xen initialization, we will use a runtime MPU
> > >    configuration to replace boot-time MPU configuration. The runtime
> > >    MPU configuration will merge and reorder memory regions to save
> > >    more MPU regions for guests.
> > >
> > > ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)
> > >
> > > - Defer system unpausing domain.
> > >    When Xen initialization is about to end, Xen unpause guests created
> > >    during initialization. But this will cause some issues. The unpause
> > >    action occurs before free_init_memory, however the runtime MPU
> > >    configuration is built after free_init_memory.
> >
> > I was half expecting that free_init_memory() would not be called for
> > Xen Armv8R.
> >
> 
> We had called free_init_memory for Xen Armv8R, but it doesn't really mean
> much. As we have static heap, so we don't reclaim init memory to heap. And
> this reclaimed memory could not be used by Xen data and bss either. But
> from the security perspective, free_init_memory will drop the Xen init code &
> data, this will reduce the code an attacker can exploit.
> 
> > >
> > >    So if the unpaused guests start executing the context switch at this
> > >    point, then its MPU context will base on the boot-time MPU
> > >    configuration.
> >
> > Can you explain why you want to switch the MPU configuration that late?
> >
> 

It is mostly an implementation matter.

In the boot stage, we allocate MPU regions in sequence until we reach the
maximum. Since a few MPU regions get removed along the way, this leaves
holes. For example, once the heap is ready, the FDT is reallocated in the
heap, which means the MPU region for the device tree is no longer needed.
Likewise, in free_init_memory, although we do not give init memory back to
the heap, we still destroy the corresponding MPU regions to make them
inaccessible.
Without reordering, we would need a bitmap to record which regions are in
use.
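
To make the bitmap option concrete, here is a minimal sketch in C. It is
not code from our PoC: the names (`mpu_bitmap`, `mpu_bitmap_alloc`,
`mpu_bitmap_free`) are hypothetical, and the limit of 32 regions is an
assumption taken from the "typical implementations have 32" remark in the
proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed region count; the proposal cites 32 as a typical value. */
#define MPU_NR_REGIONS 32u

/* Bit i set means MPU region i is currently in use. */
struct mpu_bitmap {
    uint32_t used;
};

/* Claim the lowest free region, or return -1 when all are taken. */
int mpu_bitmap_alloc(struct mpu_bitmap *bm)
{
    for ( unsigned int i = 0; i < MPU_NR_REGIONS; i++ )
    {
        uint32_t bit = UINT32_C(1) << i;

        if ( !(bm->used & bit) )
        {
            bm->used |= bit;
            return (int)i;
        }
    }

    return -1;
}

/* Release a region, e.g. the FDT region once the FDT lives in the heap. */
void mpu_bitmap_free(struct mpu_bitmap *bm, unsigned int idx)
{
    assert(idx < MPU_NR_REGIONS);
    bm->used &= ~(UINT32_C(1) << idx);
}
```

Freeing a region in the middle leaves a hole that only the bitmap knows
about, which is the extra bookkeeping we would like to avoid at runtime.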

In a context switch, the memory layout is quite different for guest mode
and hypervisor mode. In guest mode, only the guest's RAM,
emulated/passthrough devices, etc. can be seen, but in hypervisor mode, the
RAM and device memory of all guests must be visible. Without reordering, we
would need to iterate over all MPU regions to find the ones to disable on
every runtime context switch, which is definitely an overhead.

So we propose a reordering at the tail of boot time: put all fixed MPU
regions, like Xen text/data, at the head, and put all flexible ones, like
device memory and guest RAM, at the tail.
Then, later in a context switch, we can simply disable regions from the
tail and insert new ones at the tail.
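
The fixed-head/flexible-tail layout can be sketched roughly as below. This
is only an illustration under assumptions: the structure and function names
(`mpu_table`, `mpu_drop_flexible`, `mpu_push_flexible`) are hypothetical,
the table is a plain software array rather than the real PRBAR/PRLAR
registers, and 32 regions is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MPU_NR_REGIONS 32u   /* assumed region count */

struct mpu_region {
    uint64_t base, limit;
    bool enabled;
};

struct mpu_table {
    struct mpu_region r[MPU_NR_REGIONS];
    unsigned int nr_fixed;   /* [0, nr_fixed): Xen text/data, etc. */
    unsigned int nr_used;    /* [nr_fixed, nr_used): flexible regions */
};

/* Disable every flexible region; fixed regions are never visited. */
void mpu_drop_flexible(struct mpu_table *t)
{
    assert(t->nr_fixed <= t->nr_used);

    for ( unsigned int i = t->nr_fixed; i < t->nr_used; i++ )
        t->r[i].enabled = false;

    t->nr_used = t->nr_fixed;
}

/* Append one flexible region at the tail; return its index or -1 if full. */
int mpu_push_flexible(struct mpu_table *t, uint64_t base, uint64_t limit)
{
    if ( t->nr_used >= MPU_NR_REGIONS )
        return -1;

    t->r[t->nr_used] = (struct mpu_region){ base, limit, true };

    return (int)t->nr_used++;
}
```

On a switch to a guest, something like `mpu_drop_flexible()` followed by a
few `mpu_push_flexible()` calls for that guest's RAM and devices would touch
only the tail, with no search over the whole table.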

> In the boot stage, Xen is the only user of the MPU. It may add some memory
> nodes or device memory to MPU regions for temporary usage. After freeing
> init memory, we want to reclaim these MPU regions so that more MPU regions
> can be used for guests. We will also do some merge and reorder work, which
> makes MPU regions easier to manage in guest context switches.
> 
> > >    Probably it will be inconsistent with runtime MPU configuration, this
> > >    will cause unexpected problems (This may not happen in a single core
> > >    system, but on SMP systems, this problem is foreseeable, so we
> > >    hope to solve it at the beginning).
> >
> > [...]
> >
> > > ### **2.4. Changes of memory management**
> > >
> > > Xen is coupled with VMSA, in order to port Xen to Armv8-R64, we have
> > > to decouple Xen from VMSA. And give Xen the ability to manage memory
> > > in PMSA.
> > >
> > > 1. ***Use buddy allocator to manage physical pages for PMSA***
> > >     From the view of physical page, PMSA and VMSA don't have any
> > >     difference. So we can reuse buddy allocator on Armv8-R64 to
> > >     manage physical pages. The difference is that, in VMSA, Xen will
> > >     map allocated pages to virtual addresses. But in PMSA, Xen just
> > >     convert the pages to physical address.
> > >
> > > 2. ***Can not use virtual address for memory management***
> > >     As Armv8-R64 only has PMSA in EL2, Xen loses the ability of using
> > >     virtual address to manage memory. This brings some problems, some
> > >     virtual address based features could not work well on Armv8-R64,
> > >     like `FIXMAP`, `vmap/vunmap`, `ioremap` and `alternative`.
> > >
> > >     But the functions or macros of these features are used in lots
> > >     of common code. So it's not good to use `#ifdef CONFIG_ARM_MPU`
> > >     to gate related code everywhere. In this case, we propose to use
> > >     stub helpers to make the changes transparently to common code.
> > >     1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap
> > >        operations. This will return physical address directly of
> > >        fixmapped item.
> > >     2. For `vmap/vunmap`, we will use some empty inline stub helpers:
> > >          ```
> > >          static inline void vm_init_type(...) {}
> > >          static inline void *__vmap(...)
> > >          {
> > >              return NULL;
> > >          }
> > >          static inline void vunmap(const void *va) {}
> > >          static inline void *vmalloc(size_t size)
> > >          {
> > >              return NULL;
> > >          }
> > >          static inline void *vmalloc_xen(size_t size)
> > >          {
> > >              return NULL;
> > >          }
> > >          static inline void vfree(void *va) {}
> > >          ```
> > >
> > >     3. For `ioremap`, it depends on `vmap`. As we have make `vmap` to
> > >        always return `NULL`, they could not work well on Armv8-R64
> > >        without changes. `ioremap` will return input address directly.
> > >          ```
> > >          static inline void *ioremap_attr(...)
> > >          {
> > >              /* We don't have the ability to change input PA cache attributes */
> > OOI, who will set them?
> 
> Some callers that want to change a memory's attribute will set them.
> Something like ioremap_nocache. I am not sure if this is what you had
> asked : )
> 
> >
> > >              if ( CACHE_ATTR_need_change )
> > >                  return NULL;
> > >              return (void *)pa;
> > >          }
> > >          static inline void __iomem *ioremap_nocache(...)
> > >          {
> > >              return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
> > >          }
> > >          static inline void __iomem *ioremap_cache(...)
> > >          {
> > >              return ioremap_attr(start, len, PAGE_HYPERVISOR);
> > >          }
> > >          static inline void __iomem *ioremap_wc(...)
> > >          {
> > >              return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
> > >          }
> > >          void *ioremap(...)
> > >          {
> > >              return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
> > >          }
> > >
> > >          ```
> > >      4. For `alternative`, it depends on `vmap` too.
> >
> > The only reason we depend on vmap() is because we map the sections
> > *text read-only and we enforce WnX. For VMSA, it would be possible to
> > avoid vmap() with some rework. I don't know for PMSA.
> >
> 
> For PMSA, we still enforce WnX. For your use case, I assume it's alternative.
> It still may have some possibility to avoid vmap(). But there may be some
> security issues. We had thought to disable MPU -> update xen text -> enable
> MPU to copy VMSA alternative's behavior. The problem with this, however,
> is that at some point, all memory is RWX. There may be some security risk.
> But because it's in init stage, it probably doesn't matter as much as I
> thought.
> 

On an MMU system, we use vmap() to temporarily map the requested Xen text
(a few lines of code) as RW in order to apply the alternatives; the
granularity for this can be 4KB.

But on an MPU system, we give the whole Xen text a single MPU region, so
otherwise we would have to disable the whole MPU to make it happen. That
carries some risk: either C code runs with the MPU disabled, or all text
memory becomes RWX while the alternatives are applied.
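
To give a feel for the difference in exposure, the following sketch
compares how many bytes become writable in each case. It only does the
arithmetic: the function names and the addresses in the usage note are made
up, and it assumes a 4KB remap granularity on MMU and one MPU region
spanning `[text_start, text_end)` on MPU.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K UINT64_C(4096)

/*
 * MMU case: only the 4KB pages covering the patch site are remapped RW.
 * Returns the number of bytes in those pages.
 */
uint64_t mmu_patch_window(uint64_t site, uint64_t len)
{
    assert(len > 0);

    uint64_t first = site & ~(PAGE_SIZE_4K - 1);
    uint64_t last = (site + len - 1) & ~(PAGE_SIZE_4K - 1);

    return last - first + PAGE_SIZE_4K;
}

/*
 * MPU case: the single region covering Xen text has to be opened up, so
 * at least the whole text becomes writable (more if the MPU is disabled).
 */
uint64_t mpu_patch_window(uint64_t text_start, uint64_t text_end)
{
    assert(text_end > text_start);

    return text_end - text_start;
}
```

For an 8-byte patch at a hypothetical address 0x201234 inside a 1MB text
section, the MMU window is one 4KB page, while the MPU window is the full
1MB.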
 
> > > We will simply disable
> > >         it on Armv8-R64 in current stage. How to implement `alternative`
> > >         on Armv8-R64 is better to be discussed after basic functions
> > >         of Xen on Armv8-R64 work well.
> > alternative are mostly helpful to handle errata or enable features
> > that are not present on all CPUs. I wouldn't expect this to be
> > necessary at the beginning. In fact, on Arm, it was introduced > 4
> > years after the initial port :).
> 
> I hope it won't take us so long, this time : )
> 
> >
> > [...]
> >
> > > ### **2.5. Changes of device driver**
> > >
> > > 1. Because Armv8-R64 only has single secure state, this will affect
> > >    some devices that have two secure state, like GIC. But fortunately,
> > >    most vendors will not link a two secure state GIC to Armv8-R64
> > >    processors. Current GIC driver can work well with single secure
> > >    state GIC for Armv8-R64.
> > > 2. Xen should use secure hypervisor timer in Secure EL2. We will
> > >    introduce a CONFIG_ARM_SECURE_STATE to make Xen to use secure
> > >    registers for timer.
> > >
> > > ### **2.7. Changes of virtual device**
> > >
> > > Currently, we only support pass-through devices in guest. Because
> > > event channel, xen-bus, xen-storage and other advanced Xen features
> > > haven't been enabled in Armv8-R64.
> >
> > That's fine. I expect to require quite a bit of work to move from Xen
> > sharing the pages (e.g. like for grant-tables) to the guest sharing pages.
> >
> 
> Yes.
> 
> > Cheers,
> >
> > --
> > Julien Grall


 

