[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: Proposal for Porting Xen to Armv8-R64 - DraftC (Archive for Day1)
Hi Stefano, > -----Original Message----- > From: Stefano Stabellini <sstabellini@xxxxxxxxxx> > Sent: 2022年7月29日 4:14 > To: Wei Chen <Wei.Chen@xxxxxxx> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; julien@xxxxxxx; Stefano Stabellini > <sstabellini@xxxxxxxxxx>; Bertrand Marquis <Bertrand.Marquis@xxxxxxx>; > Penny Zheng <Penny.Zheng@xxxxxxx> > Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftC (Archive for > Day1) > > Hi Wei, > > I am really sorry I didn't manage to read this proposal before now. Too > many things :-) > It doesn't matter : ) > Is it still valid? Do you want me to read it in the next days, or do you > have another update you would rather send first? > Yes it's still valid, our day1 patch hasn't been sent because we are want some pre-series from Henry to be merged first. If you can give more comments, we can also make changes for day 1 patches and update the proposal. Cheers, Wei Chen > Cheers, > > Stefano > > > On Sun, 8 May 2022, Wei Chen wrote: > > # Proposal for Porting Xen to Armv8-R64 > > > > (Update to Draft-C as a discussion archive for Day1 patches) > > > > This proposal will introduce the PoC work of porting Xen to Armv8-R64, > > which includes: > > - The changes of current Xen capability, like Xen build system, memory > > management, domain management, vCPU context switch. > > - The expanded Xen capability, like static-allocation and direct-map. > > > > ***Notes:*** > > 1. ***This proposal only covers the work of porting Xen to Armv8-R64*** > > ***single CPU.Xen SMP support on Armv8-R64 relates to Armv8-R*** > > ***Trusted-Frimware (TF-R). This is an external dependency,*** > > ***so we think the discussion of Xen SMP support on Armv8-R64*** > > ***should be started when single-CPU support is complete.*** > > 2. ***This proposal will not touch xen-tools. In current stange,*** > > ***Xen on Armv8-R64 only support dom0less, all guests should*** > > ***be booted from device tree.*** > > > > ## Changelogs > > Draft-B -> Draft-C: > > 1. Update the event-channel support section to use runtime > > MPU mapping instead of whole MPU context switch. > > 2. Split two stages for Armv8-R64 Xen start address solution. > > 3. Remove "malicious tools". > > 4. Remove GCC version bump. > > > > Draft-A -> Draft-B: > > 1. Update Kconfig options usage. > > 2. Update the section for XEN_START_ADDRESS. > > 3. Add description of MPU initialization before parsing device tree. > > 4. Remove CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS. > > 5. Update the description of ioremap_nocache/cache. > > 6. Update about the free_init_memory on Armv8-R. > > 7. Describe why we need to switch the MPU configuration later. > > 8. Add alternative proposal in TODO. > > 9. Add use tool to generate Xen Armv8-R device tree in TODO. > > 10. Add Xen PIC/PIE discussion in TODO. > > 11. Add Xen event channel support in TODO. > > > > ## Contributors: > > Wei Chen <Wei.Chen@xxxxxxx> > > Penny Zheng <Penny.Zheng@xxxxxxx> > > > > ## 1. Essential Background > > > > ### 1.1. Armv8-R64 Profile > > The Armv-R architecture profile was designed to support use cases that > > have a high sensitivity to deterministic execution. (e.g. Fuel Injection, > > Brake control, Drive trains, Motor control etc) > > > > Arm announced Armv8-R in 2013, it is the latest generation Arm > architecture > > targeted at the Real-time profile. It introduces virtualization at the > highest > > security level while retaining the Protected Memory System Architecture > (PMSA) > > based on a Memory Protection Unit (MPU). In 2020, Arm announced Cortex- > R82, > > which is the first Arm 64-bit Cortex-R processor based on Armv8-R64. > > > > - The latest Armv8-R64 document can be found here: > > [Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R > AArch64 architecture > profile](https://developer.arm.com/documentation/ddi0600/latest/). > > > > - Armv-R Architecture progression: > > Armv7-R -> Armv8-R AArch32 -> Armv8 AArch64 > > The following figure is a simple comparison of "R" processors based on > > different Armv-R Architectures. > > ![image](https://drive.google.com/uc?export=view&id=1nE5RAXaX8zY2KPZ8i > mBpbvIr2eqBguEB) > > > > - The Armv8-R architecture evolved additional features on top of Armv7-R: > > - An exception model that is compatible with the Armv8-A model > > - Virtualization with support for guest operating systems > > - PMSA virtualization using MPUs In EL2. > > - The new features of Armv8-R64 architecture > > - Adds support for the 64-bit A64 instruction set, previously Armv8- > R > > only supported A32. > > - Supports up to 48-bit physical addressing, previously up to 32-bit > > addressing was supported. > > - Optional Arm Neon technology and Advanced SIMD > > - Supports three Exception Levels (ELs) > > - Secure EL2 - The Highest Privilege, MPU only, for firmware, > hypervisor > > - Secure EL1 - RichOS (MMU) or RTOS (MPU) > > - Secure EL0 - Application Workloads > > - Optionally supports Virtual Memory System Architecture at S-EL1/S- > EL0. > > This means it's possible to run rich OS kernels - like Linux - > either > > bare-metal or as a guest. > > - Differences with the Armv8-A AArch64 architecture > > - Supports only a single Security state - Secure. There is not Non- > Secure > > execution state supported. > > - EL3 is not supported, EL2 is mandatory. This means secure EL2 is > the > > highest EL. > > - Supports the A64 ISA instruction > > - With a small set of well-defined differences > > - Provides a PMSA (Protected Memory System Architecture) based > > virtualization model. > > - As opposed to Armv8-A AArch64's VMSA based Virtualization > > - Can support address bits up to 52 if FEAT_LPA is enabled, > > otherwise 48 bits. > > - Determines the access permissions and memory attributes of > > the target PA. > > - Can implement PMSAv8-64 at EL1 and EL2 > > - Address translation flat-maps the VA to the PA for EL2 > Stage 1. > > - Address translation flat-maps the VA to the PA for EL1 > Stage 1. > > - Address translation flat-maps the IPA to the PA for EL1 > Stage 2. > > - PMSA in EL1 & EL2 is configurable, VMSA in EL1 is configurable. > > > > ### 1.2. Xen Challenges with PMSA Virtualization > > Xen is PMSA unaware Type-1 Hypervisor, it will need modifications to run > > with an MPU and host multiple guest OSes. > > > > - No MMU at EL2: > > - No EL2 Stage 1 address translation > > - Xen provides fixed ARM64 virtual memory layout as basis of EL2 > > stage 1 address translation, which is not applicable on MPU > system, > > where there is no virtual addressing. As a result, any > operation > > involving transition from PA to VA, like ioremap, needs > modification > > on MPU system. > > - Xen's run-time addresses are the same as the link time addresses. > > - Enable PIC/PIE (position-independent code) on a real-time > target > > processor probably very rare. Further discussion in 2.1 and > TODO > > sections. > > - Xen will need to use the EL2 MPU memory region descriptors to > manage > > access permissions and attributes for accesses made by VMs at > EL1/0. > > - Xen currently relies on MMU EL1 stage 2 table to manage these > > accesses. > > - No MMU Stage 2 translation at EL1: > > - A guest doesn't have an independent guest physical address space > > - A guest can not reuse the current Intermediate Physical Address > > memory layout > > - A guest uses physical addresses to access memory and devices > > - The MPU at EL2 manages EL1 stage 2 access permissions and > attributes > > - There are a limited number of MPU protection regions at both EL2 and > EL1: > > - Architecturally, the maximum number of protection regions is 256, > > typical implementations have 32. > > - By contrast, Xen does not need to consider the number of page > table > > entries in theory when using MMU. > > - The MPU protection regions at EL2 need to be shared between the > hypervisor > > and the guest stage 2. > > - Requires careful consideration - may impact feature 'fullness' of > both > > the hypervisor and the guest > > - By contrast, when using MMU, Xen has standalone P2M table for > guest > > stage 2 accesses. > > > > ## 2. Proposed changes of Xen > > ### **2.1. Changes of build system:** > > > > - ***Introduce new Kconfig options for Armv8-R64***: > > Unlike Armv8-A, because lack of MMU support on Armv8-R64, we may not > > expect one Xen binary to run on all machines. Xen images are not > common > > across Armv8-R64 platforms. Xen must be re-built for different Armv8- > R64 > > platforms. Because these platforms may have different memory layout > and > > link address. > > - `ARM64_V8R`: > > This option enables Armv8-R profile for Arm64. Enabling this > option > > results in selecting MPU. This Kconfig option is used to gate some > > Armv8-R64 specific code except MPU code, like some code for Armv8- > R64 > > only system ID registers access. > > > > - `ARM_MPU` > > This option enables MPU on Armv8-R architecture. Enabling this > option > > results in disabling MMU. This Kconfig option is used to gate some > > ARM_MPU specific code. Once when this Kconfig option has been > enabled, > > the MMU relate code will not be built for Armv8-R64. The reason > why > > not depends on runtime detection to select MMU or MPU is that, we > don't > > think we can use one image for both Armv8-R64 and Armv8-A64. > Another > > reason that we separate MPU and V8R in provision to allow to > support MPU > > on 32bit Arm one day. > > > > ***Try to use `if ( IS_ENABLED(CONFIG_ARMXXXX) )` instead of > spreading*** > > ***`#ifdef CONFIG_ARMXXXX` everywhere, if it is possible.*** > > > > - ***About Xen start address for Armv8-R64***: > > On Armv8-A, Xen has a fixed virtual start address (link address too) > on all > > Armv8-A platforms. In an MMU based system, Xen can map its loaded > address > > to this virtual start address. On Armv8-A platforms, the Xen start > address > > does not need to be configurable. But on Armv8-R platforms, they don't > have > > MMU to map loaded address to a fixed virtual address. And different > platforms > > will have very different address space layout, so it's impossible for > Xen to > > specify a fixed physical address for all Armv8-R platforms' start > address. > > > > - Stage 1, introduce `XEN_START_ADDRESS` > > This option allows to set the custom address at which Xen will be > > linked. This address must be aligned to a page size. Xen's run-time > > addresses are the same as the link time addresses. > > ***Notes: Fixed link address means the Xen binary could not be*** > > ***relocated by EFI loader. So in current stage, Xen could not*** > > ***be launched as an EFI application on Armv8-R64.(TODO#3.3)*** > > > > - Provided by platform files. > > We can reuse the existed arm/platforms store platform specific > files. > > And `XEN_START_ADDRESS` is one kind of platform specific > information. > > So we can use platform file to define default `XEN_START_ADDRESS` > for > > each platform. > > > > - Provided by Kconfig. > > This option can be a supplymental option. Users can define a > customized > > `XEN_START_ADDRESS` to override the default value in platform's > file. > > > > - Stage 2, generated `XEN_START_ADDRESS` from device tree by build > scripts > > Vendors who want to enable Xen on their Armv8-R platforms, they > can > > use some tools/scripts to parse their boards device tree to > generate > > the basic platform information, like `XEN_START_ADDRESS`. These > > tools/scripts do not necessarily need to be integrated in Xen, but > Xen > > can give some recommended configuration. For example, Xen can > recommend > > Armv8-R platforms to use lowest ram start address + 2MB as the > default > > Xen start address. The generated platform files can be placed to > > arm/platforms for maintenance. > > > > - ***About MPU initialization before parsing device tree***: > > Before Xen can start parsing information from device tree and use > > this information to setup MPU, Xen need an initial MPU state. This > > is because: > > 1. More deterministic: Arm MPU supports background regions, if we > > don't configure the MPU regions and don't enable MPU. The > default > > MPU background attributes will take effect. The default > background > > attributes are `IMPLEMENTATION DEFINED`. That means all RAM > regions > > may be configured to device memory and RWX. Random values in > RAM or > > maliciously embedded data can be exploited. > > 2. More compatible: On some Armv8-R64 platforms, if MPU is > disabled, > > the `dc zva` instruction will make the system halt (This is one > > side effect of MPU background attributes, the RAM has been > configured > > as device memory). And this instruction will be embedded in > some > > built-in functions, like `memory set`. If we use `- > ddont_use_dc` to > > rebuild GCC, the built-in functions will not contain `dc zva`. > > However, it is obviously unlikely that we will be able to > recompile > > all GCC for ARMv8-R64. > > > > - Stage 1, reuse `XEN_START_ADDRESS` > > In the very beginning of Xen boot, Xen just need to cover a > limited > > memory range and very few devices (actually only UART device). So > we > > can use two MPU regions to map: > > 1. `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB` or. > > `XEN_START_ADDRESS` to `XEN_START_ADDRESS + image_size`as > > normal memory. > > 2. `UART` MMIO region base to `UART` MMIO region end to device > memory. > > These two are enough to support Xen run in boot time. And we don't > need > > to provide additional platform information for initial normal > memory > > and device memory regions. In current PoC we have used this option > > for implementation, and it's the same as Armv8-A. > > > > - Stage 2, generate information for initial MPU state from device > tree > > Introduce some macros to allow users to set initial normal > > memory regions: > > `ARM_MPU_NORMAL_MEMORY_START` and `ARM_MPU_NORMAL_MEMORY_END` > > and device memory: > > `ARM_MPU_DEVICE_MEMORY_START` and `ARM_MPU_DEVICE_MEMORY_END` > > These macros are the same platform specific information as > > `XEN_START_ADDRESS`, so the script of generating > `XEN_START_ADDRESS` > > also can be applied to these macros. > > > > - ***Define new system registers for compiliers***: > > Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4 > > specific system registers. As Armv8-R64 only have secure state, so > > at least, `VSTCR_EL2` and `VSCTLR_EL2` will be used for Xen. And the > > first GCC version that supports Armv8.4 is GCC 8.1. In addition to > > these, PMSA of Armv8-R64 introduced lots of MPU related system > registers: > > `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`, `PRENR_ELx` and > > `MPUIR_ELx`. But the first GCC version to support these system > registers > > is GCC 11. We don't want to bump GCC version to 11 in the first stage, > > it will affect some makefile scripts of common and other architectures. > > > > Instead, we will: > > - Encode new system registers in macros > > ``` > > /* Virtualization Secure Translation Control Register */ > > #define VSTCR_EL2 S3_4_C2_C6_2 > > /* Virtualization System Control Register */ > > #define VSCTLR_EL2 S3_4_C2_C0_0 > > /* EL1 MPU Protection Region Base Address Register encode */ > > #define PRBAR_EL1 S3_0_C6_C8_0 > > ... > > /* EL2 MPU Protection Region Base Address Register encode */ > > #define PRBAR_EL2 S3_4_C6_C8_0 > > ... > > ``` > > If we encode all above system registers, we don't need to bump GCC > > version. And the common CFLAGS Xen is using still can be applied to > > Armv8-R64. We don't need to modify Makefiles to add specific CFLAGS. > > ***Notes:*** > > ***Armv8-R AArch64 supports the A64 ISA instruction set with*** > > ***some modifications:*** > > ***Redefines DMB, DSB, and adds an DFB. But actually, the*** > > ***encodings of DMB and DSB are still the same with A64.*** > > ***And DFB is an alias of DSB #12. In this case, we think*** > > ***we don't need a new architecture specific flag to*** > > ***generate new instructions for Armv8-R.*** > > > > ### **2.2. Changes of the initialization process** > > In general, we still expect Armv8-R64 and Armv8-A64 to have a consistent > > initialization process. In addition to some architecutre differences, > there > > is no more than reusable code that we will distinguish through > CONFIG_ARM_MPU > > or CONFIG_ARM64_V8R. We want most of the initialization code to be > reusable > > between Armv8-R64 and Armv8-A64. > > > > - We will reuse the original head.s and setup.c of Arm. But replace the > > MMU and page table operations in these files with configuration > operations > > for MPU and MPU regions. > > > > - We provide a boot-time MPU configuration. This MPU configuration will > > support Xen to finish its initialization. And this boot-time MPU > > configuration will record the memory regions that will be parsed from > > device tree. > > > > In the end of Xen initialization, we will use a runtime MPU > configuration > > to replace boot-time MPU configuration. The runtime MPU configuration > will > > merge and reorder memory regions to save more MPU regions for guests. > > ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRD > oacQVTwUtWIGU) > > > > - Defer system unpausing domain after free_init_memory. > > When Xen initialization is about to end, Xen unpauses guests created > > during initialization. But this will cause some issues. The unpause > > action occurs before free_init_memory, however the runtime MPU > > configuration is built after free_init_memory. In Draft-A, we had > > discussed whether a zeroing operation for init code and data is > > enough or not. Because I had just given a security reason for doing > > free_init_memory on Armv8-R (free_init_memory will drop the Xen init > > code & data, this will reduce the code an attacker can exploit). > > But I forgot other very important reasons: > > 1. Init code and data will occupy two MPU regions, because they > > have different memory attributes. > > 2. It's not easy to zero init code section, because it's readonly. > > We have to update its MPU region to make this section RW. This > > operation doesn't do much less than free_init_memory. > > 3. Zeroing init code and data will not release the two MPU regions > > they are using. This would be a very big waste of a limited MPU > > regions resource. > > 4. Current free_init_memory operation is reusing lots of Armv8-A > > codes, except re-add init memory to Xen heap. Because we're using > > static heap on Armv8-R. > > > > So if the unpaused guests start executing the context switch at this > > point, then its MPU context will base on the boot-time MPU > configuration. > > Probably it will be inconsistent with runtime MPU configuration, this > > will cause unexpected problems (This may not happen in a single core > > system, but on SMP systems, this problem is foreseeable, so we hope to > > solve it at the beginning). > > > > Why we need to switch the MPU configuration that late? > > Because we need to re-order the MPU regions to reduce complexity of > runtime > > MPU regions management. > > 1. In the boot stage, we allocate MPU regions in sequence until the > max. > > Since a few MPU regions will get removed along the way, they will > leave > > holes there. For example, when heap is ready, fdt will be > reallocated > > in the heap, which means the MPU region for device tree is never > needed. > > And also in free_init_memory, although we do not add init memory to > heap, > > we still reclaim the MPU regions they are using. Without ordering, > we > > may need a bitmap to record such information. > > > > In context switch, the memory layout is quite different for guest > mode > > and hypervisor mode. When switching to guest mode, only guest RAM, > > emulated/passthrough devices, etc could be seen, but in hypervisor > mode, > > all Xen used devices and guests RAM shall be seen. And without > reordering, > > we need to iterate all MPU regions to find according regions to > disable > > during runtime context switch, that's definitely a overhead. > > > > So we propose an ordering at the tail of the boot time, to put all > fixed > > MPU regions in the head, like xen text/data, etc, and put all > flexible > > ones at tail, like device memory, guests RAM. > > > > Then later in runtime, like context switch, we could easily just > disable > > ones from tail and inserts new ones in the tail. > > > > ### **2.3. Changes to reduce memory fragmentation** > > > > In general, memory in Xen system can be classified to 4 classes: > > `image sections`, `heap sections`, `guest RAM`, `boot modules (guest > Kernel, > > initrd and dtb)` > > > > Currently, Xen doesn't have any restriction for users how to allocate > > memory for different classes. That means users can place boot modules > > anywhere, can reserve Xen heap memory anywhere and can allocate guest > > memory anywhere. > > > > In a VMSA system, this would not be too much of a problem, since the > > MMU can manage memory at a granularity of 4KB after all. But in a > > PMSA system, this will be a big problem. On Armv8-R64, the max MPU > > protection regions number has been limited to 256. But in typical > > processor implementations, few processors will design more than 32 > > MPU protection regions. Add in the fact that Xen shares MPU protection > > regions with guest's EL1 Stage 2. It becomes even more important > > to properly plan the use of MPU protection regions. > > > > - An ideal of memory usage layout restriction: > > ![img](https://drive.google.com/uc?export=view&id=1kirOL0Tx2aAypTtd3kXAt > d75XtrngcnW) > > 1. Reserve proper MPU regions for Xen image (code, rodata and data + > bss). > > 2. Reserve one MPU region for boot modules. > > That means the placement of all boot modules, include guest kernel, > > initrd and dtb, will be limited to this MPU region protected area. > > 3. Reserve one or more MPU regions for Xen heap. > > On Armv8-R64, the guest memory is predefined in device tree, it will > > not be allocated from heap. Unlike Armv8-A64, we will not move all > > free memory to heap. We want Xen heap is deterministic too, so Xen on > > Armv8-R64 also rely on Xen static heap feature. The memory for Xen > > heap will be defined in tree too. Considering that physical memory > > can also be discontinuous, one or more MPU protection regions needs > > to be reserved for Xen HEAP. > > 4. If we name above used MPU protection regions PART_A, and name left > > MPU protection regions PART_B: > > 4.1. In hypervisor context, Xen will map left RAM and devices to > PART_B. > > This will give Xen the ability to access whole memory. > > 4.2. In guest context, Xen will create EL1 stage 2 mapping in PART_B. > > In this case, Xen just need to update PART_B in context switch, > > but keep PART_A as fixed. > > > > ***Notes: Static allocation will be mandatory on MPU based systems*** > > > > **A sample device tree of memory layout restriction**: > > ``` > > chosen { > > ... > > /* > > * Define a section to place boot modules, > > * all boot modules must be placed in this section. > > */ > > mpu,boot-module-section = <0x10000000 0x10000000>; > > /* > > * Define a section to cover all guest RAM. All guest RAM must be > located > > * within this section. The pros is that, in best case, we can only > have > > * one MPU protection region to map all guest RAM for Xen. > > */ > > mpu,guest-memory-section = <0x20000000 0x30000000>; > > /* > > * Define a memory section that can cover all device memory that > > * will be used in Xen. > > */ > > mpu,device-memory-section = <0x80000000 0x7ffff000>; > > /* Define a section for Xen heap */ > > xen,static-mem = <0x50000000 0x20000000>; > > > > domU1 { > > ... > > #xen,static-mem-address-cells = <0x01>; > > #xen,static-mem-size-cells = <0x01>; > > /* Statically allocated guest memory, within mpu,guest-memory- > section */ > > xen,static-mem = <0x30000000 0x1f000000>; > > > > module@11000000 { > > compatible = "multiboot,kernel\0multiboot,module"; > > /* Boot module address, within mpu,boot-module-section */ > > reg = <0x11000000 0x3000000>; > > ... > > }; > > > > module@10FF0000 { > > compatible = "multiboot,device-tree\0multiboot,module"; > > /* Boot module address, within mpu,boot-module-section > */ > > reg = <0x10ff0000 0x10000>; > > ... > > }; > > }; > > }; > > ``` > > It's little hard for users to compose such a device tree by hand. Based > > on the discussion of Draft-A, Xen community suggested users to use some > > tools like [imagebuilder](https://gitlab.com/xen-project/imagebuilder) > to generate the above device tree properties. > > Please goto TODO#3.3 section to get more details of this suggestion. > > > > ### **2.4. Changes of memory management** > > Xen is coupled with VMSA, in order to port Xen to Armv8-R64, we have to > > decouple Xen from VMSA. And give Xen the ability to manage memory in > PMSA. > > > > 1. ***Use buddy allocator to manage physical pages for PMSA*** > > From the view of physical page, PMSA and VMSA don't have any > difference. > > So we can reuse buddy allocator on Armv8-R64 to manage physical pages. > > The difference is that, in VMSA, Xen will map allocated pages to > virtual > > addresses. But in PMSA, Xen just convert the pages to physical > address. > > > > 2. ***Can not use virtual address for memory management*** > > As Armv8-R64 only has PMSA in EL2, Xen loses the ability of using > virtual > > address to manage memory. This brings some problems, some virtual > address > > based features could not work well on Armv8-R64, like `FIXMAP`, > `vmap/vumap`, > > `ioremap` and `alternative`. > > > > But the functions or macros of these features are used in lots of > common > > code. So it's not good to use `#ifdef CONFIG_ARM_MPU` to gate relate > code > > everywhere. In this case, we propose to use stub helpers to make the > changes > > transparently to common code. > > 1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap > operations. > > This will return physical address directly of fixmapped item. > > 2. For `vmap/vumap`, we will use some empty inline stub helpers: > > ``` > > static inline void vm_init_type(...) {} > > static inline void *__vmap(...) > > { > > return NULL; > > } > > static inline void vunmap(const void *va) {} > > static inline void *vmalloc(size_t size) > > { > > return NULL; > > } > > static inline void *vmalloc_xen(size_t size) > > { > > return NULL; > > } > > static inline void vfree(void *va) {} > > ``` > > > > 3. For `ioremap`, it depends on `vmap`. As we have make `vmap` to > always > > return `NULL`, they could not work well on Armv8-R64 without > changes. > > `ioremap` will return input address directly. But if some extended > > functions like `ioremap_nocache`, `ioremap_cache`, need to ask a > new > > memory attributes. As Armv8-R doesn't have infinite MPU regions > for > > Xen to split the memory area from its located MPU region and > assign > > the new attributes to it. So in `ioremap_nocache`, `ioremap_cache`, > > if the input attributes are different from current memory > attributes, > > these functions will return `NULL`. > > ``` > > static inline void *ioremap_attr(...) > > { > > /* We don't have the ability to change input PA cache > attributes */ > > if ( CACHE_ATTR_need_change ) > > return NULL; > > return (void *)pa; > > } > > static inline void __iomem *ioremap_nocache(...) > > { > > return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE); > > } > > static inline void __iomem *ioremap_cache(...) > > { > > return ioremap_attr(start, len, PAGE_HYPERVISOR); > > } > > static inline void __iomem *ioremap_wc(...) > > { > > return ioremap_attr(start, len, PAGE_HYPERVISOR_WC); > > } > > void *ioremap(...) > > { > > return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE); > > } > > > > ``` > > 4. For `alternative`, it has been listed in TODO, we will simply > disable > > it on Armv8-R64 in current stage. But simply disable > `alternative` > > will make `cpus_have_const_cap` always return false. > > ``` > > * System capability check for constant cap */ > > #define cpus_have_const_cap(num) ({ \ > > register_t __ret; \ > > \ > > asm volatile (ALTERNATIVE("mov %0, #0", \ > > "mov %0, #1", \ > > num) \ > > : "=r" (__ret)); \ > > \ > > unlikely(__ret); \ > > }) > > ``` > > So, before we have an PMSA `alternative` implementation, we have > to > > implement a separate `cpus_have_const_cap` for Armv8-R64: > > ``` > > #define cpus_have_const_cap(num) cpus_have_cap(num) > > ``` > > > > ### **2.5. Changes of guest management** > > Armv8-R64 only supports PMSA in EL2, but it supports configurable > > VMSA or PMSA in EL1. This means Xen will have a new type guest on > > Armv8-R64 - MPU based guest. > > > > 1. **Add a new domain type - MPU_DOMAIN** > > When user want to create a guest that will be using MPU in EL1, user > > should add a `mpu` property in device tree `domU` node, like > following > > example: > > ``` > > domU2 { > > compatible = "xen,domain"; > > direct-map; > > mpu; --> Indicates this domain will use PMSA in EL1. > > ... > > }; > > ``` > > Corresponding to `mpu` property in device tree, we also need to > introduce > > a new flag `XEN_DOMCTL_CDF_INTERNAL_mpu` for domain to mark itself > as an > > MPU domain. This flag will be used in domain creation and domain > doing > > vCPU context switch. > > 1. Domain creation need this flag to decide enable PMSA or VMSA in > EL1. > > 2. vCPU context switch need this flag to decide save/restore MMU or > MPU > > related registers. > > > > 2. **Add MPU registers for vCPU to save EL1 MPU context** > > Current Xen only supports MMU based guest, so it hasn't considered to > > save/restore MPU context. In this case, we need to add MPU registers > > to `arch_vcpu`: > > ``` > > struct arch_vcpu > > { > > ... > > #ifdef CONFIG_ARM_MPU > > /* Virtualization Translation Control Register */ > > register_t vtcr_el2; > > > > /* EL1 MPU regions' registers */ > > pr_t *mpu_regions; > > #endif > > ... > > } > > ``` > > Armv8-R64 can support max to 256 MPU regions. But that's just > theoretical. > > So we don't want to embed `pr_t mpu_regions[256]` in `arch_vcpu` > directly, > > this will be a memory waste in most cases. Instead we use a pointer > in > > `arch_vcpu` to link with a dynamically allocated `mpu_regions`: > > ``` > > p->arch.mpu_regions = _xzalloc(sizeof(pr_t) * mpu_regions_count_el1, > SMP_CACHE_BYTES); > > ``` > > As `arch_vcpu` is used very frequently in context switch, so Xen > defines > > `arch_vcpu` as a cache alignment data structure. `mpu_regions` also > will > > be used very frequently in Armv8-R context switch. So we use > `_xzalloc` > > to allocate `SMP_CACHE_BYTES` alignment memory for `mpu_regions`. > > > > `mpu_regions_count_el1` can be detected from `MPUIR_EL1` system > register > > in Xen boot stage. The limitation is that, if we define a static > > `arch_vcpu`, we have to allocate `mpu_regions` before using it. > > > > 3. **MPU based P2M table management** > > Armv8-R64 EL2 doesn't have EL1 stage 2 address translation. But > through > > PMSA, it still has the ability to control the permissions and > attributes > > of EL1 stage 2. In this case, we still hope to keep the interface > > consistent with MMU based P2M as far as possible. > > > > p2m->root will point to an allocated memory. In Armv8-A64, this > memory > > is used to save the EL1 stage 2 translation table. But in Armv8-R64, > > this memory will be used to store EL2 MPU protection regions that are > > used by guest. During domain creation, Xen will prepare the data in > > this memory to make guest can access proper RAM and devices. When the > > guest's vCPU will be scheduled in, this data will be written to MPU > > protection region registers. > > > > ### **2.6. Changes of exception trap** > > As Armv8-R64 has compatible exception mode with Armv8-A64, so we can > reuse most > > of Armv8-A64's exception trap & handler code. But except the trap based > on EL1 > > stage 2 translation abort. > > > > In Armv8-A64, we use `FSC_FLT_TRANS` > > ``` > > case FSC_FLT_TRANS: > > ... > > if ( is_data ) > > { > > enum io_state state = try_handle_mmio(regs, hsr, gpa); > > ... > > } > > ``` > > But for Armv8-R64, we have to use `FSC_FLT_PERM` > > ``` > > case FSC_FLT_PERM: > > ... > > if ( is_data ) > > { > > enum io_state state = try_handle_mmio(regs, hsr, gpa); > > ... > > } > > ``` > > > > ### **2.5. Changes of device driver** > > Because Armv8-R64 only has single secure state, this will affect some > > devices that have two secure state, like GIC. But fortunately, most > > vendors will not link a two secure state GIC to Armv8-R64 processors. > > Current GIC driver can work well with single secure state GIC for Armv8- > R64. > > > > ### **2.7. Changes of virtual device** > > Currently, we only support pass-through devices in guest. Because event > > channel, xen-bus, xen-storage and other advanced Xen features haven't > been > > enabled in Armv8-R64. > > > > ## 3. TODO > > This section describes some features that are not currently implemented > in > > the PoC. Those features are things that should be looked in a second > stage > > and will not be part of the initial support of MPU/Armv8-R. Those jobs > could > > be done by Arm or any Xen contributors. > > > > ### 3.1. Alternative framework support > > On Armv8-A system, `alternative` is depending on `VMAP` function to > remap > > a code section to a new read/write virtual address. But on Armv8-R, > we do > > not have virtual address to do remap. So as an alternative method, > we will > > disable the MPU to make all RAM `RWX` in "apply alternative all > patches" > > progress temporarily. > > > > 1. Disable MPU -> Code section becomes RWX. > > 2. Apply alternative patches to Xen text. > > 3. Enable MPU -> Code section restores to RX. > > > > All memory is RWX, there may be some security risk. But, because > > "alternative apply patches" happens in Xen init stage, it probably > > doesn't matter as much. > > > > ### 3.2. Xen Event Channel Support > > In Current RFC patches we haven't enabled the event channel support. > > But I think it's good opportunity to do some discussion in advanced. > > On Armv8-R, all VMs are native direct-map, because there is no > stage2 > > MMU translation. Current event channel implementation depends on > some > > shared pages between Xen and guest: `shared_info` and per-cpu > `vcpu_info`. > > > > There are two issues with these two pages: > > > > 3.2.1. Direct-mapping: > > For `shared_info`, in current implementation, Xen will allocate a > page > > from heap for `shared_info` to store initial meta data. When guest > is > > trying to setup `shared_info`, it will allocate a free gfn and use a > > hypercall to setup P2M mapping between gfn and `shared_info`. > > > > For direct-mapping VM, this will break the direct-mapping concept. > > And on an MPU based system, like Armv8-R system, this operation will > > be very unfriendly. Xen need to pop `shared_info` page from Xen heap > > and insert it to VM P2M pages. If this page is in the middle of > > Xen heap, this means Xen need to split current heap and use extra > > MPU regions. Also for the P2M part, this page is unlikely to form > > a new continuous memory region with the existing p2m pages, and Xen > > is likely to need another additional MPU region to set it up, which > > is obviously a waste for limited MPU regions. And This kind of > dynamic > > is quite hard to imagine on an MPU system. > > > > For `vcpu_info`, in current implementation, Xen will store > `vcpu_info` > > meta data for all vCPUs in `shared_info`. When guest is trying to > setup > > `vcpu_info`, it will allocate memory for `vcpu_info` from guest side. > > And then guest will use hypercall to copy meta data from > `shared_info` > > to guest page. After that both Xen `vcpu_info` and guest `vcpu_info` > > are pointed to the same page that allocated by guest. > > > > This implementation has serval benefits: > > 1. There is no waste memory. No extra memory will be allocated from > Xen heap. > > 2. There is no P2M remap. This will not break the direct-mapping, > and > > is MPU system friendly. > > So, on Armv8-R system, we can still keep current implementation for > > per-cpu `vcpu_info`. > > > > So, our proposal is that, can we reuse current implementation idea > of > > `vcpu_info` for `shared_info`? We still allocate one page for > > `d->shared_info` at domain construction for holding some initial > meta-data, > > using alloc_domheap_pages instead of alloc_xenheap_pages and > > share_xen_page_with_guest. And when guest allocates a page for > > `shared_info` and use hypercall to setup it, We copy the initial > data from > > `d->shared_info` to it. And after copy we can update `d- > >shared_info` to point > > to guest allocated 'shared_info' page. In this case, we don't have > to think > > about the fragmentation of Xen heap and p2m and the extra MPU > regions. > > > > As guest cannot access 'shared_info' until it makes the > > XENMAPSPACE_shared_info hypercall. So it should be possible to get > rid > > of the initial 'shared_info' allocation in Xen. > > > > 3.2.2. How to access these pages of remote domain in hypercall > > As 'shared_info' and 'vcpu_info' are allocated by Guest. And these > pages > > are not mapped in Xen's MPU regions, instead they should be mapped > in > > guest's P2M MPU regions. When guest issues a hypercall to notify a > peer > > domain through event channel. Xen needs to update the pending bitmap > > in peer domain's page. > > > > For MMU system, Xen has a full view of system memory in runtime. > Because > > it has dedicated EL2 MMU to map whole system memory. So it has the > ability > > to update peer domain's pending bitmap. But in MPU system, the EL2 > MPU is > > shared by Xen and guest P2M mapping. In hypercall context, the EL2 > MPU > > only contains Xen memory (code, data and heap) and current running > guest's > > P2M mapping. When Xen accesses peer domain's pending bitmap, it will > cause > > EL2 data abort. > > > > So, for MPU system, we need to reserve one EL2 MPU region to map > peer > > domain's page to handle hypercalls which need to access this peer > domain's > > page. More detailed discussions have been listed in [Draft- > B](https://lists.xenproject.org/archives/html/xen-devel/2022- > 04/msg01719.html). > > > > > > But here still has some concerns: > > `d->shared_info` in Xen is accessed without any lock. So it will not > be > > that simple to update `d->shared_info`. It might be possible to > protect > > d->shared_info (or other structure) with a read-write lock. > > > > Do we need to add PGT_xxx flags to make it global and stay as much > the > > same with the original op, a simple investigation tells us that it > only > > be referred in `get_page_type`. Since ARM doesn't care about type > counts > > and always return 1, it doesn't have too much impact. > > > > ### 3.3. Xen Partial PIC/PIE > > We have mentioned about PIC/PIE in section 1.2. With PIC/PIE support, > > Xen can be loaded at any address and run properly. But it's rare to > use > > PIC/PIE on a real-time system (code size, more memory access). So a > > partial PIC/PIE image maybe better. But partial PIC/PIE image may > not > > solve the Xen start address issue. > > > > But a partial PIC/PIE support may be needed for Armv8-R. Because Arm > > [EBBR](https://arm-software.github.io/ebbr/index.html) require Xen > > on Armv8-R to support EFI boot service. Due to lack of relocation > > capability, EFI loader could not launch xen.efi on Armv8-R. So maybe > > we still need a partially supported PIC/PIE. Only some boot code > > support PIC/PIE to make EFI relocation happy. This boot code will > > help Xen to check its loaded address and relocate Xen image to Xen's > > run-time address if need. > > > > ### 3.4. A tool to generate Armv8-R Xen device tree > > 1. Use a tool to generate above device tree property. > > This tool will have some similar inputs as below: > > --- > > DEVICE_TREE="fvp_baremetal.dtb" > > XEN="4.16-2022.1/xen" > > > > NUM_DOMUS=1 > > DOMU_KERNEL[0]="4.16-2022.1/Image-domU" > > DOMU_RAMDISK[0]="4.16-2022.1/initrd.cpio" > > DOMU_PASSTHROUGH_DTB[0]="4.16-2022.1/passthrough-example-dev.dtb" > > DOMU_RAM_BASE[0]=0x30000000 > > DOMU_RAM_SIZE[0]=0x1f000000 > > --- > > Using above inputs, the tool can generate a device tree similar as > > we have described in sample. > > > > - `mpu,guest-memory-section`: > > This section will cover all guests' RAM (`xen,static-mem` defined > regions > > in all DomU nodes). All guest RAM must be located within this section. > > In the best case, we can only have one MPU protection region to map > all > > guests' RAM for Xen. > > > > If users set `DOMU_RAM_BASE`, `DOMU_RAM_SIZE` or > `DOMU_STATIC_MEM_RANGES`, > > these will be converted to the base and size of `xen,static-mem`. > This tool > > will scan all `xen, static-mem` in DomU nodes to determine the base > and > > size of `mpu,guest-memory-section`. If any other kind of memory has > been > > detected in this section, this tool can report an error. > > Except build time check, Xen also need to do runtime check to prevent > a > > bad device tree that generated by malicious tools. > > > > If users set `DOMU_RAM_SIZE` only, this will be converted to the size > of > > `xen,static-mem` only. Xen will allocate the guest memory in runtime, > but > > not from Xen heap. `mpu,guest-memory-section` will be calculated in > runtime > > too. The property in device tree doesn't need or will be ignored by > Xen. > > > > - `mpu,boot-module-section`: > > This section will be used to store the boot modules like DOMU_KERNEL, > > DOMU_RAMDISK, and DOMU_PASSTHROUGH_DTB. Xen keeps all boot modules in > > this section to meet the requirement of DomU restart on Armv8-R. In > > current stage, we don't have a privilege domain like Dom0 that can > > access filesystem to reload DomU images. > > > > And in current Xen code, the base and size are mandatory for boot > modules > > If users don't specify the base of each boot module, the tool will > > allocate a base for each module. And the tool will generate the > > `mpu,boot-module-section` region, when it finishes boot module memory > > allocation. > > > > Users also can specify the base and size of each boot module, these > will > > be converted to the base and size of module's `reg` directly. The > tool > > will scan all modules `reg` in DomU nodes to generate the base and > size of > > `mpu,boot-module-section`. If there is any kind of other memory usage > > has been detected in this section, this tool can report an error. > > Except build time check, Xen also need to do runtime check to prevent > a > > bad device tree. > > > > - `mpu,device-memory-section`: > > This section will cover all device memory that will be used in Xen. > Like > > `UART`, `GIC`, `SMMU` and other devices. We haven't considered > multiple > > `mpu,device-memory-section` scenarios. The devices' memory and RAM > are > > interleaving in physical address space, it would be required to use > > multiple `mpu,device-memory-section` to cover all devices. This > layout > > is common on Armv8-A system, especially in server. But it's rare in > > Armv8-R. So in current stage, we don't want to allow multiple > > `mpu,device-memory-section`. The tool can scan baremetal device tree > > to sort all devices' memory ranges. And calculate a proper region for > > `mpu,device-memory-section`. If it find Xen need multiple > > `mpu,device-memory-section`, it can report an unsupported error. > > > > 2. Use a tool to generate device tree property and platform files > > This opinion still uses the same inputs as opinion#1. But this tool > only > > generates `xen,static-mem` and `module` nodes in DomU nodes, it will > not > > generate `mpu,guest-memory-section`, `mpu,boot-module-section` and > > `mpu,device-memory-section` properties in device tree. This will > > generate following macros: > > `MPU_GUEST_MEMORY_SECTION_BASE`, `MPU_GUEST_MEMORY_SECTION_SIZE` > > `MPU_BOOT_MODULE_SECTION_BASE`, `MPU_BOOT_MODULE_SECTION_SIZE` > > `MPU_DEVICE_MEMORY_SECTION_BASE`, `MPU_DEVICE_MEMORY_SECTION_SIZE` > > in platform files in build time. In runtime, Xen will skip the device > > tree parsing for `mpu,guest-memory-section`, `mpu,boot-module- > section` > > and `mpu,device-memory-section`. And instead Xen will use these > macros > > to do runtime check. > > But, this also means these macros only exist in local build system, > > these macros will not be maintained in Xen repo. > > > > Both options are acceptable and we could support both. The main > difference for > > the user is that option #2 requires a Xen build after running > ImageBuilder, > > while option #1 might not. > > > > But we don't have to implement both options right away. We can start > from > > option#1, it will be easier for initial implementation, and the current > PoC > > implementation will still be possible to use. > > > > -- > > Cheers, > > Wei Chen > >
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |