
Ping: [PATCH v4 1/2] docs/designs/launch: Hyperlaunch design document



On Thu, May 13, 2021 at 8:41 PM Christopher Clark
<christopher.w.clark@xxxxxxxxx> wrote:
>
> From: "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>
>
> Adds a design document for Hyperlaunch, formerly DomB mode of dom0less.
>
> Signed-off-by: Christopher Clark <christopher.clark@xxxxxxxxxx>
> Signed-off-by: Daniel P. Smith <dpsmith@xxxxxxxxxxxxxxxxxxxx>
> Reviewed-by: Rich Persaud <rp@xxxxxxxxxxxxxx>
>
> ---
> Changes since v3:
> * Rename the Landscape table
> * Changed Crash Domain to Recovery Domain
>   * amended text to indicate that this will be new rather than existing Xen
>     functionality
>   * including update to the configuration, permission, function table
> * Add definitions for “recovery domain” and “crash environment”, describing
>   the different functionalities
>   * some design issues deferred
> * Added section to explain the motivations for the separation between VM
>   creation (by the hypervisor) and VM configuration (by the boot domain)
> * Adjusted the description of the current process for creating a domain
> * Added recommendation for UEFI boot to use GRUB.efi to load via multiboot2
>   method.
> * Added Document Structure section
> * Added section on Communication of Domain Configuration
>
>  docs/designs/launch/hyperlaunch.rst | 1004 +++++++++++++++++++++++++++
>  1 file changed, 1004 insertions(+)
>  create mode 100644 docs/designs/launch/hyperlaunch.rst
>
> diff --git a/docs/designs/launch/hyperlaunch.rst b/docs/designs/launch/hyperlaunch.rst
> new file mode 100644
> index 0000000000..30fce8c9c3
> --- /dev/null
> +++ b/docs/designs/launch/hyperlaunch.rst
> @@ -0,0 +1,1004 @@
> +###########################
> +Hyperlaunch Design Document
> +###########################
> +
> +.. sectnum:: :depth: 4
> +
> +This post is a Request for Comment on the included v4 of a design document that
> +describes Hyperlaunch: a new method of launching the Xen hypervisor, relating
> +to dom0less and work from the Hyperlaunch project. We invite discussion of this
> +on this list, at the monthly Xen Community Calls, and at dedicated meetings on
> +this topic in the Xen Working Group which will be announced in advance on the
> +Xen Development mailing list.
> +
> +
> +.. contents:: :depth: 3
> +
> +
> +Introduction
> +============
> +
> +This document describes the design and motivation for the funded development of
> +a new, flexible system for launching the Xen hypervisor and virtual machines
> +named: "Hyperlaunch".
> +
> +The design enables seamless transition for existing systems that require a
> +dom0, and provides a new general capability to build and launch alternative
> +configurations of virtual machines, including support for static partitioning
> +and accelerated start of VMs during host boot, while adhering to the principles
> +of least privilege. It incorporates the existing dom0less functionality,
> +extended to fold in the new developments from the Hyperlaunch project, with
> +support for both x86 and Arm platform architectures, building upon and
> +replacing the earlier 'late hardware domain' feature for disaggregation of
> +dom0.
> +
> +Hyperlaunch is designed to be flexible and reusable across multiple use cases,
> +and our aim is to ensure that it is capable, widely exercised, comprehensively
> +tested, and well understood by the Xen community.
> +
> +Document Structure
> +==================
> +
> +This is the primary design document for Hyperlaunch, to provide an overview of
> +the feature. Separate additional documents will cover specific aspects of
> +Hyperlaunch in further detail, including:
> +
> +  - The Device Tree specification for Hyperlaunch metadata
> +  - New Domain Roles for Xen and the Xen Security Modules (XSM) policy
> +  - Passthrough of PCI devices with Hyperlaunch
> +
> +Approach
> +========
> +
> +Born out of improving support for Dynamic Root of Trust for Measurement (DRTM),
> +the Hyperlaunch project is focused on restructuring the system launch of Xen.
> +The Hyperlaunch design provides a security architecture that builds on the
> +principles of Least Privilege and Strong Isolation, achieving this through the
> +disaggregation of system functions. It enables this with the introduction of a
> +boot domain that works in conjunction with the hypervisor to provide the
> +ability to launch multiple domains as part of host boot while maintaining a
> +least privilege implementation.
> +
> +While the Hyperlaunch project inception was and continues to be driven by a
> +focus on security through disaggregation, there are multiple use cases with a
> +non-security focus that require or benefit from the ability to launch multiple
> +domains at host boot. This was proven by the need that drove the implementation
> +of the dom0less capability in the Arm branch of Xen.
> +
> +Hyperlaunch is designed to be flexible and reusable across multiple use cases,
> +and our aim is to ensure that it is capable, widely exercised, comprehensively
> +tested, and provides a robust foundation for current and emerging system launch
> +requirements of the Xen community.
> +
> +
> +Objectives
> +----------
> +
> +* In general strive to maintain compatibility with existing Xen behavior
> +* A default build of the hypervisor should be capable of booting both legacy-compatible and new styles of launch:
> +
> +        * classic Xen boot: starting a single, privileged Dom0
> +        * classic Xen boot with late hardware domain: starting a Dom0 that transitions hardware access/control to another domain
> +        * a dom0less boot: starting multiple domains without privilege assignment controls
> +        * Hyperlaunch: starting one or more VMs, with flexible configuration
> +
> +* Preferred that it be managed via KCONFIG options to govern inclusion of support for each style
> +* The selection between classic boot and Hyperlaunch boot should be automatic
> +
> +        * Preferred that it not require a kernel command line parameter for selection
> +
> +* It should not require modification to boot loaders
> +* It should provide a user friendly interface for its configuration and management
> +* It must provide a method for building systems that fall back to console access in the event of misconfiguration
> +* It should be able to boot an x86 Xen environment without the need for a Dom0 domain
> +
> +
> +Requirements and Design
> +=======================
> +
> +Hyperlaunch is defined as the ability of a hypervisor to construct and start
> +one or more virtual machines at system launch in a specific way. A hypervisor
> +can support one or both modes of configuration, Hyperlaunch Static and
> +Hyperlaunch Dynamic. The Hyperlaunch Static mode functions as a static
> +partitioning hypervisor ensuring only the virtual machines started at system
> +launch are running on the system. The Hyperlaunch Dynamic mode functions as a
> +dynamic hypervisor allowing for additional virtual machines to be started after
> +the initial virtual machines have started. The Xen hypervisor is capable of
> +both modes of configuration from the same binary and when paired with its XSM
> +flask, provides strong controls that enable fine grained system partitioning.
> +
> +Hypervisor Launch Landscape
> +---------------------------
> +
> +This comparison table presents the distinctive capabilities of Hyperlaunch with
> +reference to existing launch configurations currently available in Xen and
> +other hypervisors.
> +
> +::
> +
> + +---------------+-----------+------------+-----------+-------------+---------------------+
> + | **Xen Dom0**  | **Linux** | **Late**   | **Jail**  | **Xen**     | **Xen Hyperlaunch** |
> + | **(Classic)** | **KVM**   | **HW Dom** | **house** | **dom0less**+---------+-----------+
> + |               |           |            |           |             | Static  | Dynamic   |
> + +===============+===========+============+===========+=============+=========+===========+
> + | Hypervisor able to launch multiple VMs during host boot                                |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |               |           |            |     Y     |       Y     |    Y    |     Y     |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + | Hypervisor supports Static Partitioning                                                |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |               |           |            |     Y     |       Y     |    Y    |           |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + | Able to launch VMs dynamically after host boot                                         |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |       Y       |     Y     |      Y*    |     Y     |       Y*    |         |     Y     |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + | Supports strong isolation between all VMs started at host boot                         |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |               |           |            |     Y     |       Y     |    Y    |     Y     |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + | Enables flexible sequencing of VM start during host boot                               |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |               |           |            |           |             |    Y    |     Y     |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + | Prevent all-powerful static root domain being launched at boot                         |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |               |           |            |           |       Y*    |    Y    |     Y     |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + | Operates without a Highly-privileged management VM (eg. Dom0)                          |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |               |           |      Y*    |           |       Y*    |    Y    |     Y     |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + | Operates without a privileged toolstack VM (Control Domain)                            |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |               |           |            |           |       Y*    |    Y    |           |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + | Extensible VM configuration applied before launch of VMs at host boot                  |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |               |           |            |           |             |    Y    |     Y     |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + | Flexible granular assignment of permissions and functions to VMs                       |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |               |           |            |           |             |    Y    |     Y     |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + | Supports extensible VM measurement architecture for DRTM and attestation               |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |               |           |            |           |             |    Y    |     Y     |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + | PCI passthrough configured at host boot                                                |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> + |               |           |            |           |             |    Y    |     Y     |
> + +---------------+-----------+------------+-----------+-------------+---------+-----------+
> +
> +
> +Domain Construction
> +-------------------
> +
> +An important aspect of the Hyperlaunch architecture is that the hypervisor
> +performs domain construction for all the Initial Domains, i.e. it builds each
> +domain that is described in the Launch Control Module (LCM). More specifically,
> +the hypervisor will perform the function of *domain creation* for each Initial
> +Domain: it allocates the unique domain identifier assigned to the virtual
> +machine and records essential metadata about it in the internal data structure
> +that enables scheduling the domain to run. It will also perform *basic domain
> +construction*: build the initial page tables with data from the kernel and
> +initial ramdisk supplied, and as appropriate for the domain type, populate the
> +p2m table and ACPI tables.
> +
> +Subsequent to this, the boot domain can apply additional configuration to the
> +initial domains from the data in the LCM, in *extended domain construction*.
> +
> +The benefits of this structure include:
> +
> +* Security: Constrains the permissions required by the boot domain: it does not
> +  require the capability to create domains in this structure. This aligns with
> +  the principles of least privilege.
> +* Flexibility: Enables policy-based dynamic assignment of hardware by the boot
> +  domain, customizable according to use-case and able to adapt to hardware
> +  discovery.
> +* Compatibility: Supports reuse of familiar tools with use-case customized boot
> +  domains.
> +* Commonality: Reuses the same logic for initial basic domain building across
> +  diverse Xen deployments.
> +       * It aligns the x86 initial domain construction with the existing Arm
> +         dom0less feature for construction of multiple domains at boot.
> +       * The boot domain implementation may vary significantly with different
> +         deployment use cases, whereas the hypervisor implementation is
> +         common.
> +* Correctness: Increases confidence in the implementation of domain
> +  construction, since it is performed by the hypervisor in well maintained and
> +  centrally tested logic.
> +* Performance: Enables launch for configurations where a fast start of
> +  multiple domains at boot is a requirement.
> +* Capability: Supports launch of advanced configurations where a sequenced
> +  start of multiple domains is required, or multiple domains are involved in
> +  startup of the running system configuration
> +       * e.g. for PCI passthrough on systems where the toolstack runs in a
> +         separate domain to the hardware management.
> +
> +Please see the ‘Hyperlaunch Device Tree’ design document, which describes the
> +configuration module that is provided to the hypervisor by the bootloader.
> +
> +The hypervisor determines how these domains are started as host boot completes:
> +in some systems the Boot Domain acts upon the extended boot configuration
> +supplied as part of launch, performing configuration tasks for preparing the
> +other domains for the hypervisor to commence running them.
> +
> +Common Boot Configurations
> +--------------------------
> +
> +When looking across those that have expressed interest or discussed a need for
> +launching multiple domains at host boot, the Hyperlaunch approach is to provide
> +the means to start nearly any combination of domains. Below is an enumerated
> +selection of common boot configurations for reference in the following section.
> +
> +Dynamic Launch with a Highly-Privileged Domain 0
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Hyperlaunch Classic: Dom0
> +        This configuration mimics the classic Xen start and domain construction
> +        where a single domain is constructed with all privileges and functions for
> +        managing hardware and running virtualization toolstack software.
> +
> +Hyperlaunch Classic: Extended Launch Dom0
> +        This configuration is where a Dom0 is started via a Boot Domain that runs
> +        first. This is for cases where some preprocessing in a less privileged domain
> +        is required before starting the all-privileged Domain 0.
> +
> +Hyperlaunch Classic: Basic Cloud
> +        This configuration constructs a Dom0 that is started in parallel with some
> +        number of workload domains.
> +
> +Hyperlaunch Classic: Cloud
> +        This configuration builds a Dom0 and some number of workload domains, launched
> +        via a Boot Domain that runs first.
> +
> +
> +Static Launch Configurations: without a Domain 0 or a Control Domain
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Hyperlaunch Static: Basic
> +        Simple static partitioning where all domains that can be run on this system are
> +        built and started during host boot and where no domain is started with the
> +        Control Domain permissions, thus making it not possible to create/start any
> +        further new domains.
> +
> +Hyperlaunch Static: Standard
> +        This is a variation of the “Hyperlaunch Static: Basic” static partitioning
> +        configuration with the introduction of a Boot Domain. This configuration allows
> +        for use of a Boot Domain to be able to apply extended configuration
> +        to the Initial Domains before they are started and
> +        sequence the order in which they start.
> +
> +Hyperlaunch Static: Disaggregated
> +        This is a variation of the “Hyperlaunch Static: Standard” configuration with
> +        the introduction of a Boot Domain and an illustration that some functions can
> +        be disaggregated to dedicated domains.
> +
> +Dynamic Launch of Disaggregated System Configurations
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Hyperlaunch Dynamic: Hardware Domain
> +        This configuration mimics the existing Xen feature late hardware domain with
> +        the one difference being that the hardware domain is constructed by the
> +        hypervisor at startup instead of later by Dom0.
> +
> +Hyperlaunch Dynamic: Flexible Disaggregation
> +        This configuration is similar to the “Hyperlaunch Classic: Dom0” configuration
> +        except that it includes starting a separate hardware domain during Xen startup.
> +        It is also similar to “Hyperlaunch Dynamic: Hardware Domain” configuration, but
> +        it launches via a Boot Domain that runs first.
> +
> +Hyperlaunch Dynamic: Full Disaggregation
> +        This configuration demonstrates how it is possible to start a fully
> +        disaggregated system: the virtualization toolstack runs in a Control Domain,
> +        separate from the domains responsible for managing hardware, XenStore, the Xen
> +        Console and Crash functions, each launched via a Boot Domain.
> +
> +
> +Example Use Cases and Configurations
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The following example use cases can be matched to configurations listed in the
> +previous section.
> +
> +Use case: Modern cloud hypervisor
> +"""""""""""""""""""""""""""""""""
> +
> +**Option:** Hyperlaunch Classic: Cloud
> +
> +This configuration will support strong isolation for virtual TPM domains and
> +measured launch in support of attestation to infrastructure management, while
> +allowing the use of existing Dom0 virtualization toolstack software.
> +
> +Use case: Edge device with security or safety requirements
> +""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
> +
> +**Option:** Hyperlaunch Static: Standard
> +
> +This configuration runs without requiring a highly-privileged Dom0, and enables
> +extended VM configuration to be applied to the Initial VMs prior to launching
> +them, optionally in a sequenced start.
> +
> +Use case: Client hypervisor
> +"""""""""""""""""""""""""""
> +
> +**Option:** Hyperlaunch Dynamic: Flexible Disaggregation
> +
> +**Option:** Hyperlaunch Dynamic: Full Disaggregation
> +
> +These configurations enable dynamic client workloads, strong isolation for the
> +domain running the virtualization toolstack software and each domain managing
> +hardware, with PCI passthrough performed during host boot and support for
> +measured launch.
> +
> +Hyperlaunch Disaggregated Launch
> +--------------------------------
> +
> +
> +Existing in Xen today are two primary permissions, *control domain* and
> +*hardware domain*, and two functions, *console domain* and *xenstore domain*,
> +that can be assigned to a domain. Traditionally these permissions and
> +functions are all assigned to Dom0 at start and can then be delegated to other
> +domains created by the toolstack in Dom0. With Hyperlaunch it becomes possible
> +to assign these permissions and functions to any domain for which there is a
> +definition provided at startup.
> +
> +Additionally, two further functions are introduced: the *recovery domain*,
> +intended to assist with recovery from failures encountered starting VMs during
> +host boot, and the *boot domain*, for performing aspects of domain construction
> +during startup.
> +
> +Supporting the booting of each of the above common boot configurations is
> +accomplished by considering the set of initial domains and the assignment of
> +Xen’s permissions and functions, including the ones introduced by Hyperlaunch,
> +to these domains. A discussion of these will be covered later but for now they
> +are laid out in a table with a mapping to the common boot configurations. This
> +table is not intended to be an exhaustive list of configurations and does not
> +account for flask policy specified functions that are use case specific.
> +
> +In the table each number represents a separate domain being
> +constructed by the Hyperlaunch construction path as Xen starts, and the
> +designator, ``{n}`` signifies that there may be “n” additional domains that may
> +be constructed that do not have any special role for a general Xen system.
> +
> +::
> +
> + +-------------------+------------------+-----------------------------------+
> + | Configuration     |    Permission    |            Function               |
> + |                   +------+------+----+------+--------+--------+----------+
> + |                   | None | Ctrl | HW | Boot |Recovery| Console| Xenstore |
> + +===================+======+======+====+======+========+========+==========+
> + | Classic: Dom0     |      |  0   | 0  |      |   0    |   0    |    0     |
> + +-------------------+------+------+----+------+--------+--------+----------+
> + | Classic: Extended |      |  1   | 1  |  0   |   1    |   1    |    1     |
> + | Launch Dom0       |      |      |    |      |        |        |          |
> + +-------------------+------+------+----+------+--------+--------+----------+
> + | Classic:          | {n}  |  0   | 0  |      |   0    |   0    |    0     |
> + | Basic Cloud       |      |      |    |      |        |        |          |
> + +-------------------+------+------+----+------+--------+--------+----------+
> + | Classic: Cloud    | {n}  |  1   | 1  |  0   |   1    |   1    |    1     |
> + +-------------------+------+------+----+------+--------+--------+----------+
> + | Static: Basic     | {n}  |      | 0  |      |   0    |   0    |    0     |
> + +-------------------+------+------+----+------+--------+--------+----------+
> + | Static: Standard  | {n}  |      | 1  |  0   |   1    |   1    |    1     |
> + +-------------------+------+------+----+------+--------+--------+----------+
> + | Static:           | {n}  |      | 2  |  0   |   3    |   4    |    1     |
> + | Disaggregated     |      |      |    |      |        |        |          |
> + +-------------------+------+------+----+------+--------+--------+----------+
> + | Dynamic:          |      |  0   | 1  |      |   0    |   0    |    0     |
> + | Hardware Domain   |      |      |    |      |        |        |          |
> + +-------------------+------+------+----+------+--------+--------+----------+
> + | Dynamic: Flexible | {n}  |  1   | 2  |  0   |   1    |   1    |    1     |
> + | Disaggregation    |      |      |    |      |        |        |          |
> + +-------------------+------+------+----+------+--------+--------+----------+
> + | Dynamic: Full     | {n}  |  2   | 3  |  0   |   4    |   5    |    1     |
> + | Disaggregation    |      |      |    |      |        |        |          |
> + +-------------------+------+------+----+------+--------+--------+----------+
> +
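To make the table above easier to query, here is a minimal sketch that transcribes it into a Python mapping. The names `ROLES`, `CONFIGS`, `roles_of`, `is_static` and `privileged_domain_count` are my own illustration and are not part of the design; the values are copied from the table, with `None` standing for an empty cell and a boolean recording whether the `{n}` column is populated.

```python
# Role assignments transcribed from the table above. Each value is the index
# of the initial domain holding that role; None means the cell is empty.
ROLES = ("ctrl", "hw", "boot", "recovery", "console", "xenstore")

# name -> (has {n} unprivileged domains, (ctrl, hw, boot, recovery, console, xenstore))
CONFIGS = {
    "Classic: Dom0":                    (False, (0, 0, None, 0, 0, 0)),
    "Classic: Extended Launch Dom0":    (False, (1, 1, 0, 1, 1, 1)),
    "Classic: Basic Cloud":             (True,  (0, 0, None, 0, 0, 0)),
    "Classic: Cloud":                   (True,  (1, 1, 0, 1, 1, 1)),
    "Static: Basic":                    (True,  (None, 0, None, 0, 0, 0)),
    "Static: Standard":                 (True,  (None, 1, 0, 1, 1, 1)),
    "Static: Disaggregated":            (True,  (None, 2, 0, 3, 4, 1)),
    "Dynamic: Hardware Domain":         (False, (0, 1, None, 0, 0, 0)),
    "Dynamic: Flexible Disaggregation": (True,  (1, 2, 0, 1, 1, 1)),
    "Dynamic: Full Disaggregation":     (True,  (2, 3, 0, 4, 5, 1)),
}


def roles_of(config):
    """Return {role: domain index} for the roles assigned in a configuration."""
    _, assigned = CONFIGS[config]
    return {role: dom for role, dom in zip(ROLES, assigned) if dom is not None}


def is_static(config):
    """Static here means that no domain holds the control permission."""
    return "ctrl" not in roles_of(config)


def privileged_domain_count(config):
    """Number of distinct initial domains that hold at least one role."""
    return len(set(roles_of(config).values()))
```

Read this way, the three Static rows are exactly the ones with no control domain, and "Dynamic: Full Disaggregation" spreads its six roles over six separate domains.
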
> +Overview of Hyperlaunch Flow
> +----------------------------
> +
> +Before delving into Hyperlaunch, a good basis to start with is an understanding
> +of the current process to create a domain. A way to view this process starts
> +with the core configuration which is the information the hypervisor requires to
> +make the call to ``domain_create``, followed by basic construction to provide the
> +memory image to run, including the kernel and ramdisk. A subsequent step
> +applies the extended configuration used by the toolstack to provide a domain
> +with any additional configuration information. Until the extended configuration
> +is completed, a domain has access to no resources except its allocated vcpus
> +and memory. The exception to this is Dom0, which the hypervisor explicitly
> +grants control and access to all system resources, except for those that only
> +the hypervisor should have control over.  This exception for Dom0 is driven by
> +the system structure with a monolithic Dom0 domain predating introduction of
> +support for disaggregation into Xen, and the corresponding default assignment
> +of multiple roles within the Xen system to Dom0.
> +
> +While not a different domain creation path, there does exist the Hardware
> +Domain (hwdom), sometimes also referred to as late-Dom0. It is an early effort
> +to disaggregate Dom0’s roles into a separate control domain and hardware
> +domain. This capability is activated by the passing of a domain id to the
> +``hardware_dom`` kernel command line parameter, and the Xen hypervisor will then
> +flag that domain id as the hardware domain. Later when the toolstack constructs
> +a domain with that domain id as the requested domid, the hypervisor will
> +transfer all device I/O from Dom0 to this domain. In addition it will also
> +transfer the “host shutdown on domain shutdown” flag from Dom0 to the hardware
> +domain. It is worth mentioning that this approach for disaggregation was
> +created in this manner due to the inability of Xen to launch more than one
> +domain at startup.
> +
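The late hardware domain hand-over described above can be pictured with a small toy model. This is only a sketch of the behaviour as the paragraph states it, not the hypervisor code: the `Host` class and its attribute names are hypothetical, and only the `hardware_dom` parameter name comes from the text.

```python
class Host:
    """Toy model of the late hardware domain hand-over (illustrative only)."""

    def __init__(self, hardware_dom=None):
        self.hardware_dom = hardware_dom  # domid given via hardware_dom, if any
        self.domains = {0}                # Dom0 exists from host boot
        self.io_owner = 0                 # Dom0 initially owns all device I/O
        self.shutdown_owner = 0           # domain whose shutdown stops the host

    def create_domain(self, domid):
        """Toolstack request to build a domain with a specific domid."""
        if domid in self.domains:
            raise ValueError(f"domid {domid} already exists")
        self.domains.add(domid)
        if domid == self.hardware_dom:
            # The flagged domid takes over device I/O and the shutdown flag.
            self.io_owner = domid
            self.shutdown_owner = domid


host = Host(hardware_dom=1)
host.create_domain(2)   # ordinary guest: nothing is transferred
host.create_domain(1)   # flagged domid: becomes the hardware domain
```

The point of the model is that the transfer only happens once the toolstack in Dom0 builds the flagged domain, which is why Dom0 must already be running with the hardware role in this scheme.
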
> +Hyperlaunch Xen startup
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The Hyperlaunch approach’s primary focus is on how to assign the roles
> +traditionally granted to Dom0 to one or more domains at host boot. While the
> +statement is simple to make, the implications are not trivial by any means.
> +This also explains why the Hyperlaunch approach is orthogonal to the existing
> +dom0less capability. The dom0less capability focuses on enabling the launch of
> +multiple domains in parallel with Dom0 at host boot. A corollary for dom0less
> +is that systems that don’t require Dom0 after all guest domains have
> +started are able to do the host boot without a Dom0, though it should be
> +noted that it may be possible to start Dom0 at a later point. With
> +Hyperlaunch, by contrast, the approach of separating Dom0’s roles requires the
> +ability to launch multiple domains at host boot. The direct consequences of this
> +approach are profound and provide a myriad of possible configurations, for which
> +a sample of common boot configurations has already been presented.
> +
> +To enable the Hyperlaunch approach a new alternative path for host boot within
> +the hypervisor must be introduced. This alternative path effectively branches
> +just before the current point of Dom0 construction and begins an alternate
> +means of system construction. The determination if this alternate path should
> +be taken is through the inspection of the boot chain. If the bootloader has
> +loaded a specific configuration, as described later, it will enable Xen to
> +detect that a Hyperlaunch configuration has been provided. Once a Hyperlaunch
> +configuration is detected, this alternate path can be thought of as occurring
> +in phases: domain creation, domain preparation, and launch finalization.
> +
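As a rough illustration of the path selection described above: the document says further down that the Launch Control Module must be the first module in the chain and that it is implemented as a Device Tree. The sketch below assumes, and this is my assumption rather than something the design states, that detection amounts to checking the first module for the flattened device tree magic number (0xd00dfeed, stored big-endian). The function names are hypothetical.

```python
import struct

FDT_MAGIC = 0xD00DFEED  # magic of a flattened device tree blob, big-endian


def looks_like_fdt(blob: bytes) -> bool:
    """True if the blob starts with the flattened device tree magic number."""
    return len(blob) >= 4 and struct.unpack(">I", blob[:4])[0] == FDT_MAGIC


def select_boot_path(modules):
    """Pick the launch path from the list of boot modules (raw bytes each).

    Assumption for this sketch: a device tree blob as the FIRST module
    signals a Hyperlaunch configuration; anything else means classic boot.
    """
    if modules and looks_like_fdt(modules[0]):
        return "hyperlaunch"
    return "classic"
```

Because only the first module is inspected, a device tree that appears later in the chain leaves the classic path in effect, which keeps the selection automatic without a command line parameter.
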
> +Domain Creation
> +"""""""""""""""
> +
> +The domain creation phase begins with Xen parsing the bootloader provided
> +material, to understand the content of the modules provided. It will then load
> +any microcode or XSM policy it discovers. For each domain configuration Xen
> +finds, it parses the configuration to construct the necessary domain definition
> +to instantiate an instance of the domain and leave it in a paused state. When
> +all domain configurations have been instantiated as domains, if one of them is
> +flagged as the Boot Domain, that domain will be unpaused starting the domain
> +preparation phase. If there is no Boot Domain defined, then the domain
> +preparation phase will be skipped and Xen will trigger the launch finalization
> +phase.
> +
> +Domain Preparation Phase
> +""""""""""""""""""""""""
> +
> +The domain preparation phase is an optional check point for the execution of a
> +workload specific domain, the Boot Domain. While the Boot Domain is the first
> +domain to run and has some degree of control over the system, it is extremely
> +restricted in both system resource access and hypervisor operations. Its
> +purpose is to:
> +
> +* Access the configuration provided by the bootloader
> +* Finalize the configuration of the domains
> +* Conduct any setup and launch related operations
> +* Do an ordered unpause of domains that require an ordered start
> +
> +When the Boot Domain has completed, it will notify the hypervisor that it is
> +done, triggering the launch finalization phase.
> +
> +
> +Launch Finalization
> +"""""""""""""""""""
> +
> +The hypervisor handles the launch finalization phase which is equivalent to the
> +clean up phase. As such the steps taken by the hypervisor, not necessarily in
> +implementation order, are as follows:
> +
> +* Free the boot module chain
> +* If a Boot Domain was used, reclaim Boot Domain resources
> +* Unpause any domains still in a paused state
> +* Boot Domain uses a reserved function thus can never be respawned
> +
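The three phases can be strung together in a short simulation. This is a sketch under my own simplifying assumptions: domains are just names, the function `hyperlaunch` and its event log are hypothetical, and the order of the finalization steps is arbitrary, as the text itself notes.

```python
def hyperlaunch(configs, boot_domain_order=()):
    """Simulate the launch phases and return a log of (event, subject) tuples.

    configs: list of (name, is_boot_domain) pairs from the launch configuration.
    boot_domain_order: names the Boot Domain chooses to unpause, in order.
    """
    log = []

    # Phase 1: domain creation. Every configured domain is instantiated paused.
    paused = [name for name, _ in configs]
    boot = next((name for name, is_boot in configs if is_boot), None)
    log.append(("created", tuple(paused)))

    # Phase 2: domain preparation, skipped when no Boot Domain is defined.
    if boot is not None:
        paused.remove(boot)
        log.append(("unpause", boot))
        for name in boot_domain_order:  # ordered start driven by the Boot Domain
            paused.remove(name)
            log.append(("unpause", name))

    # Phase 3: launch finalization, performed by the hypervisor.
    log.append(("free", "boot modules"))
    if boot is not None:
        log.append(("reclaim", boot))
    for name in paused:  # anything the Boot Domain left paused
        log.append(("unpause", name))
    return log
```

For example, `hyperlaunch([("boot", True), ("hw", False), ("guest", False)], ["hw"])` starts the Boot Domain, lets it start `hw` first, and leaves `guest` to be unpaused by the hypervisor during finalization.
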
> +While the focus thus far has been on how the Hyperlaunch capability will work,
> +it is worth mentioning what it does not do or limit from occurring. It does not
> +stop or inhibit the assigning of the control domain role which gives the domain
> +the ability to create, start, stop, restart, and destroy domains or the
> +hardware domain role which gives access to all I/O devices except those that
> +the hypervisor has reserved for itself. In particular it is still possible to
> +construct a domain with all the privileged roles, i.e. a Dom0, with or without
> +the domain id being zero. In fact what limitations are imposed now become fully
> +configurable without the risk of circumvention by an all privileged domain.
> +
> +Structuring of Hyperlaunch
> +--------------------------
> +
> +The structure of Hyperlaunch is built around the existing capabilities of the
> +host boot protocol. This approach was driven by the objective not to require
> +modifications to the boot loader. The only requirement is that the boot loader
> +supports the Multiboot2 (MB2) protocol. For UEFI boot, our recommendation is to
> +use GRUB.efi to load Xen and the initial domain materials via the multiboot2
> +method. On Arm platforms, Hyperlaunch is compatible with the existing interface
> +for boot into the hypervisor.
> +
> +
> +x86 Multiboot2
> +^^^^^^^^^^^^^^
> +
> +The MB2 protocol has no concept of a manifest to tell the initial kernel what
> +is contained in the chain, leaving it to the kernel to impose a loading
> +convention, use magic number identification, or both. When considering the
> +passing of multiple kernels, ramdisks, and domain configuration along with any
> +existing modules already passed, there is no sane convention that could be
> +imposed, and magic number identification is nearly impossible when considering
> +the objective not to impose unnecessary complication on the hypervisor.
> +
> +As alluded to previously, a manifest describing the contents in the MB2
> +chain and how they relate within a Xen context is needed. To address this need,
> +the Launch Control Module (LCM) was designed to provide such a manifest. The
> +LCM was designed to have a specific set of properties:
> +
> +* minimize the complexity of the parsing logic required by the hypervisor
> +* allow for expanding and optional configuration fragments without breaking
> +  backwards compatibility
> +
> +To enable automatic detection of a Hyperlaunch configuration, the LCM must be
> +the first MB2 module in the MB2 module chain. The LCM is implemented using the
> +Device Tree as defined in the Hyperlaunch Device Tree design document. With the
> +LCM implemented in Device Tree, it has a magic number that enables the
> +hypervisor to detect its presence when used in a Multiboot2 module chain. The
> +hypervisor can confirm that it is a proper LCM Device Tree by checking for a
> +compliant Hyperlaunch Device Tree. The Hyperlaunch Device Tree nodes are
> +designed to allow:
> +
> +* the hypervisor to parse only those entries it understands,
> +* packing custom information for a custom boot domain,
> +* the use of a new LCM with an older hypervisor,
> +* and the use of an older LCM with a new hypervisor.
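To make the magic number detection concrete: a flattened Device Tree blob begins with the 32-bit big-endian value 0xd00dfeed. Below is a minimal Python sketch of the first-module check described above. The function names and the returned path labels are hypothetical, and the second step from the text, confirming that the tree is a compliant Hyperlaunch Device Tree, is deliberately left out.

```python
import struct

# Magic value at offset 0 of a flattened Device Tree blob (big-endian).
FDT_MAGIC = 0xD00DFEED


def looks_like_fdt(module: bytes) -> bool:
    """Return True if the module starts with the Device Tree magic number."""
    if len(module) < 4:
        return False
    (magic,) = struct.unpack_from(">I", module, 0)
    return magic == FDT_MAGIC


def select_boot_path(modules) -> str:
    """Pick a host boot path from the first module only (hypothetical labels).

    A real implementation would also have to validate that the Device Tree
    is a compliant Hyperlaunch Device Tree before committing to this path.
    """
    if modules and looks_like_fdt(modules[0]):
        return "hyperlaunch"
    return "dom0"
```

Because only the first module is inspected, an LCM placed later in the chain is not detected in this model, which matches the requirement that the LCM be the first MB2 module.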
> +
> +Arm Device Tree
> +^^^^^^^^^^^^^^^
> +
> +As discussed, the LCM is in Device Tree format and was designed to co-exist in
> +the Device Tree ecosystem, and in particular in parallel with dom0less Device
> +Tree entries. On Arm, Xen is already designed to boot from a host Device Tree
> +description (dtb) file and the LCM entries can be embedded into this host dtb
> +file. This makes detecting the LCM entries and supporting Hyperlaunch on Arm
> +relatively straightforward. Relative to the described x86 approach, at the
> +point where Xen inspects the first MB2 module, on Arm Xen will check if the top
> +level LCM node exists in the host dtb file. If the LCM node does exist, then at
> +that point it will enter the same code path as the x86 entry.
> +
> +Xen hypervisor
> +^^^^^^^^^^^^^^
> +
> +The new host boot flow that will be introduced was previously discussed at a
> +higher level. Within this new flow is the configuration parsing and
> +domain creation phase, which will be expanded upon here. The hypervisor will
> +inspect the LCM for a config node and if found will iterate through all module
> +nodes. The module nodes are used to identify if any modules contain microcode
> +or an XSM policy. As it processes domain nodes, it will construct the domain
> +using the node properties and the module nodes. Once it has completed
> +iterating through all the entries in the LCM, if a constructed domain has the
> +Boot Domain attribute, it will then be unpaused. Otherwise the hypervisor will
> +start the launch finalization phase.
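The parsing and domain creation phase described above can be sketched as follows. This is a model only: the LCM is represented as a plain Python dict rather than a Device Tree, and every key name (`config`, `modules`, `domains`, `kind`, `boot_domain`) is a hypothetical stand-in, not the node or property naming of the Hyperlaunch Device Tree design.

```python
def process_lcm(lcm):
    """Model the configuration parsing and domain creation phase.

    Returns the modules that would be loaded early, the constructed
    domains, and the name of the phase that follows.
    """
    config = lcm.get("config")
    if config is None:
        # No config node found: nothing to construct.
        return {"loaded": [], "domains": [], "next": "launch finalization"}

    # Module nodes identify microcode and XSM policy modules.
    loaded = [
        m["name"]
        for m in config.get("modules", [])
        if m.get("kind") in ("microcode", "xsm-policy")
    ]

    # Each domain node yields one constructed domain, created paused.
    domains = []
    boot_domain = None
    for node in config.get("domains", []):
        dom = {
            "name": node["name"],
            "paused": True,
            "boot_domain": bool(node.get("boot_domain", False)),
        }
        domains.append(dom)
        if dom["boot_domain"]:
            boot_domain = dom

    # Only after all entries are processed is the Boot Domain unpaused;
    # without one, the hypervisor goes straight to launch finalization.
    if boot_domain is not None:
        boot_domain["paused"] = False
        next_phase = "domain preparation"
    else:
        next_phase = "launch finalization"

    return {"loaded": loaded, "domains": domains, "next": next_phase}
```

The point the model tries to capture is the ordering: all domains are constructed in a paused state first, and the decision between the domain preparation phase and launch finalization is made only once iteration is complete.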
> +
> +Boot Domain
> +^^^^^^^^^^^
> +
> +Traditionally domain creation was controlled by the user within the Dom0
> +environment whereby custom toolstacks could be implemented to impose
> +requirements on the process. The Boot Domain is a means to enable the user to
> +continue to maintain a degree of that control over domain creation but within a
> +limited privilege environment. The Boot Domain will have access to the LCM and
> +the boot chain along with access to a subset of the hypercall operations. When
> +the Boot Domain is finished it will notify the hypervisor through a hypercall
> +op.
> +
> +Recovery Domain
> +^^^^^^^^^^^^^^^
> +
> +With the existing Dom0 host boot path, when a failure occurs there are several
> +assumptions that can safely be made to get the user to a console for
> +troubleshooting. With the Hyperlaunch host boot path those assumptions can no
> +longer be made, thus a means is needed to get the user to a console in the case
> +of a recoverable failure. The recovery domain is configured by a domain
> +configuration entry in the LCM, in the same manner as the other initial
> +domains, and it will not be unpaused at launch finalization unless a failure is
> +encountered starting the initial domains.
> +
> +Xen has existing support for a Crash Environment where memory can be reserved
> +at host boot and a kernel loaded into it, to be jumped into at any point while
> +the system is running when a crash is detected. The Recovery Domain
> +functionality is a separate, complementary capability. The Crash Environment
> +replaces the previously active hypervisor and running guests, and enables a
> +process for mounting disks to write out log information prior to rebooting the
> +system. In contrast, the Recovery Domain is able to use the functionality of
> +the Xen hypervisor, that is still present and running, to perform recovery
> +handling for errors encountered with starting the initial domains.
> +
> +Deferred Design
> +"""""""""""""""
> +
> +To be determined:
> +
> +* Define what is detected as a crash
> +* Explain how crash detection is performed and which components are involved
> +* Explain how the recovery domain is unpaused
> +* Explain how and when the resources assigned to the recovery domain are reclaimed
> +* Define what the recovery domain is able to do
> +* Determine what permissions the recovery domain requires to perform its job
> +
> +
> +Control Domain
> +^^^^^^^^^^^^^^
> +
> +The concept of the Control Domain already exists within Xen as a boolean,
> +`is_privileged`, that governs access to many of the privileged interfaces of
> +the hypervisor that support a domain running a virtualization system toolstack.
> +Hyperlaunch will allow the `is_privileged` flag to be set on any domain that is
> +created at launch, rather than only a Dom0. It may potentially be set on
> +multiple domains.
> +
> +Hardware Domain
> +^^^^^^^^^^^^^^^
> +
> +The Hardware Domain is also an existing concept for Xen that is enabled through
> +the `is_hardware_domain` check. With Hyperlaunch the previous process of I/O
> +accesses being assigned to Dom0 for later transfer to the hardware domain would
> +no longer be required. Instead during the configuration phase the Xen
> +hypervisor would directly assign the I/O accesses to the domain with the
> +hardware domain permission bit enabled.
> +
> +Console Domain
> +^^^^^^^^^^^^^^
> +
> +Traditionally the Xen console is assigned to the control domain and is then
> +reassignable by the toolstack to another domain. With Hyperlaunch it becomes
> +possible to construct a boot configuration where there is no control domain, or
> +to have a use case where the Xen console needs to be isolated. As such it becomes
> +necessary to be able to designate which of the initial domains should be
> +assigned the Xen console. Therefore Hyperlaunch introduces the ability to
> +specify an initial domain to which the console is assigned, along with a
> +convention of ordered assignment for when there is no explicit assignment.
> +
> +Communication of Domain Configurations
> +======================================
> +
> +There are several standard methods for an Operating System to access machine
> +configuration and environment information: ACPI is common on x86 systems,
> +whereas Device Tree is more typical on Arm platforms. There are currently
> +implementations of both in Xen.
> +
> +* For dom0less, guest Device Trees are dynamically constructed by the
> +  hypervisor to convey domain configuration data
> +
> +* For PVH dom0 on x86, ACPI tables are built by the hypervisor before the
> +  domain is started
> +
> +Note that both of these mechanisms convey static data that is fixed prior to
> +the point of domain construction. Hyperlaunch will retain both the existing
> +ACPI and Device Tree methods.
> +
> +Communication of data between a Boot Domain and a Control Domain is of note
> +since they may not be running concurrently: the method used will depend on
> +their specific implementations, but one option available is to use Xen’s hypfs
> +for transfer of basic data to support system bootstrap.
> +
> +-------------------------------------------------------------------------------
> +
> +Appendix
> +========
> +
> +Appendix 1: Flow Sequence of Steps of a Hyperlaunch Boot
> +--------------------------------------------------------
> +
> +Provided here is an ordered flow of a Hyperlaunch boot, highlighting the logic
> +decision points. Not all branch points are recorded, specifically for the
> +variety of error conditions that may occur. ::
> +
> +  1. Hypervisor Startup:
> +  2a. (x86) Inspect first module provided by the bootloader
> +      a. Is the module an LCM
> +          i. YES: proceed with the Hyperlaunch host boot path
> +          ii. NO: proceed with a Dom0 host boot path
> +  2b. (Arm) Inspect host dtb for `/chosen/hypervisor` node
> +      a. Is the LCM present
> +          i. YES: proceed with the Hyperlaunch host boot path
> +          ii. NO: proceed with a Dom0/dom0less host boot path
> +  3. Iterate through the LCM entries looking for the module description
> +     entry
> +      a. Check if any of the modules are microcode or policy and if so,
> +         load
> +  4. Iterate through the LCM entries processing all domain description
> +     entries
> +      a. Use the details from the Basic Configuration to call
> +         `domain_create`
> +      b. Record if a domain is flagged as the Boot Domain
> +      c. Record if a domain is flagged as the Recovery Domain
> +  5. Was a Boot Domain created
> +      a. YES:
> +          i. Attach console to Boot Domain
> +          ii. Unpause Boot Domain
> +          iii. Goto Boot Domain (step 6)
> +      b. NO: Goto Launch Finalization (step 10)
> +  6. Boot Domain:
> +  7. Boot Domain comes online and may do any of the following actions
> +      a. Process the LCM
> +      b. Validate the MB2 chain
> +      c. Make additional configuration settings for staged domains
> +      d. Unpause any precursor domains
> +      e. Set any runtime configurations
> +  8. Boot Domain does any necessary cleanup
> +  9. Boot Domain makes a hypercall op call to signal it is finished
> +      i. Hypervisor reclaims all Boot Domain resources
> +      ii. Hypervisor records that the Boot Domain ran
> +      iii. Goto Launch Finalization (step 10)
> +  10. Launch Finalization
> +  11. If a configured domain was flagged to have the console, the
> +      hypervisor assigns it
> +  12. The hypervisor clears the LCM and bootloader-loaded modules,
> +      reclaiming the memory
> +  13. The hypervisor iterates through domains unpausing any domain not
> +      flagged as the recovery domain
> +
> +
> +Appendix 2: Considerations in Naming the Hyperlaunch Feature
> +------------------------------------------------------------
> +
> +* The term “Launch” is preferred over “Boot”
> +
> +        * Multiple individual component boots can occur in the new system start
> +          process; Launch is preferable for describing the whole process
> +        * Fortunately there is consensus in the current group of stakeholders
> +          that the term “Launch” is good and appropriate
> +
> +* The names we define must support becoming meaningful and simple to use
> +  outside the Xen community
> +
> +        * They must be able to be resolved quickly via search engine to a clear
> +          explanation (e.g. Xen marketing material, documentation or wiki)
> +        * We prefer that the terms be helpful for marketing communications
> +        * Consequence: avoid the term “domain” which is Xen-specific and
> +          requires a definition to be provided each time when used elsewhere
> +
> +
> +* There is a need to communicate that Xen is capable of being used as a Static
> +  Partitioning hypervisor
> +
> +        * The community members using and maintaining dom0less are the current
> +          primary stakeholders for this
> +
> +* There is a need to communicate that the new launch functionality provides new
> +  capabilities not available elsewhere, and is more than just supporting Static
> +  Partitioning
> +
> +        * No other hypervisor known to the authors of this document is capable
> +          of providing what Hyperlaunch will be able to do. The launch sequence is
> +          designed to:
> +
> +                * Remove dependency on a single, highly-privileged initial domain
> +                * Allow the initial domains started to be independent and fully
> +                  isolated from each other
> +                * Support configurations where no further VMs can be launched
> +                  once the initial domains have started
> +                * Use a standard, extensible format for conveying VM
> +                  configuration data
> +                * Ensure that domain building of all initial domains is
> +                  performed by the hypervisor from materials supplied by the
> +                  bootloader
> +                * Enable flexible configuration to be applied to all initial
> +                  domains by an optional Boot Domain, that runs with limited
> +                  privilege, before any other domain starts and obtains the VM
> +                  configuration data from the bootloader materials via the
> +                  hypervisor
> +                * Enable measurements of all of the boot materials prior to
> +                  their use, in a sequence with minimized privilege
> +                * Support use-case-specific customized Boot Domains
> +                * Complement the hypervisor’s existing ability to enforce
> +                  policy-based Mandatory Access Control
> +
> +
> +* “Static” and “Dynamic” have different and important meanings in different
> +  communities
> +
> +        * Static and Dynamic Partitioning describe the ability to create new
> +          virtual machines, or not, after the initial host boot process
> +          completes
> +        * Static and Dynamic Root of Trust describe the nature of the trust
> +          chain for a measured launch. In this case Static is referring to the
> +          fact that the trust chain is fixed and non-repeatable until the next
> +          host reboot or shutdown. Whereas Dynamic in this case refers to the
> +          ability to conduct the measured launch at any time and potentially
> +          multiple times before the next host reboot or shutdown.
> +
> +                * We will be using Hyperlaunch with both Static and Dynamic
> +                  Roots of Trust, to launch both Static and Dynamically
> +                  Partitioned Systems, and being clear about exactly which
> +                  combination is being started will be very important (e.g. for
> +                  certification processes)
> +
> +        * Consequence: uses of “Static” and “Dynamic” need to be qualified if
> +          they are incorporated into the naming of this functionality
> +
> +                * This can be done by adding the preceding, stronger branded
> +                  term: “Hyperlaunch”, before “Static” or “Dynamic”
> +                * ie. “Hyperlaunch Static” describes launch of a
> +                  Statically Partitioned system
> +                * and “Hyperlaunch Dynamic” describes launch of a
> +                  Dynamically Partitioned system.
> +                * In practice, this means that “Hyperlaunch Static” describes
> +                  starting a Static Partitioned system where no new domains can
> +                  be started later (ie. no VM has the Control Domain
> +                  permission), whereas “Hyperlaunch Dynamic” will launch some
> +                  VM with the Control Domain permission, able to create VMs
> +                  dynamically at a later point.
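The distinction drawn above reduces to one check: whether any initial domain holds the Control Domain permission. A minimal Python sketch of that rule follows. The `is_control_domain` key is a hypothetical stand-in for however the permission ends up being expressed in the configuration.

```python
def launch_style(domains):
    """Classify a launch configuration by the naming rule described above.

    `domains` is an iterable of dicts; the 'is_control_domain' key is a
    hypothetical representation of the Control Domain permission.
    """
    if any(d.get("is_control_domain", False) for d in domains):
        # Some VM can create further VMs later.
        return "Hyperlaunch Dynamic"
    # No VM can start new domains after launch.
    return "Hyperlaunch Static"
```

Under this rule a configuration with several control domains is still simply "Hyperlaunch Dynamic", and an empty or control-free set of domains is "Hyperlaunch Static".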
> +
> +**Naming Proposal:**
> +
> +* New Term: “Hyperlaunch”: the ability of a hypervisor to construct and start
> +  one or more virtual machines at system launch, in the following manner:
> +
> +        * The hypervisor must build all of the domains that it starts at host
> +          boot
> +
> +                * Similar to the way the dom0 domain is built by the hypervisor
> +                  today, and how dom0less works: it will run a loop to build
> +                  them all, driven from the configuration provided
> +                * This is a requirement for ensuring that there is Strong
> +                  Isolation between each of the initial VMs
> +
> +        * A single file containing the VM configs (“Launch Control Module”: LCM,
> +          in Device Tree binary format) is provided to the hypervisor
> +
> +                * The hypervisor parses it and builds domains
> +                * If the LCM config says that a Boot Domain should run first,
> +                  then the LCM file itself is made available to the Boot Domain
> +                  for it to parse and act on, to invoke operations via the
> +                  hypervisor to apply additional configuration to the other VMs
> +                  (i.e. executing a privilege-constrained toolstack)
> +
> +* New Term: “Hyperlaunch Static”: starts a Static Partitioned system, where
> +  only the virtual machines started at system launch are running on the system
> +
> +* New Term: “Hyperlaunch Dynamic”: starts a system where virtual machines may
> +  be dynamically added after the initial virtual machines have started.
> +
> +
> +In the default configuration, Xen will be capable of both styles of Hyperlaunch
> +from the same hypervisor binary which, when paired with its XSM flask, provides
> +strong controls that enable fine grained system partitioning.
> +
> +
> +* Retiring Term: “DomB”: will no longer be used to describe the optional first
> +  domain that is started. It is replaced with the more general term: “Boot
> +  Domain”.
> +
> +* Retiring Term: “Dom0less”: it is to be replaced with “Hyperlaunch Static”
> +
> +
> +Appendix 3: Terminology
> +-----------------------
> +
> +To help ensure clarity in reading this document, the following is the
> +definition of terminology used within this document.
> +
> +
> +Basic Configuration
> +    the minimal information the hypervisor requires to instantiate a domain instance
> +
> +
> +Boot Domain
> +    a domain with limited privileges launched by the hypervisor during a
> +    Multiple Domain Boot that runs as the first domain started. In the Hyperlaunch
> +    architecture, it is responsible for assisting with higher level operations of
> +    the domain setup process.
> +
> +
> +Classic Launch
> +    a backwards-compatible host boot that ends with the launch of a single domain (Dom0)
> +
> +
> +Console Domain
> +    a domain that has the Xen console assigned to it
> +
> +
> +Control Domain
> +    a privileged domain that has been granted Control Domain permissions which
> +    are those that are required by the Xen toolstack for managing other domains.
> +    These permissions are a subset of those that are granted to Dom0.
> +
> +
> +Device Tree
> +    a standardized data structure, with defined file formats, for describing
> +    initial system configuration
> +
> +
> +Disaggregation
> +    the separation of system roles and responsibilities across multiple
> +    connected components that work together to provide functionality
> +
> +
> +Dom0
> +    the highly-privileged, first and only domain started at host boot on a
> +    conventional Xen system
> +
> +
> +Dom0less
> +    an existing feature of Xen on Arm that provides Multiple Domain Boot
> +
> +
> +Domain
> +    a running instance of a virtual machine; (as the term is commonly used in
> +    the Xen Community)
> +
> +DomB
> +     the former name for Hyperlaunch
> +
> +
> +Extended Configuration
> +    any configuration options for a domain beyond its Basic Configuration
> +
> +
> +Hardware Domain
> +    a privileged domain that has been granted permissions to access and manage
> +    host hardware. These permissions are a subset of those that are granted to
> +    Dom0.
> +
> +
> +Host Boot
> +    the system startup of Xen using the configuration provided by the bootloader
> +
> +
> +Hyperlaunch
> +    a flexible host boot that ends with the launch of one or more domains
> +
> +
> +Initial Domain
> +    a domain that is described in the LCM that is run as part of a multiple
> +    domain boot. This includes the Boot Domain, Recovery Domain and all Launched
> +    Domains.
> +
> +
> +Late Hardware Domain
> +    a Hardware Domain that is launched after host boot has already completed
> +    with a running Dom0. When the Late Hardware Domain is started, Dom0
> +    relinquishes and transfers the permissions to access and manage host hardware
> +    to it.
> +
> +
> +Launch Control Module (LCM)
> +    A file supplied to the hypervisor by the bootloader that contains
> +    configuration data for the hypervisor and the initial set of virtual machines
> +    to be run at boot
> +
> +
> +Launched Domain
> +    a domain, aside from the boot domain and recovery domain, that is started as
> +    part of a multiple domain boot and remains running once the boot process is
> +    complete
> +
> +
> +Multiple Domain Boot
> +    a system configuration where the hypervisor and multiple virtual machines
> +    are all launched when the host system hardware boots
> +
> +
> +Recovery Domain
> +    an optional fallback domain that the hypervisor may start in the event of a
> +    detectable error encountered during the multiple domain boot process
> +
> +
> +System Device Tree
> +    this is the product of an Arm community project to extend Device Tree to
> +    cover more aspects of initial system configuration
> +
> +
> +Appendix 4: Copyright License
> +-----------------------------
> +
> +This work is licensed under a Creative Commons Attribution 4.0 International
> +License. A copy of this license may be obtained from the Creative Commons
> +website (https://creativecommons.org/licenses/by/4.0/legalcode).
> +
> +| Contributions by:
> +| Christopher Clark are Copyright © 2021 Star Lab Corporation
> +| Daniel P. Smith are Copyright © 2021 Apertus Solutions, LLC
> --
> 2.25.1
>



 

