[Xen-devel] [RFC PATCH v3 15/22] Start documenting the live update handover
From: David Woodhouse <dwmw@xxxxxxxxxxxx>

Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
---
 docs/specs/libxc-migration-stream.pandoc |  19 +-
 docs/specs/live-update-handover.pandoc   | 371 +++++++++++++++++++++++
 2 files changed, 388 insertions(+), 2 deletions(-)
 create mode 100644 docs/specs/live-update-handover.pandoc

diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index a7a8a08936..9a6679f3de 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -227,12 +227,18 @@ type         0x00000000: END
 
              0x0000000F: CHECKPOINT_DIRTY_PFN_LIST (Secondary -> Primary)
 
-             0x00000010 - 0x7FFFFFFF: Reserved for future _mandatory_
+             0x00000010 - 0x3FFFFFFF: Reserved for future _mandatory_
              records.
 
-             0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
+             0x40000000 - 0x7FFFFFFF: Reserved for future _mandatory_
+             live update records.
+
+             0x80000000 - 0xBFFFFFFF: Reserved for future _optional_
              records.
 
+             0xC0000000 - 0xFFFFFFFF: Reserved for future _optional_
+             live update records.
+
 body_length  Length in octets of the record body.
 
 body         Content of the record.
@@ -246,6 +252,15 @@ Records may be _mandatory_ or _optional_. Optional records have bit
 unsupported mandatory record must fail. The contents of optional records
 may be ignored during a restore.
 
+Note: This basic record format, and some of the record types defined here,
+are also used for Live Update, as discussed in the Live Update Handover
+document: `docs/specs/live-update-handover.pandoc`.
+
+Records defined for live update have bit 30 set in their type value,
+are defined in that document, and are out of scope for this document.
+Such records shall not appear in the Domain Image Format as defined by
+this document.
+
 The following sub-sections specify the record body format for each of
 the record types.
diff --git a/docs/specs/live-update-handover.pandoc b/docs/specs/live-update-handover.pandoc
new file mode 100644
index 0000000000..31d23c7c90
--- /dev/null
+++ b/docs/specs/live-update-handover.pandoc
@@ -0,0 +1,371 @@
+% Live Update Handover Protocol
+% David Woodhouse <<dwmw@xxxxxxxxxxxx>>
+% Revision 1
+
+Introduction
+============
+
+Purpose
+-------
+
+Live update performs a _kexec_ from one running version of Xen to
+another, preserving all running domains in a form of guest-transparent
+live migration.
+
+This document outlines the memory layout requirements and data stream
+used in the handover protocol, to ensure that pages used by running
+domains are preserved during the transition from one version of Xen
+to the next.
+
+
+Compatibility
+-------------
+
+It cannot be repeated often enough that information passed over live
+update is an ABI. It is expected that live update can be performed from
+one major version of Xen to another, or even hypothetically to a system
+which is not Xen at all.
+
+It is necessary that some data are handed over "in place"; in
+particular the memory pages of the running domains. However, no
+internal Xen data structures may be transferred in this fashion; at
+least not without retrospectively declaring them to be ABI, with the
+restrictions that this places on subsequent changes.
+
+
+
+Handover
+========
+
+
+Memory Usage Restrictions
+-------------------------
+
+The new Xen must take care not to use any memory pages which already
+belong to guests. To facilitate this, a contiguous region of memory
+is reserved for the boot allocator, known as _live update bootmem_.
+
+This region is reserved by the original Xen during its own boot, and
+its location is made available to the _kexec(8)_ user space tool
+through the `kexec_get_range` hypercall using a new region type
+`KEXEC_RANGE_MA_LIVEUPDATE`. It is passed to the new Xen on the
+command line, using the `liveupdate=` parameter.
+
+The new Xen must not use any pages outside this region until it has
+consumed the live update data stream and determined which pages are
+already in use by running domains.
+
+At run time, Xen may use memory from the reserved region for any
+purpose that does not require preservation over a live update; in
+particular it must not be mapped to a domain.
+
+The new Xen executable image must be loaded by kexec to the same
+physical location as the running Xen, since that region of memory is
+known to be available. For that reason, freed init memory from the
+Xen image is also treated as reserved _live update bootmem_.
+
+
+Live Update Data Stream
+-----------------------
+
+During handover, the running Xen pauses all domains and creates a
+_live update data stream_ containing all the information required by
+the new Xen to restore them. This is largely the same as
+guest-transparent live migration.
+
+Data pages for this stream may be allocated anywhere in physical
+memory outside the _live update bootmem_ regions.
+
+Xen creates a physically contiguous array of MFNs of the allocated
+data pages, suitable for passing to `vmap()` to obtain a virtually
+contiguous mapping of the whole data stream.
+
+
+Breadcrumb
+----------
+
+Since the live update data stream is created during the final `kexec_exec`
+hypercall, its address cannot be passed to the new Xen on the command
+line: the command line must already have been set up by `kexec(8)` in
+userspace long beforehand.
+
+Thus, to allow the new Xen to find the data stream, the old Xen places
+a _breadcrumb_ in the first words of the _live update bootmem_, containing
+the number of data pages and the physical address of the contiguous MFN
+array.
+
+The breadcrumb is written as the last action of the `kexec_reloc()`
+routine during the `kexec` handover, so it cannot overwrite anything
+important by virtue of the existing guarantee that Xen will not place
+any data in that region which needs to survive across a live update.
+
+A restriction of the `kexec_reloc()` mechanism for writing the breadcrumb
+is that the values are host-endian and are masked with PAGE_MASK; the low
+bits are zeroed. This is actually perfect for the magic value used
+to recognise a live update breadcrumb, since it neatly prevents any attempt
+to live update to a Xen which uses a different endianness or page size.
+
+For the physical address of the MFN list this is harmless, since
+that list is page-aligned anyway. For the number of pages, it means
+the value must be shifted accordingly. Hence the use of `shifted_nr_pages`
+in the breadcrumb structure below:
+
+
+     0     1     2     3     4     5     6     7 octet
+    +-------------------------------------------------+
+    | live_update_magic                               |
+    +-------------------------------------------------+
+    | mfn_array_physaddr                              |
+    +-------------------------------------------------+
+    | shifted_nr_pages                                |
+    +-------------------------------------------------+
+
+--------------------------------------------------------------------
+Field               Description
+------------------- ------------------------------------------------
+live_update_magic   "LiveUpda" (0x4c69766555706461) stored in host
+                    endianness and masked with PAGE_MASK.
+                    For example on x86_64: `00 60 70 55 65 76 69 4c`.
+
+mfn_array_physaddr  Machine address of the data stream's MFN list.
+
+shifted_nr_pages    Number of data pages, shifted left by PAGE_SHIFT
+                    to avoid the limitation of kexec_reloc().
+--------------------------------------------------------------------
+
+
+IOMMU
+-----
+
+Where devices are passed through to domains, it may not be possible
+to quiesce those devices for the purpose of performing the update.
+
+If performing live update with assigned devices, the original Xen will
+leave the IOMMU mappings active during the handover (thus implying
+that IOMMU page tables may not be allocated in the _live update
+bootmem_ region either).
+
+The new Xen must resume control of the IOMMU without causing those mappings
+to become invalid even for a short period of time. On hardware which does not
+support Posted Interrupts, interrupts may need to be generated on resume.
+
+_This section will be expanded once we actually have it working._
+
+\clearpage
+
+Data Stream Overview
+====================
+
+Once discovered and mapped, the live update data stream forms a
+virtually contiguous stream of records following the basic form
+documented in the LibXenCtrl Domain Image Format at
+`docs/specs/libxc-migration-stream.pandoc`.
+
+Some record types from the LibXenCtrl Domain Image format are used
+as-is, such as the `X86_PV_INFO`, `X86_PV_VCPU_BASIC`, `HVM_CONTEXT`
+and other records containing domain-specific data.
+
+The Domain Header from that document is not used in that form, and a new
+record of type `LU_DOMAIN_INFO` is defined below.
+
+Other new record types specific to the live update process are defined in
+this document. Of those, some contain global state such as the M2P table
+information, while others are domain-specific.
+
+The live update data stream starts with records containing global
+information, followed, any number of times, by a `LU_DOMAIN_INFO` record
+and the subsequent domain-specific records for that domain.
+
+There is a single `END` record at the end of the live update data stream,
+indicating that no more `LU_DOMAIN_INFO` records are present.
+
+\clearpage
+
+As defined in the LibXenCtrl Domain Image format document, a record
+has the following structure. Record type values defined for live update
+have bit 30 set, and are thus in the range 0x40000000-0x7FFFFFFF for
+mandatory live update records, and 0xC0000000-0xFFFFFFFF for optional
+live update records _(of which there are none at the present time)_.
+
+
+     0     1     2     3     4     5     6     7 octet
+    +-----------------------+-------------------------+
+    | type                  | body_length             |
+    +-----------+-----------+-------------------------+
+    | body...                                         |
+    ...
+    |           | padding (0 to 7 octets)             |
+    +-----------+-------------------------------------+
+
+--------------------------------------------------------------------
+Field       Description
+----------- --------------------------------------------------------
+type        0x40000000: LU_VERSION
+
+            0x40000001: LU_M2P
+
+            0x40000002: LU_M2P_COMPAT
+
+            0x40000003: LU_DOMAIN_INFO
+
+            0x40000004 - 0x7FFFFFFF: Reserved for future _mandatory_
+            live update records.
+
+            0xC0000000 - 0xFFFFFFFF: Reserved for future _optional_
+            live update records.
+
+body_length Length in octets of the record body.
+
+body        Content of the record.
+
+padding     0 to 7 octets of zeros to pad the whole record to a
+            multiple of 8 octets.
+--------------------------------------------------------------------
+
+
+\clearpage
+
+Global Records
+==============
+
+LU_VERSION
+----------
+
+The `LU_VERSION` record indicates the version of Xen from which the system
+is live updating. In theory this should never be relevant, but it
+allows for version-specific workarounds to be implemented in the receiving
+Xen should they become necessary.
+
+     0     1     2     3     4     5     6     7 octet
+    +-----------------------+-------------------------+
+    | xen_major             | xen_minor               |
+    +-----------------------+-------------------------+
+
+
+--------------------------------------------------------------------
+Field       Description
+----------- --------------------------------------------------------
+xen_major   The Xen major version from which the system is updating.
+
+xen_minor   The Xen minor version from which the system is updating.
+--------------------------------------------------------------------
+
+\clearpage
+
+LU_M2P / LU_M2P_COMPAT
+----------------------
+
+The `LU_M2P` and `LU_M2P_COMPAT` records contain a scatter/gather list of
+pages holding native or 32-bit (compat) M2P data, respectively.
+
+
+     0     1     2     3     4     5     6     7 octet
+    +-------------------------------------------------+
+    | m2p_page_data[0]...                             |
+    ...
+    +-------------------------------------------------+
+    | m2p_page_data[N-1]...                           |
+    ...
+    +-------------------------------------------------+
+
+--------------------------------------------------------------------
+Field          Description
+-------------- -----------------------------------------------------
+m2p_page_data  A 64-bit value containing the physical address of the
+               next page of M2P data, with the _order_ (log2 of the
+               page size in octets) in the low 12 bits. Thus, a 1GiB
+               page at 0x4C0000000 is encoded as 0x4C000001E.
+
+               If the M2P does not contiguously cover pages
+               starting from MFN zero, a discontiguity is
+               indicated by a field with order set to zero. The high
+               bits of the field then provide the MFN for which the
+               subsequent M2P data page provides data.
+
+--------------------------------------------------------------------
+
+\clearpage
+
+Domain Specific Records
+=======================
+
+
+LU_DOMAIN_INFO
+--------------
+
+The domain info record contains general properties necessary to
+recreate a domain in the receiving Xen, and marks the start of a set
+of other domain-specific records pertaining to that domain.
+
+     0     1     2     3     4     5     6     7 octet
+    +-----------------------+-----------+-------------+
+    | type                  | page_shift| domain_id   |
+    +-----------------------+-----------+-------------+
+    | domain_handle[0-7]                              |
+    +-------------------------------------------------+
+    | domain_handle[8-15]                             |
+    +-----------------------+-------------------------+
+    | ssidref               | flags                   |
+    +-----------------------+-------------------------+
+    | max_vcpus             | emulation_flags         |
+    +-----------------------+-------------------------+
+    | extra_flags           | (padding)               |
+    +-----------------------+-------------------------+
+
+
+--------------------------------------------------------------------
+Field           Description
+--------------- ----------------------------------------------------
+type            0x00000000: Reserved.
+
+                0x00000001: x86 PV.
+
+                0x00000002: x86 HVM.
+
+                0x00000003 - 0xFFFFFFFF: Reserved.
+
+page_shift      Size of a guest page as a power of two.
+
+                i.e., page size = 2^page_shift^.
+
+domain_id       Domain ID.
+
+domain_handle   UUID domain handle.
+
+ssidref         Security Identifier Index.
+
+flags           Domain flags using `XEN_DOMCTL_CDF_`.
+
+max_vcpus       Maximum vCPUs for domain.
+
+emulation_flags Emulation flags using `XEN_X86_EMU_`.
+
+extra_flags     Additional flags:
+
+                0x00000001: Is privileged.
+
+--------------------------------------------------------------------
+
+\clearpage
+
+Future Extensions
+=================
+
+All changes to this specification should bump the revision number in
+the title block.
+
+All changes to the image or domain headers require the image version
+to be increased.
+
+The format may be extended by adding additional record types.
+
+Extending an existing record type must be done by adding a new record
+type. This allows old images with the old record to still be
+restored.
+
+The image header may only be extended by _appending_ additional
+fields. In particular, the `marker`, `id` and `version` fields must
+never change size or location.
+
+
-- 
2.21.0


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
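The record type ranges that the patch carves out of the libxc migration stream document reduce to two bits of the 32-bit `type` field: bit 31 marks an optional record and bit 30 marks a live update record. The following C sketch shows that classification. The macro, enum and function names are hypothetical and are not taken from the Xen tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the record type table. */
#define REC_TYPE_OPTIONAL    (UINT32_C(1) << 31) /* may be ignored on restore */
#define REC_TYPE_LIVE_UPDATE (UINT32_C(1) << 30) /* defined by live update spec */

enum rec_class {
    REC_MANDATORY,    /* 0x00000000 - 0x3FFFFFFF */
    REC_MANDATORY_LU, /* 0x40000000 - 0x7FFFFFFF */
    REC_OPTIONAL,     /* 0x80000000 - 0xBFFFFFFF */
    REC_OPTIONAL_LU,  /* 0xC0000000 - 0xFFFFFFFF */
};

/* Map a 32-bit record type onto one of the four reserved ranges. */
static enum rec_class classify_record(uint32_t type)
{
    int optional = (type & REC_TYPE_OPTIONAL) != 0;
    int live_update = (type & REC_TYPE_LIVE_UPDATE) != 0;

    if (optional)
        return live_update ? REC_OPTIONAL_LU : REC_OPTIONAL;
    return live_update ? REC_MANDATORY_LU : REC_MANDATORY;
}

/* Live update records must never appear in a Domain Image Format stream. */
static int valid_in_domain_image(uint32_t type)
{
    return (type & REC_TYPE_LIVE_UPDATE) == 0;
}
```

A Domain Image Format reader could apply a check like `valid_in_domain_image()` before dispatching each record, since the updated libxc text says live update records shall not appear in that format.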
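The breadcrumb described in the handover document can be modelled as three host-endian 64-bit words. This sketch assumes 4KiB pages (a `PAGE_SHIFT` of 12, as on x86_64). It shows how the old Xen's values survive the `PAGE_MASK` masking performed by `kexec_reloc()`, and how the new Xen might recognise and decode them. `struct lu_breadcrumb`, the `LU_*` macros and both helper functions are illustrative names, not Xen's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4KiB pages, as on x86_64. All names are hypothetical. */
#define LU_PAGE_SHIFT 12
#define LU_PAGE_MASK  (~((UINT64_C(1) << LU_PAGE_SHIFT) - 1))

/* "LiveUpda" as a 64-bit value, before and after masking. */
#define LU_MAGIC_RAW  UINT64_C(0x4c69766555706461)
#define LU_MAGIC      (LU_MAGIC_RAW & LU_PAGE_MASK)

/* Three host-endian words at the start of live update bootmem. */
struct lu_breadcrumb {
    uint64_t live_update_magic;
    uint64_t mfn_array_physaddr;
    uint64_t shifted_nr_pages;
};

/* Old Xen: build values that are unchanged by masking with PAGE_MASK. */
static struct lu_breadcrumb lu_breadcrumb_make(uint64_t mfn_array_physaddr,
                                               uint64_t nr_pages)
{
    struct lu_breadcrumb bc;

    assert((mfn_array_physaddr & ~LU_PAGE_MASK) == 0); /* page-aligned */
    assert(nr_pages >> (64 - LU_PAGE_SHIFT) == 0);     /* shift is lossless */
    bc.live_update_magic = LU_MAGIC;
    bc.mfn_array_physaddr = mfn_array_physaddr & LU_PAGE_MASK;
    bc.shifted_nr_pages = (nr_pages << LU_PAGE_SHIFT) & LU_PAGE_MASK;
    return bc;
}

/* New Xen: read the breadcrumb from the start of live update bootmem.
 * Returns the number of data pages, or 0 if no breadcrumb is present. */
static uint64_t lu_breadcrumb_load(const void *bootmem_start,
                                   uint64_t *mfn_array_physaddr)
{
    struct lu_breadcrumb bc;

    memcpy(&bc, bootmem_start, sizeof(bc));
    if (bc.live_update_magic != LU_MAGIC)
        return 0;
    *mfn_array_physaddr = bc.mfn_array_physaddr;
    return bc.shifted_nr_pages >> LU_PAGE_SHIFT;
}
```

On a little-endian host the masked magic, 0x4c69766555706000, is stored as the bytes `00 60 70 55 65 76 69 4c`. A host with a different endianness or page size computes a different `LU_MAGIC` and so rejects the breadcrumb.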
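Once the data pages are mapped contiguously, the new Xen consumes the stream record by record. Each record is an 8-octet header (`type`, `body_length`), then the body, then zero padding up to the next multiple of 8 octets, and the walk stops at the `END` record. A minimal walker might look like the sketch below. It assumes the header fields are in host byte order, which is reasonable for a handover on the same machine. `walk_records()`, `rec_fn` and `count_records()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_TYPE_END 0x00000000u /* END record type from the libxc spec */

/* Assumption: header fields are host-endian. Names are hypothetical. */
struct rec_hdr {
    uint32_t type;
    uint32_t body_length;
};

/* Called for every record before END; a nonzero return stops the walk. */
typedef int (*rec_fn)(uint32_t type, const void *body, uint32_t len,
                      void *ctx);

/* Body length rounded up to the 8-octet record alignment. */
static uint64_t rec_padded_len(uint32_t body_length)
{
    return ((uint64_t)body_length + 7) & ~(uint64_t)7;
}

/* Returns 0 at END, -1 if the stream is truncated or lacks END,
 * or the first nonzero value returned by fn. */
static int walk_records(const uint8_t *buf, size_t size, rec_fn fn, void *ctx)
{
    size_t off = 0;

    while (size - off >= sizeof(struct rec_hdr)) {
        struct rec_hdr hdr;
        uint64_t padded;
        int rc;

        memcpy(&hdr, buf + off, sizeof(hdr)); /* no alignment assumption */
        off += sizeof(hdr);
        padded = rec_padded_len(hdr.body_length);
        if (padded > size - off)
            return -1;                         /* body runs past the end */
        if (hdr.type == REC_TYPE_END)
            return 0;
        rc = fn(hdr.type, buf + off, hdr.body_length, ctx);
        if (rc)
            return rc;
        off += (size_t)padded;
    }
    return -1;
}

/* Example callback: count records and remember the last type seen. */
struct rec_count {
    unsigned int n;
    uint32_t last_type;
};

static int count_records(uint32_t type, const void *body, uint32_t len,
                         void *ctx)
{
    struct rec_count *c = ctx;

    (void)body;
    (void)len;
    c->n++;
    c->last_type = type;
    return 0;
}
```

Checking the padded length against the remaining size before invoking the callback means a corrupt `body_length` cannot make the walker read past the mapped stream.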
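The `m2p_page_data` encoding packs a page's physical address and its order into one 64-bit word, with order zero reserved for discontiguity markers. The decoder below is a sketch that rests on several assumptions the text does not spell out:

- The order is log2 of the page size in octets. This is consistent with the 1GiB example, where 0x1E = 30.
- A marker's MFN is the field shifted right by 12.
- Each data page covers `size / entry_size` consecutive MFNs, where `entry_size` is 8 octets for the native M2P and 4 for the compat one.

All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define M2P_ORDER_MASK UINT64_C(0xFFF) /* low 12 bits carry the order */

/* One decoded M2P data page (hypothetical representation). */
struct m2p_extent {
    uint64_t first_mfn; /* first MFN whose M2P entry lives in this page */
    uint64_t paddr;     /* physical address of the M2P data page */
    uint64_t size;      /* size of the data page in octets */
};

/* Decode n m2p_page_data fields. entry_size is assumed to be 8 for the
 * native M2P and 4 for the compat one. Returns the number of extents
 * written to out, or -1 on overflow or an unusable order. */
static int m2p_decode(const uint64_t *data, size_t n, unsigned int entry_size,
                      struct m2p_extent *out, size_t max_out)
{
    uint64_t next_mfn = 0; /* coverage starts at MFN zero */
    size_t count = 0;

    assert(entry_size != 0);
    for (size_t i = 0; i < n; i++) {
        unsigned int order = (unsigned int)(data[i] & M2P_ORDER_MASK);

        if (order == 0) {
            /* Discontiguity: the high bits give the next MFN covered. */
            next_mfn = data[i] >> 12;
            continue;
        }
        if (order >= 64 || count == max_out)
            return -1;
        out[count].first_mfn = next_mfn;
        out[count].paddr = data[i] & ~M2P_ORDER_MASK;
        out[count].size = UINT64_C(1) << order;
        next_mfn += out[count].size / entry_size;
        count++;
    }
    return (int)count;
}
```

Under the 8-octet entry assumption, the table's example field 0x4C000001E decodes to a 1GiB page at 0x4C0000000 holding the native M2P entries for MFNs 0 to 0x7FFFFFF.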
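The octet diagrams for `LU_VERSION` and `LU_DOMAIN_INFO` map naturally onto C structs. The sketch below encodes one reading of those diagrams: a 4-octet `type`, 2-octet `page_shift` and `domain_id` fields, and trailing padding to 48 octets. It uses C11 `_Static_assert` to pin each offset, so an accidental layout change would fail to compile. The struct, macro and helper names are hypothetical, and the zero value for `padding` is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LU_VERSION body: 8 octets (hypothetical struct name). */
struct lu_version {
    uint32_t xen_major;
    uint32_t xen_minor;
};

/* LU_DOMAIN_INFO body: 48 octets, following the octet diagram. */
struct lu_domain_info {
    uint32_t type;              /* 1: x86 PV, 2: x86 HVM */
    uint16_t page_shift;        /* guest page size is 2^page_shift */
    uint16_t domain_id;
    uint8_t  domain_handle[16]; /* UUID */
    uint32_t ssidref;
    uint32_t flags;             /* domain creation flags */
    uint32_t max_vcpus;
    uint32_t emulation_flags;   /* XEN_X86_EMU_* */
    uint32_t extra_flags;       /* see LU_DOMAIN_EXTRA_PRIVILEGED */
    uint32_t padding;           /* assumed to be written as zero */
};

#define LU_DOMAIN_EXTRA_PRIVILEGED 0x00000001u /* "Is privileged" */

/* Pin the layout to the diagrams; a mismatch fails at compile time. */
_Static_assert(sizeof(struct lu_version) == 8, "LU_VERSION size");
_Static_assert(offsetof(struct lu_domain_info, page_shift) == 4, "page_shift");
_Static_assert(offsetof(struct lu_domain_info, domain_id) == 6, "domain_id");
_Static_assert(offsetof(struct lu_domain_info, domain_handle) == 8, "handle");
_Static_assert(offsetof(struct lu_domain_info, ssidref) == 24, "ssidref");
_Static_assert(offsetof(struct lu_domain_info, max_vcpus) == 32, "max_vcpus");
_Static_assert(offsetof(struct lu_domain_info, extra_flags) == 40, "extra");
_Static_assert(sizeof(struct lu_domain_info) == 48, "LU_DOMAIN_INFO size");

/* Guest page size in octets, or 0 if page_shift is out of range. */
static uint64_t lu_guest_page_size(const struct lu_domain_info *di)
{
    return di->page_shift < 64 ? UINT64_C(1) << di->page_shift : 0;
}

static int lu_domain_is_privileged(const struct lu_domain_info *di)
{
    return (di->extra_flags & LU_DOMAIN_EXTRA_PRIVILEGED) != 0;
}
```

Every field is naturally aligned at its diagram offset, so no compiler-inserted padding is needed and the struct can be copied directly out of a record body of the right length.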