[Xen-devel] [DRAFT C] PVH CPU hotplug design document
Hello,

Below is a draft of a design document for PVHv2 CPU hotplug. It should cover both vCPU and pCPU hotplug. It's mainly centered around the hardware domain, since for unprivileged PVH guests the vCPU hotplug mechanism is already described in Boris' series [0], and it's shared with HVM.

The aim here is to find a way to use ACPI vCPU hotplug for the hardware domain, while still being able to properly detect and notify Xen of pCPU hotplug.

[0] https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg00060.html

---8<---
% CPU hotplug support for PVH
% Roger Pau Monné <roger.pau@xxxxxxxxxx>
% Draft C

# Revision History

| Version | Date        | Changes                                           |
|---------|-------------|---------------------------------------------------|
| Draft A | 5 Jan 2017  | Initial draft.                                    |
|---------|-------------|---------------------------------------------------|
| Draft B | 12 Jan 2017 | Removed the XXX comments and clarified some       |
|         |             | sections.                                         |
|         |             |                                                   |
|         |             | Added a sample of the SSDT ASL code that would be |
|         |             | appended to the hardware domain.                  |
|---------|-------------|---------------------------------------------------|
| Draft C | 17 Jan 2017 | Define a _SB.XEN0 bus device and place all the    |
|         |             | processor objects and the GPE block inside of it. |
|         |             |                                                   |
|         |             | Place the GPE status and enable registers and     |
|         |             | the vCPU enable bitmap in memory instead of IO    |
|         |             | space.                                            |

# Preface

This document aims to describe the interface to use in order to implement CPU hotplug for PVH guests. It applies to hotplug of both physical and virtual CPUs.

# Introduction

One of the design goals of PVH is to be able to remove as much Xen PV specific code as possible, thus limiting the number of Xen PV interfaces used by guests, and tending to use native interfaces (as used by bare metal) as much as possible. This is in line with the efforts also done by Xen on ARM and helps reduce the burden of maintaining huge amounts of Xen PV code inside guest kernels.

This however presents some challenges due to the model used by the Xen Hypervisor, where some devices are handled by Xen while others are left for the hardware domain to manage. The fact that Xen lacks an AML parser also makes it harder, since it cannot get the full hardware description from dynamic ACPI tables (DSDT, SSDT) without the hardware domain's collaboration.

One such issue is CPU enumeration and hotplug, for both the hardware and unprivileged domains. The aim is to be able to use the same enumeration and hotplug interface for all PVH guests, regardless of their privilege.

This document aims to describe the interface used in order to fulfill the following actions:

 * Virtual CPU (vCPU) enumeration at boot time.
 * Hotplug of vCPUs.
 * Hotplug of physical CPUs (pCPUs) to Xen.

# Prior work

## PV CPU hotplug

CPU hotplug for Xen PV guests is implemented using xenstore and hypercalls. The guest has to set up a watch event on the "cpu/" xenstore node, and react to changes in this directory. CPUs are added by creating a new node and setting its "availability" to online:

    cpu/X/availability = "online"

Where X is the vCPU ID. This is an out-of-band method that relies on Xen specific interfaces in order to perform CPU hotplug.

## QEMU CPU hotplug using ACPI

The ACPI tables provided to HVM guests contain processor objects, as created by libacpi. The number of processor objects in the ACPI namespace matches the maximum number of processors supported by HVM guests (up to 128 at the time of writing). Processors currently disabled are marked as such in the MADT and in their \_MAT and \_STA methods.

A PRST operation region in I/O space is also defined, with a size of 128 bits, that's used as a bitmap of enabled vCPUs on the system. A PRSC method is provided in order to check for updates to the PRST region and trigger notifications on the affected processor objects. The execution of the PRSC method is triggered by a GPE event.

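The PRST scan described above can be illustrated outside of AML. The following is a minimal C sketch, not the libacpi implementation: it compares a 128-bit bitmap of enabled vCPUs against a cached copy and records one event per flipped bit. The names `prst_scan`, `vcpu_change` and the event enum are hypothetical; the values 1 and 3 mirror the ACPI Notify codes for device check and eject request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VCPUS 128 /* matches the 128-bit PRST size quoted above */

/* Hypothetical event kinds, mirroring ACPI Notify values 1 and 3. */
enum vcpu_event { VCPU_DEVICE_CHECK = 1, VCPU_EJECT_REQUEST = 3 };

struct vcpu_change {
    unsigned int vcpu;
    enum vcpu_event event;
};

static int bitmap_test(const uint8_t *map, unsigned int bit)
{
    return (map[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Compare the current bitmap against the cached state, record one change
 * per flipped bit and update the cache. Returns the number of changes
 * written to out, which must have room for MAX_VCPUS entries.
 */
static size_t prst_scan(const uint8_t prst[MAX_VCPUS / 8],
                        uint8_t cached[MAX_VCPUS / 8],
                        struct vcpu_change out[MAX_VCPUS])
{
    size_t n = 0;

    assert(prst && cached && out);

    for (unsigned int i = 0; i < MAX_VCPUS; i++) {
        int now = bitmap_test(prst, i);

        if (now == bitmap_test(cached, i))
            continue;

        out[n].vcpu = i;
        out[n].event = now ? VCPU_DEVICE_CHECK : VCPU_EJECT_REQUEST;
        n++;
        cached[i / 8] ^= (uint8_t)(1u << (i % 8)); /* remember new state */
    }

    return n;
}
```

Keeping a cached copy is what lets a single GPE cover any number of simultaneous changes: the handler does not need to be told which vCPU changed.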
Then OSPM checks the value returned by \_STA for the ACPI\_STA\_DEVICE\_PRESENT flag in order to check if the vCPU has been enabled.

## Native CPU hotplug

OSPM waits for a notification from ACPI on the processor object and when an event is received the return value from \_STA is checked in order to see if ACPI\_STA\_DEVICE\_PRESENT has been enabled. This notification is triggered from the method of a GPE block.

# PVH CPU hotplug

The aim as stated in the introduction is to use a method as similar as possible to bare metal CPU hotplug for PVH. This is feasible for unprivileged domains, since the ACPI tables can be created by the toolstack and provided to the guest. Then a minimal I/O or memory handler will be added to Xen in order to report the bitmap of enabled vCPUs. There's already a [series][0] posted to xen-devel that implements this functionality for unprivileged PVH guests.

This however has proven to be quite difficult to implement for the hardware domain, since it has to manage both pCPUs and vCPUs. The hardware domain should be able to notify Xen of the addition of new pCPUs, so that they can be used by the Hypervisor, and also be able to hotplug new vCPUs for its own usage. Since Xen cannot access the dynamic (AML) ACPI tables, because it lacks an AML parser, it is the duty of the hardware domain to parse those tables and notify Xen of relevant events.

There are several related issues here that prevent a straightforward solution:

 * Xen cannot parse AML tables, and thus cannot get notifications from ACPI events. And even in the case that Xen could parse those tables, there can only be one OSPM registered with ACPI.

 * Xen can provide a valid MADT table to the hardware domain that describes the environment in which the hardware domain is running, but it cannot prevent the hardware domain from seeing the real processor devices in the ACPI namespace, nor can Xen provide the hardware domain with processor devices that match the vCPUs at the moment.

[0]: https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg00060.html

## Proposed solution using the STAO

The general idea of this method is to use the STAO in order to hide the pCPUs from the hardware domain, and provide processor objects for vCPUs in an extra SSDT table.

This method requires one change to the STAO, in order to be able to notify the hardware domain of which processors found in ACPI tables are pCPUs. The description of the new STAO field is as follows:

|   Field            | Byte Length | Byte Offset |     Description          |
|--------------------|:-----------:|:-----------:|--------------------------|
| Processor List [n] |      -      |      -      | A list of ACPI numbers,  |
|                    |             |             | where each number is the |
|                    |             |             | Processor UID of a       |
|                    |             |             | physical CPU, and should |
|                    |             |             | be treated specially by  |
|                    |             |             | the OSPM                 |

The list of UIDs in this new field would be matched against the ACPI Processor UID field found in local/x2 APIC MADT structs and Processor objects in the ACPI namespace, and the OSPM should either ignore those objects, or in case it implements pCPU hotplug, it should notify Xen of changes to these objects.

The contents of the MADT provided to the hardware domain are also going to be different from the contents of the MADT as found in native ACPI. The local/x2 APIC entries for all the pCPUs are going to be marked as disabled.

Extra entries are going to be added for each vCPU available to the hardware domain, up to the maximum number of supported vCPUs. Note that supported vCPUs might be different than enabled vCPUs, so it's possible that some of these entries are also going to be marked as disabled. The entries for vCPUs on the MADT are going to use a processor local x2 APIC structure, and the ACPI processor IDs of vCPUs are not going to re-use processor IDs already used by pCPUs. Xen makes no guarantee about the processor ID of the first vCPU, and the OS must not assume that vCPU processor IDs are consecutive.

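As an illustration of the MADT layout described above, the following C sketch builds the extra vCPU entries using the Processor Local x2APIC structure (type 9, length 16) and picks ACPI processor UIDs that skip every UID already used by a pCPU. This is only a sketch under stated assumptions: `build_vcpu_entries` and `uid_in_use` are hypothetical helpers, the x2APIC ID is a placeholder equal to the vCPU index, and "lowest free UID" is just one possible allocation policy, which is exactly why the OS must not rely on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MADT_TYPE_LOCAL_X2APIC 9
#define MADT_LAPIC_ENABLED     1u

/* Processor Local x2APIC structure as laid out by the ACPI spec. */
struct madt_x2apic {
    uint8_t  type;      /* 9 */
    uint8_t  length;    /* 16 */
    uint16_t reserved;
    uint32_t x2apic_id;
    uint32_t flags;     /* bit 0: enabled */
    uint32_t acpi_uid;  /* ACPI Processor UID */
};

static int uid_in_use(const uint32_t *uids, size_t n, uint32_t uid)
{
    for (size_t i = 0; i < n; i++)
        if (uids[i] == uid)
            return 1;
    return 0;
}

/*
 * Fill out[0..max_vcpus) with one entry per supported vCPU. Only the first
 * online_vcpus entries are marked enabled. UIDs are the lowest values not
 * used by any pCPU (an assumption of this sketch, not a Xen guarantee).
 */
static void build_vcpu_entries(const uint32_t *pcpu_uids, size_t nr_pcpus,
                               struct madt_x2apic *out,
                               size_t max_vcpus, size_t online_vcpus)
{
    uint32_t uid = 0;

    assert(online_vcpus <= max_vcpus);

    for (size_t i = 0; i < max_vcpus; i++) {
        while (uid_in_use(pcpu_uids, nr_pcpus, uid))
            uid++;

        out[i].type = MADT_TYPE_LOCAL_X2APIC;
        out[i].length = sizeof(out[i]);
        out[i].reserved = 0;
        out[i].x2apic_id = (uint32_t)i; /* placeholder APIC ID */
        out[i].flags = i < online_vcpus ? MADT_LAPIC_ENABLED : 0;
        out[i].acpi_uid = uid++;
    }
}
```

With pCPU UIDs 0, 1 and 3, three supported vCPUs would get UIDs 2, 4 and 5, which shows why neither the first vCPU UID nor consecutiveness can be assumed.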
Note that this would limit the number of vCPUs so that (pCPUs + vCPUs) < 2^32.

In order to be able to perform vCPU hotplug, the vCPUs must have an ACPI processor object in the ACPI namespace, so that the OSPM can request notifications and get the value of the \_STA and \_MAT methods. This can be problematic because Xen doesn't know the ACPI name of the other processor objects, so blindly adding new ones can create namespace clashes.

This can be solved by using a different ACPI name in order to describe vCPUs in the ACPI namespace. Most hardware vendors tend to use CPU or PR prefixes for the processor objects, so using a 'VP' (i.e. Virtual Processor) prefix should prevent clashes.

A Xen GPE device block will be used in order to deliver events related to the vCPUs available to the guest, since Xen doesn't know if there are any bits available in the native GPEs. An SCI interrupt will be injected into the guest in order to trigger the event.

The following snippet is a representation of the ASL SSDT code that is proposed for the hardware domain:

    DefinitionBlock ("SSDT.aml", "SSDT", 5, "Xen", "HVM", 0)
    {
        Device ( \_SB.XEN0 ) {
            Name ( _HID, "ACPI0004" ) /* ACPI Module Device (bus node) */
        }
        Scope (\_SB.XEN0)
        {
            OperationRegion(XEN, SystemMemory, 0xXXXXXXXX, 41)
            Field(XEN, ByteAcc, NoLock, Preserve) {
                PRS, 2,     /* vCPU enabled bitmap */
                NCPU, 16,   /* Number of vCPUs */
                MSUA, 32,   /* MADT checksum address */
                MAPA, 32,   /* MADT LAPIC0 address */
            }
            OperationRegion ( MSUM, SystemMemory, \_SB.XEN0.MSUA, 1 )
            Field ( MSUM, ByteAcc, NoLock, Preserve ) {
                MSU, 8
            }
            /* Return the MADT entry if the vCPU exists, else a disabled one */
            Method ( PMAT, 2 ) {
                If ( LLess(Arg0, NCPU) ) {
                    Return ( ToBuffer(Arg1) )
                }
                Return ( Buffer() {0, 8, 0xff, 0xff, 0, 0, 0, 0} )
            }
            Processor ( VP00, 0, 0x0000b010, 0x06 ) {
                Name ( _HID, "ACPI0007" )
                Name ( _UID, 1 )
                OperationRegion ( MATR, SystemMemory, Add(\_SB.XEN0.MAPA, 0), 8 )
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    MAT, 64
                }
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    Offset(4),
                    FLG, 1
                }
                Method ( _MAT, 0 ) {
                    Return ( ToBuffer(MAT) )
                }
                Method ( _STA ) {
                    If ( FLG ) {
                        Return ( 0xF )
                    }
                    Return ( 0x0 )
                }
                Method ( _EJ0, 1, NotSerialized ) {
                    Sleep ( 0xC8 )
                }
            }
            Processor ( VP01, 1, 0x0000b010, 0x06 ) {
                Name ( _HID, "ACPI0007" )
                Name ( _UID, 2 )
                OperationRegion ( MATR, SystemMemory, Add(\_SB.XEN0.MAPA, 8), 8 )
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    MAT, 64
                }
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    Offset(4),
                    FLG, 1
                }
                Method ( _MAT, 0 ) {
                    Return ( PMAT (1, MAT) )
                }
                Method ( _STA ) {
                    If ( LLess(1, \_SB.XEN0.NCPU) ) {
                        If ( FLG ) {
                            Return ( 0xF )
                        }
                    }
                    Return ( 0x0 )
                }
                Method ( _EJ0, 1, NotSerialized ) {
                    Sleep ( 0xC8 )
                }
            }
            /* Sync each vCPU's MADT enabled flag with its PRS bitmap bit */
            Method ( PRSC, 0 ) {
                Store ( ToBuffer(PRS), Local0 )
                Store ( DerefOf(Index(Local0, 0)), Local1 )
                /* Bit 0: VP00 */
                And ( Local1, 1, Local2 )
                If ( LNotEqual(Local2, \_SB.XEN0.VP00.FLG) ) {
                    Store ( Local2, \_SB.XEN0.VP00.FLG )
                    If ( LEqual(Local2, 1) ) {
                        Notify ( VP00, 1 ) /* Device check */
                        Subtract ( \_SB.XEN0.MSU, 1, \_SB.XEN0.MSU )
                    }
                    Else {
                        Notify ( VP00, 3 ) /* Eject request */
                        Add ( \_SB.XEN0.MSU, 1, \_SB.XEN0.MSU )
                    }
                }
                /* Bit 1: VP01 */
                ShiftRight ( Local1, 1, Local1 )
                And ( Local1, 1, Local2 )
                If ( LNotEqual(Local2, \_SB.XEN0.VP01.FLG) ) {
                    Store ( Local2, \_SB.XEN0.VP01.FLG )
                    If ( LEqual(Local2, 1) ) {
                        Notify ( VP01, 1 ) /* Device check */
                        Subtract ( \_SB.XEN0.MSU, 1, \_SB.XEN0.MSU )
                    }
                    Else {
                        Notify ( VP01, 3 ) /* Eject request */
                        Add ( \_SB.XEN0.MSU, 1, \_SB.XEN0.MSU )
                    }
                }
                Return ( One )
            }
        }
        Device ( \_SB.XEN0.GPE0 ) {
            Name ( _HID, "ACPI0006" )
            Name ( _UID, "XENGPE0" )
            Name ( _CRS, ResourceTemplate() {
                Memory32Fixed ( ReadWrite, 0xXXXXXXXX, 0x4 )
            } )
            Method ( _E02 ) {
                \_SB.XEN0.PRSC ()
            }
        }
    }

Since the position of the XEN data memory area is not known, the hypervisor will have to replace the address noted as 0xXXXXXXXX with the actual memory address where this structure has been copied. The ACPI processor IDs will also be replaced by Xen during runtime (noted as 1 and 2 in the snippet above).

The PRST region containing the vCPU enabled bitmap would also need to be relocated by Xen over a RAM region, and updated accordingly when a vCPU is added or removed.

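The `Subtract`/`Add` operations on `MSU` in the PRSC method above keep the MADT checksum valid: an ACPI table is valid when all of its bytes sum to zero modulo 256, so setting the enabled flag (which adds 1 to the flags byte) has to be balanced by subtracting 1 from the checksum byte, and vice versa. The C sketch below shows the same arithmetic on a plain byte array. The function names and the offsets passed in are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes modulo 256; a valid ACPI table sums to 0. */
static uint8_t acpi_sum(const uint8_t *table, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + table[i]);
    return sum;
}

/*
 * Flip bit 0 of the flags byte at flags_off and compensate in the checksum
 * byte at csum_off, so that the overall byte sum is unchanged.
 */
static void set_enabled(uint8_t *table, size_t csum_off, size_t flags_off,
                        int enable)
{
    int was_enabled = table[flags_off] & 1;

    assert(csum_off != flags_off);

    if (was_enabled == !!enable)
        return; /* no state change, nothing to compensate */

    if (enable) {
        table[flags_off] |= 1;
        table[csum_off] = (uint8_t)(table[csum_off] - 1);
    } else {
        table[flags_off] &= (uint8_t)~1u;
        table[csum_off] = (uint8_t)(table[csum_off] + 1);
    }
}
```

Because only one bit changes per transition, a full recomputation of the checksum is not needed, which is convenient in AML where iterating over the whole table would be awkward.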
The replacement can be done by compiling two different versions of the above ASL code, each one having different values for the XEN operation region, the ACPI processor object IDs and other values that need to be set on a per-system basis, and doing a binary comparison between them in order to get the relative offsets of the differences. Note that the XEN operation region and the GPE event and status regions would be placed over a RAM memory region.

In order to implement this, the hypervisor build is going to use part of libacpi and the iasl compiler.
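The binary comparison step can be sketched in C as follows. This is an assumption about how such a tool could look, not the actual build code: `diff_offsets` is a hypothetical helper that walks two equally sized compiled blobs and records the starting offset of each run of differing bytes. For it to find complete fields, the two placeholder values should differ in every byte, and the table checksum byte in the ACPI header would also show up as a difference and has to be ignored or recomputed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Record the start offset of every run of bytes that differs between a and
 * b. Returns the number of runs found; at most max_offs offsets are stored.
 */
static size_t diff_offsets(const uint8_t *a, const uint8_t *b, size_t len,
                           size_t *offs, size_t max_offs)
{
    size_t runs = 0;
    int in_run = 0;

    assert(a && b);

    for (size_t i = 0; i < len; i++) {
        if (a[i] == b[i]) {
            in_run = 0;
            continue;
        }
        if (!in_run) {
            if (runs < max_offs)
                offs[runs] = i;
            runs++;
            in_run = 1;
        }
    }

    return runs;
}
```

The resulting offsets are relative to the start of the table, so the hypervisor could patch the real addresses and IDs at those positions after copying the blob into place.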