
Re: [Xen-devel] PVH CPU hotplug design document





On 01/12/2017 02:00 PM, Andrew Cooper wrote:
On 12/01/17 12:13, Roger Pau Monné wrote:

## Proposed solution using the STAO

The general idea of this method is to use the STAO in order to hide the pCPUs
from the hardware domain, and provide processor objects for vCPUs in an extra
SSDT table.

This method requires one change to the STAO, in order to be able to notify the
hardware domain of which processors found in ACPI tables are pCPUs. The
description of the new STAO field is as follows:

 |   Field            | Byte Length | Byte Offset |     Description          |
 |--------------------|:-----------:|:-----------:|--------------------------|
 | Processor List [n] |      -      |      -      | A list of ACPI numbers,  |
 |                    |             |             | where each number is the |
 |                    |             |             | Processor UID of a       |
 |                    |             |             | physical CPU, and should |
 |                    |             |             | be treated specially by  |
 |                    |             |             | the OSPM                 |

The list of UIDs in this new field would be matched against the ACPI Processor
UID field found in local/x2 APIC MADT structs and Processor objects in the ACPI
namespace, and the OSPM should either ignore those objects, or in case it
implements pCPU hotplug, it should notify Xen of changes to these objects.
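The matching rule described above can be sketched as follows. This is only an illustration, not OSPM code: `MadtCpuEntry` and `split_by_stao` are hypothetical names, and the MADT is assumed to have already been parsed into (UID, enabled) pairs.

```python
from dataclasses import dataclass


@dataclass
class MadtCpuEntry:
    acpi_uid: int   # ACPI Processor UID taken from a local/x2 APIC entry
    enabled: bool   # "enabled" flag of that entry


def split_by_stao(entries, stao_pcpu_uids):
    """Separate MADT entries listed in the STAO (pCPUs) from the rest.

    Returns (pcpus, others). An OSPM would ignore `pcpus` as schedulable
    CPUs (or only relay their hotplug events to Xen) and treat `others`
    as the processors it may actually bring up.
    """
    pcpu_uids = set(stao_pcpu_uids)
    pcpus = [e for e in entries if e.acpi_uid in pcpu_uids]
    others = [e for e in entries if e.acpi_uid not in pcpu_uids]
    return pcpus, others
```

The same UID set would also be applied to Processor objects found in the ACPI namespace.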

The contents of the MADT provided to the hardware domain are also going to be
different from the contents of the MADT as found in native ACPI. The local/x2
APIC entries for all the pCPUs are going to be marked as disabled.

Extra entries are going to be added for each vCPU available to the hardware
domain, up to the maximum number of supported vCPUs. Note that supported vCPUs
might be different than enabled vCPUs, so it's possible that some of these
entries are also going to be marked as disabled. The entries for vCPUs on the
MADT are going to use a processor local x2 APIC structure, and the ACPI
processor ID of the first vCPU is going to be UINT32_MAX - HVM_MAX_VCPUS, in
order to avoid clashes with IDs of pCPUs.
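As a quick check of the numbering scheme, here is a minimal sketch that assumes `HVM_MAX_VCPUS` is 128, which is consistent with the `_UID` values used in the SSDT snippet later in this document. The helper names are hypothetical.

```python
UINT32_MAX = 0xFFFFFFFF
HVM_MAX_VCPUS = 128  # assumption: matches the _UID values in the SSDT snippet


def vcpu_acpi_id(vcpu):
    """ACPI processor ID for a given vCPU index under the proposed scheme."""
    if not 0 <= vcpu < HVM_MAX_VCPUS:
        raise ValueError("vCPU index out of range")
    return UINT32_MAX - HVM_MAX_VCPUS + vcpu


def clashing_ids(pcpu_ids):
    """Return the firmware-assigned pCPU IDs that fall inside the vCPU window."""
    base = UINT32_MAX - HVM_MAX_VCPUS
    return sorted(i for i in set(pcpu_ids) if base <= i < base + HVM_MAX_VCPUS)
```

A non-empty result from `clashing_ids` would mean the fixed window cannot be used as-is on that system.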

This is slightly problematic.  There is no restriction (so far as I am
aware) on which ACPI IDs the firmware picks for its objects.  They need
not be consecutive, logical, or start from 0.

If STAO is being extended to list the IDs of the physical processor
objects, we should go one step further and explicitly list the IDs of
the virtual processor objects.  This leaves us flexibility if we have to
avoid awkward firmware ID layouts.


I don't think I understand how we'd use a vCPU list in the STAO. Can you explain this?



It is also worth stating that this puts an upper limit on nr_pcpus +
nr_dom0_vcpus (but 4 billion processors really ought to be enough for
anyone...)

In order to be able to perform vCPU hotplug, the vCPUs must have an ACPI
processor object in the ACPI namespace, so that the OSPM can request
notifications and get the value of the \_STA and \_MAT methods. This can be
problematic because Xen doesn't know the ACPI name of the other processor
objects, so blindly adding new ones can create namespace clashes.

This can be solved by using a different ACPI name in order to describe vCPUs in
the ACPI namespace. Most hardware vendors tend to use CPU or PR prefixes for
the processor objects, so using a 'VP' (i.e. Virtual Processor) prefix should
prevent clashes.

One system I have to hand (with more than 255 pcpus) uses Cxxx

To avoid namespace collisions, I can't see any option but to parse the
DSDT/SSDTs to at least confirm that VPxx is available to use.
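One conservative way to confirm availability without a full AML parser might be a raw byte scan. The sketch below assumes that AML stores each name segment as four ASCII bytes, so if the pattern never appears in any table the name cannot be in use. A match may be a false positive (the bytes could be data rather than a name), and tables loaded dynamically at run time are not covered. `prefix_may_clash` is a hypothetical helper.

```python
import re


def prefix_may_clash(aml_tables, prefix=b"VP"):
    """Return True if any table contains bytes that look like <prefix>XX.

    aml_tables: iterable of raw DSDT/SSDT contents (bytes).
    XX is matched as two upper-case hex digits, as in VP00, VP01, ...
    """
    pattern = re.compile(re.escape(prefix) + rb"[0-9A-F]{2}")
    return any(pattern.search(table) is not None for table in aml_tables)
```

If the scan reports a possible clash, another prefix could be tried before falling back to real parsing.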

You are talking about Xen doing this, right? Meaning that we'd need to add an AML parser to the hypervisor?

If we do that, I wonder whether this will also help us to deal with _PSS and _CST, which we now have to pass down from dom0.



A Xen GPE device block will be used in order to deliver events related to the
vCPUs available to the guest, since Xen doesn't know if there are any bits
available in the native GPEs. A SCI interrupt will be injected into the guest
in order to trigger the event.

The following snippet is a representation of the ASL SSDT code that is proposed
for the hardware domain:

    DefinitionBlock ("SSDT.aml", "SSDT", 5, "Xen", "HVM", 0)
    {
        Scope (\_SB)
        {
            OperationRegion(XEN, SystemMemory, 0xDEADBEEF, 40)
            Field(XEN, ByteAcc, NoLock, Preserve) {
                NCPU, 16, /* Number of vCPUs */
                MSUA, 32, /* MADT checksum address */
                MAPA, 32, /* MADT LAPIC0 address */
            }
        }
        Scope ( \_SB ) {
            OperationRegion ( MSUM, SystemMemory, \_SB.MSUA, 1 )
            Field ( MSUM, ByteAcc, NoLock, Preserve ) {
                MSU, 8
            }
            Method ( PMAT, 2 ) {
                If ( LLess(Arg0, NCPU) ) {
                    Return ( ToBuffer(Arg1) )
                }
                Return ( Buffer() {0, 8, 0xff, 0xff, 0, 0, 0, 0} )
            }
            Processor ( VP00, 0, 0x0000b010, 0x06 ) {
                Name ( _HID, "ACPI0007" )
                Name ( _UID, 4294967167 )
                OperationRegion ( MATR, SystemMemory, Add(\_SB.MAPA, 0), 8 )
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    MAT, 64
                }
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    Offset(4),
                    FLG, 1
                }
                Method ( _MAT, 0 ) {
                    Return ( ToBuffer(MAT) )
                }
                Method ( _STA ) {
                    If ( FLG ) {
                        Return ( 0xF )
                    }
                    Return ( 0x0 )
                }
                Method ( _EJ0, 1, NotSerialized ) {
                    Sleep ( 0xC8 )
                }
            }
            Processor ( VP01, 1, 0x0000b010, 0x06 ) {
                Name ( _HID, "ACPI0007" )
                Name ( _UID, 4294967168 )
                OperationRegion ( MATR, SystemMemory, Add(\_SB.MAPA, 8), 8 )
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    MAT, 64
                }
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    Offset(4),
                    FLG, 1
                }
                Method ( _MAT, 0 ) {
                    Return ( PMAT (1, MAT) )
                }
                Method ( _STA ) {
                    If ( LLess(1, \_SB.NCPU) ) {
                        If ( FLG ) {
                            Return ( 0xF )
                        }
                    }
                    Return ( 0x0 )
                }
                Method ( _EJ0, 1, NotSerialized ) {
                    Sleep ( 0xC8 )
                }
            }
            OperationRegion ( PRST, SystemIO, 0xaf00, 1 )

This also has a chance of collision, both with the system ACPI
controller, and also with PCIe devices advertising IO-BARs.  (All
graphics cards ever have IO-BARs, because Windows refuses to bind a
graphics driver to a PCI graphics device if the PCI device doesn't have
at least one IO-BAR.  Because PCIe requires 4k alignment on the upstream
bridge IO-windows, there is a surprisingly low limit on the number of
graphics cards you can put in a server and have functioning to Windows'
satisfaction.)

As with the other risks of collisions, Xen is going to have to search
the system to find a free area to use.
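The search itself could look roughly like the sketch below. It assumes Xen already has a complete list of the IO port ranges in use, which is the hard part and is not addressed here. `find_free_io_range` and its default limits are hypothetical.

```python
def find_free_io_range(used, size, align=4, lo=0x1000, hi=0xFFFF):
    """Find the lowest aligned IO port base with `size` free ports.

    used: iterable of (start, end) inclusive port ranges already claimed.
    Returns the base port, or None if nothing fits in [lo, hi].
    """
    def align_up(value):
        return (value + align - 1) // align * align

    base = align_up(lo)
    for start, end in sorted(used):
        if end < base:
            continue                 # range lies entirely below the candidate
        if start >= base + size:
            break                    # gap before this range is large enough
        base = align_up(end + 1)     # overlap: retry just past this range
    if base + size - 1 > hi:
        return None
    return base
```

Note that a range found this way can still become stale if the hardware domain later reassigns PCI BARs.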


I am pretty ignorant about AML, but is it possible to have AML dynamically determine the address? Or is it a compile-time value?


-boris


            Field ( PRST, ByteAcc, NoLock, Preserve ) {
                PRS, 2
            }
            Method ( PRSC, 0 ) {
                Store ( ToBuffer(PRS), Local0 )
                Store ( DerefOf(Index(Local0, 0)), Local1 )
                And ( Local1, 1, Local2 )
                If ( LNotEqual(Local2, \_SB.VP00.FLG) ) {
                    Store ( Local2, \_SB.VP00.FLG )
                    If ( LEqual(Local2, 1) ) {
                        Notify ( VP00, 1 )
                        Subtract ( \_SB.MSU, 1, \_SB.MSU )
                    }
                    Else {
                        Notify ( VP00, 3 )
                        Add ( \_SB.MSU, 1, \_SB.MSU )
                    }
                }
                ShiftRight ( Local1, 1, Local1 )
                And ( Local1, 1, Local2 )
                If ( LNotEqual(Local2, \_SB.VP01.FLG) ) {
                    Store ( Local2, \_SB.VP01.FLG )
                    If ( LEqual(Local2, 1) ) {
                        Notify ( VP01, 1 )
                        Subtract ( \_SB.MSU, 1, \_SB.MSU )
                    }
                    Else {
                        Notify ( VP01, 3 )
                        Add ( \_SB.MSU, 1, \_SB.MSU )
                    }
                }
                Return ( One )
            }
        }
        Device ( \_SB.GPEX ) {
            Name ( _HID, "ACPI0006" )
            Name ( _UID, "XENGPE" )
            Name ( _CRS, ResourceTemplate() {
                IO (Decode16, 0xafe0 , 0xafe0, 0x00, 0x4)
            } )
            Method ( _E02 ) {
                \_SB.PRSC ()
            }
        }
    }
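
For readers less familiar with ASL, the logic of the `PRSC` method above can be modelled as in the following sketch. It is not part of the proposal, and `process_prs` is a hypothetical name. The checksum adjustment follows from the ACPI rule that all bytes of a table must sum to zero modulo 256: setting the enabled flag in a MADT entry raises the byte sum by one, so the checksum byte has to drop by one, and the reverse on removal.

```python
NOTIFY_DEVICE_CHECK = 1   # Notify value used above when a vCPU appears
NOTIFY_EJECT_REQUEST = 3  # Notify value used above when a vCPU is removed


def process_prs(prs, flags, madt_checksum):
    """Model one invocation of PRSC.

    prs: bitmap read from the PRS field, bit N describes vCPU N.
    flags: list of current FLG bits (0 or 1), updated in place.
    madt_checksum: current value of the MADT checksum byte (MSU).
    Returns (notifications, new_checksum), where notifications is a list
    of (vcpu_index, notify_value) tuples.
    """
    notifications = []
    for vcpu in range(len(flags)):
        bit = (prs >> vcpu) & 1
        if bit == flags[vcpu]:
            continue  # no state change for this vCPU
        flags[vcpu] = bit
        if bit:
            notifications.append((vcpu, NOTIFY_DEVICE_CHECK))
            madt_checksum = (madt_checksum - 1) & 0xFF
        else:
            notifications.append((vcpu, NOTIFY_EJECT_REQUEST))
            madt_checksum = (madt_checksum + 1) & 0xFF
    return notifications, madt_checksum
```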

Since the position of the XEN data memory area is not known, the hypervisor will
have to replace the address 0xdeadbeef with the actual memory address where
this structure has been copied. This will involve a memory search of the AML
code resulting from the compilation of the above ASL snippet.
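A minimal sketch of such a search-and-patch step is shown below. It assumes the compiler encodes the placeholder as an AML DWord constant (prefix byte 0x0C followed by the value in little-endian order), and it refuses to patch unless the pattern occurs exactly once, which limits but does not remove the risk of hitting unrelated bytes. `patch_placeholder` is a hypothetical helper. The table checksum, at byte offset 9 of the standard ACPI table header, is recomputed afterwards.

```python
import struct

PLACEHOLDER = 0xDEADBEEF
DWORD_PREFIX = 0x0C      # assumed AML encoding of a 32-bit integer constant
CHECKSUM_OFFSET = 9      # checksum byte in the ACPI table header


def patch_placeholder(table, address):
    """Replace the placeholder address in a compiled table and fix its checksum.

    table: raw bytes of the compiled SSDT.
    address: 32-bit address of the XEN data area (struct.error if it
    does not fit in 32 bits).
    """
    needle = bytes([DWORD_PREFIX]) + struct.pack("<I", PLACEHOLDER)
    if table.count(needle) != 1:
        raise ValueError("expected exactly one placeholder in the table")
    replacement = bytes([DWORD_PREFIX]) + struct.pack("<I", address)
    patched = bytearray(table.replace(needle, replacement))
    # All bytes of an ACPI table must sum to zero modulo 256.
    patched[CHECKSUM_OFFSET] = 0
    patched[CHECKSUM_OFFSET] = (-sum(patched)) & 0xFF
    return bytes(patched)
```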

This is also slightly risky.  If we need to do this, can we get a
relocation list from the compiled table from iasl?

~Andrew


In order to implement this, the hypervisor build is going to use part of
libacpi and the iasl compiler.


