
Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen



On Wed, Dec 7, 2016 at 8:29 PM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> % Heterogeneous Multi Processing Support in Xen
> % Revision 1
>
> \clearpage
>
> # Basics
>
> ---------------- ------------------------
>          Status: **Design Document**
>
> Architecture(s): x86, arm
>
>    Component(s): Hypervisor and toolstack
> ---------------- ------------------------
>
> # Overview
>
> HMP (Heterogeneous Multi Processing) and AMP (Asymmetric Multi Processing)
> refer to systems where physical CPUs are not exactly equal. It may be that
> they have different processing power, or capabilities, or that each is
> specifically designed to run a particular system component.
> Most of the time, the CPUs have different Instruction Set Architectures (ISA)
> or Application Binary Interfaces (ABIs). But they may *just* be different
> implementations of the same ISA, in which case they typically differ in
> speed, power efficiency or in the handling of special things (e.g., errata).
>
> An example is ARM big.LITTLE which, in fact, is the use case that got the
> discussion about HMP started. This document, however, is generic, and does
> not target only big.LITTLE.
>
> Proper Xen support is needed for systems and use cases where virtual CPUs
> cannot be seamlessly moved among all the physical CPUs. In fact, in these
> cases, there must be a way to:
>
> * decide and specify on which (set of) physical CPU(s) each vCPU can execute;
> * enforce that a vCPU that can only run on a certain (set of) pCPUs is never
>   actually run anywhere else.
>
> **N.B.:** the terms AMP and HMP are also becoming common for systems which
> have various kinds of co-processors (from crypto engines to graphics
> hardware) integrated with the CPUs on the same chip. That is not what this
> design document is about.
>
> # Classes of CPUs
>
> A *class of CPUs* is defined as follows:
>
> 1. each pCPU in the system belongs to a class;
> 2. a class can consist of one or more pCPUs;
> 3. each pCPU can only be in one class;
> 4. CPUs belonging to the same class are homogeneous enough that a virtual
>    CPU that blocks/is preempted while running on a pCPU of a class can,
>    **seamlessly**, unblock/be scheduled on any pCPU of that same class;
> 5. when a virtual CPU is associated with a (set of) class(es) of CPUs, it
>    means that the vCPU can run on all the pCPUs belonging to the said
>    class(es).
>
> So, for instance, say that in the Foobar architecture two classes of CPUs
> exist: class foo and class bar. If a virtual CPU running on CPU 0, which is
> of class foo, blocks (or is preempted), it can, when it unblocks (or is
> selected by the scheduler to run again), run on CPU 3, still of class foo,
> but not on CPU 6, which is of class bar.
>
> ## Defining classes
>
> How a class is defined, i.e., which specific characteristics determine
> which CPUs belong to which class, is highly architecture specific.
>
> ### x86
>
> There is no HMP platform of relevance, for now, in the x86 world. Therefore,
> only one class will exist, and all the CPUs will be set to belong to it.
> **TODO X86:** is this correct?
>
> ### ARM
>
> **TODO ARM:** I know nothing about what specifically should be used to
> form classes, so I'm deferring this to ARM people.
>
> So far, in the original thread, the following ideas came up (well, there's
> more, but I don't know enough about ARM to judge what is really relevant
> to this topic):
>
> * [Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02153.html):
>   "I don't think an hardcoded list of processor in Xen is the right solution.
>    There are many existing processors and combinations for big.LITTLE so it
>    will nearly be impossible to keep updated."
> * [Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02256.html):
>   "Well, before trying to do something clever like that (i.e naming "big" and
>   "little"), we need to have upstreamed bindings available to acknowledge the
>   difference. AFAICT, it is not yet upstreamed for Device Tree and I don't
>   know any static ACPI tables providing the similar information."
> * [Peng](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02194.html):
>   "For how to differentiate cpus, I am looking the linaro eas cpu topology
>   code"
>
> # User details
>
> ## Classes of CPUs for the users
>
> It will be possible, in a VM config file, to specify the (set of) class(es)
> of each vCPU. This allows creating HMP VMs.
>
> E.g., on ARM, it will be possible to create big.LITTLE VMs which, if run on
> big.LITTLE hosts, could leverage the big.LITTLE support of the guest OS kernel
> and tools.
>
> For such purpose, a new option will be added to xl config file:
>
>     vcpus = "9"
>     vcpuclass = ["0-2:class0", "3,4:class1,class3", "5:class0,class2", "8:class4"]
>
> with the following meaning:
>
> * vCPUs 0, 1 and 2 can only run on pCPUs of class class0;
> * vCPUs 3 and 4 can run on pCPUs of class class1 **and** on pCPUs of class class3;
> * vCPU 5 can run on pCPUs of class class0 **and** on pCPUs of class class2;
> * vCPUs 6 and 7, since they are not mentioned, get the default behavior;
> * vCPU 8 can only run on pCPUs of class class4.
>
> For the vCPUs for which no class is specified, default behavior applies.
>
> **TODO:** note that I think it must be possible to associate more than
> one class to a vCPU. This is expressed in the example above, and assumed
> to be true throughout the document. It might be, though, that, at least at
> early stages (see implementation phases below), we will enable only 1-to-1
> mapping.
>
> **TODO:** default can be, either:
>
> 1. the vCPU can run on any CPU of any class,
> 2. the vCPU can only run on a specific, arbitrarily decided, class (and I'd
>    say that should be class 0).
>
> The former seems the better interface. It looks to me like the most natural
> and least surprising, from the user's point of view, and the most future
> proof (see phase 3 of the implementation below).
> The latter may be more practical, though. In fact, with the former, we risk
> crashing (the guest or the hypervisor) if one creates a VM and forgets to
> specify the vCPU classes --which does not look ideal.
>
> It will be possible to gather information about what classes exist, and what
> pCPUs belong to each class, by issuing the `xl info -n` command:
>
>     cpu_topology           :
>     cpu:    core    socket     node     class
>       0:       0        1        0        0
>       1:       0        1        0        1
>       2:       1        1        0        2
>       3:       1        1        0        3
>       4:       9        1        0        3
>       5:       9        1        0        0
>       6:      10        1        0        1
>       7:      10        1        0        2
>       8:       0        0        1        3
>       9:       0        0        1        3
>      10:       1        0        1        1
>      11:       1        0        1        0
>      12:       9        0        1        1
>      13:       9        0        1        0
>      14:      10        0        1        2
>      15:      10        0        1        2
>
> **TODO:** do we want to keep using `-n`, or add another switch, like `-c` or
> something? I'm not sure I like using `-n` as, e.g., on x86, this would most
> of the time result in just a column full of `0`, and it may raise confusion
> among users about what that actually means.
> Also, do we want to print the class ids, or some more abstract class names?
> (or support both, and have a way to decide which one to see)?
>
> # Technical details
>
> ## Hypervisor
>
> The hypervisor needs to know within which class each of the present CPUs
> falls. At boot (or, in general, CPU bringup) time, while identifying the
> CPU, a list of classes is constructed, and the mapping between each CPU and
> the class it belongs to is established.
>
> The list of classes is kept ordered from the most powerful to the least
> powerful.
> **TODO:** this has been
> [proposed by George](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html).
> I like the idea; what do others think? If we agree on that, note that there
> has been no discussion on defining what "more powerful" means, neither on
> x86 (although not really that interesting, for now, I'd say) nor on ARM.
>
> The mapping between CPUs and classes will be kept in memory in the following
> data structures:
>
>     /* For each pCPU, the id of the class it belongs to. */
>     uint16_t cpu_to_class[NR_CPUS] __read_mostly;
>     /* For each class, the set of pCPUs belonging to it. */
>     cpumask_t class_to_cpumask[NR_CPUS] __read_mostly;
>
> **TODO:** it's probably better to allocate the cpumask array dynamically,
> to avoid wasting too much space.
>
> **TODO:** if we want the ordering, structure needs to be kept ordered too
> (or additional structures should be used for the purpose).
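>
> As a purely illustrative sketch (the helper name and the matching logic
> are assumptions, not a settled interface), CPU bringup could populate
> these structures as follows:
>
>     /* Hypothetical: called while identifying a CPU at bringup time.
>      * find_or_create_class() is an assumed arch-specific routine that
>      * matches the CPU against the known classes (e.g., by MIDR on ARM)
>      * and allocates a new class id when no match is found. */
>     static uint16_t nr_classes;
>
>     void cpu_assign_class(unsigned int cpu)
>     {
>         uint16_t cls = find_or_create_class(cpu);
>
>         cpu_to_class[cpu] = cls;
>         cpumask_set_cpu(cpu, &class_to_cpumask[cls]);
>         if ( cls >= nr_classes )
>             nr_classes = cls + 1;
>     }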
>
> Each virtual CPU must know on what class(es) of CPUs it can run. Since a
> vCPU can be associated to more than one class, the best way to keep track
> of this information is a bitmap. That will be a new `cpumask` typed member
> in `struct vcpu`, where the i-th bit being set means the vCPU can
> run on CPUs of class i.
>
> If a vCPU is found running on a pCPU of a class that is not associated to
> the vCPU itself, an exception should be raised.
> **TODO:** What kind? BUG_ON? Crash the guest? The guest would probably crash
> --or become unreliable-- on its own, I guess.
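>
> Just as a sketch (the field name `cpu_classes`, and where exactly the
> check would live, are assumptions):
>
>     struct vcpu {
>         ...
>         /* Bit i set <=> this vCPU can run on pCPUs of class i. */
>         cpumask_t cpu_classes;
>         ...
>     };
>
>     /* Possible shape of the sanity check (e.g., at context switch);
>      * v->processor is the pCPU the vCPU is running on. */
>     ASSERT(cpumask_test_cpu(cpu_to_class[v->processor],
>                             &v->cpu_classes));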
>
> Setting and getting the CPU class of a vCPU will happen via two new
> hypercalls:
>
> * `XEN_DOMCTL_setvcpuclass`
> * `XEN_DOMCTL_getvcpuclass`
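>
> A possible shape for the payload (a sketch only, modeled on other
> per-vCPU DOMCTLs such as `XEN_DOMCTL_setvcpuaffinity`; names are
> assumptions):
>
>     struct xen_domctl_vcpuclass {
>         uint32_t vcpu;                  /* IN: vCPU id */
>         struct xenctl_bitmap classmap;  /* IN/OUT: bitmap of class ids */
>     };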
>
> Information about CPU classes will be propagated to the toolstack by adding
> a new field to `xen_sysctl_cputopo`, which will become:
>
>     struct xen_sysctl_cputopo {
>         uint32_t core;
>         uint32_t socket;
>         uint32_t node;
>         uint32_t class;   /* new: id of the class this CPU belongs to */
>     };
>
> For homogeneous and SMP systems, the value of the new class field will
> be 0 for all the cores.
>
> ## Toolstack
>
> It will be possible for the toolstack to retrieve from Xen the list of
> existing CPU classes, their names, and the information about which class
> each present CPU belongs to.
>
> **TODO:** [George suggested](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html)
> to allow a richer set of labels, at the toolstack level, and I like
> the idea very much. It's not clear to me, though, in what component
> this list of names, and the mapping between them and the classes as
> they're known inside Xen, should live.
>
> Libxl and libxc interfaces will be introduced for associating a vCPU to
> a (set of) class(es):
>
> * `libxl_set_vcpuclass()`, `libxl_get_vcpuclass()`;
> * `xc_vcpu_setclass()`, `xc_vcpu_getclass()`.
>
> In libxl, class information will be added in `struct libxl_cputopology`,
> which is filled by `libxl_get_cpu_topology()`.
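>
> As a sketch of what the new calls might look like (signatures modeled
> on the existing affinity calls, and to be taken as assumptions, not as
> final interfaces):
>
>     int libxl_set_vcpuclass(libxl_ctx *ctx, uint32_t domid,
>                             uint32_t vcpuid,
>                             const libxl_bitmap *classmap);
>     int libxl_get_vcpuclass(libxl_ctx *ctx, uint32_t domid,
>                             uint32_t vcpuid, libxl_bitmap *classmap);
>     int xc_vcpu_setclass(xc_interface *xch, uint32_t domid,
>                          int vcpu, xc_cpumap_t classmap);
>     int xc_vcpu_getclass(xc_interface *xch, uint32_t domid,
>                          int vcpu, xc_cpumap_t classmap);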
>
> # Implementation
>
> Implementation can proceed in phases.
>
> ## Phase 1
>
> Class definition, identification and mapping of CPUs to classes, inside
> Xen, will be implemented, and so will the libxc and libxl interfaces for
> retrieving such information.
>
> Parsing of the new `vcpuclass` parameter will be implemented in `xl`. The
> result of such parsing will then be used as if it were the hard-affinity of
> the various vCPUs. That is, we will set the hard-affinity of each vCPU to
> the pCPUs that are part of the class(es) the vCPU itself is being assigned
> to, according to `vcpuclass`.
>
> This would *Just Work(TM)*, as long as the user does not try to change the
> hard-affinity during the VM lifetime (e.g., with `xl vcpu-pin`).
>
> **TODO:** It may be useful, to prevent the above from happening, to add
> another `xl` config option that, if set, disallows changing the affinity
> from what it was at VM creation time (something like `immutable_affinity=1`).
> Thoughts? I'm leaning toward doing that, as it may even be something useful
> to have in other use cases.
>
> ### Phase 1.5
>
> Library (libxc and libxl) calls and hypercalls that are necessary to associate
> a class to the vCPUs will be implemented.
>
> At that point, when parsing `vcpuclass` in `xl`, we will call both (with the
> same bitmap as input):
>
> * `libxl_set_vcpuclass()`
> * `libxl_set_vcpuaffinity()`
>
> `libxl_set_vcpuaffinity()` will be modified in such a way that, when setting
> hard-affinity for a vCPU:
>
> * it will get the CPU class(es) associated to the vCPU;
> * it will check which pCPUs belong to those class(es);
> * it will filter out, from the new hard-affinity being set, the pCPUs that
>   are not in the vCPU's class(es).
>
> As a safety measure, `vcpu_set_hard_affinity()` in Xen will also be modified
> so that, if someone somehow manages to pass down a hard-affinity mask
> which contains pCPUs outside of the proper classes, it will error out
> with -EINVAL.
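>
> A minimal sketch of that check (assuming the per-vCPU class bitmap
> proposed earlier, plus an assumed `classes_to_cpumask()` helper that
> ORs together `class_to_cpumask[]` for each class the vCPU belongs to):
>
>     int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity)
>     {
>         cpumask_t allowed;
>
>         /* Reject any mask that reaches outside the vCPU's classes. */
>         classes_to_cpumask(&v->cpu_classes, &allowed);
>         if ( !cpumask_subset(affinity, &allowed) )
>             return -EINVAL;
>
>         /* ... existing affinity update logic ... */
>     }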
>
> ## Phase 2
>
> Inside Xen, the various schedulers will be modified to deal internally with
> the fact that vCPUs can only run on pCPUs from the class(es) they are
> associated with. This allows for more efficient implementation, and paves
> the way for enabling more intelligent logic (e.g., for minimizing power
> consumption) in *phase 3*.
>
> Calling `libxl_set_vcpuaffinity()` from `xl` / libxl is therefore no longer
> necessary and will be avoided (i.e., only `libxl_set_vcpuclass()` will be
> called).
>
> ## Phase 3
>
> Moving vCPUs between classes will be implemented. This means that, e.g.,
> on ARM big.LITTLE, it will be possible for a vCPU to block on a big core
> and wake up on a LITTLE core.
>
> **TODO:** About what this takes, see
> [Julien's email](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02345.html).
>
> This means it will no longer be necessary to specify the class of the
> vCPUs via `vcpuclass` in `xl`, although that will of course remain
> supported. So:
>
> 1. if one wants (sticking with big.LITTLE as the example) a big.LITTLE VM,
>    and wants to make sure that big vCPUs will run on big pCPUs, and that
>    LITTLE vCPUs will run on LITTLE pCPUs, she will use:
>
>     vcpus = "8"
>     vcpuclass = ["0-3:big", "4-7:little"]
>
> 2. if one does not care, and is happy to let the Xen scheduler decide
>    where to run the various vCPUs, in order, for instance, to be sure
>    to get the best power efficiency for the host as a whole, he can
>    just avoid specifying any `vcpuclass`, or do something like this:
>
>     vcpuclass = ["all:all"]
>
> # Limitations
>
> * While in *phase 1*, it won't be possible to use vCPU hard-affinity
>   for anything other than HMP support;
> * until *phase 3*, since HMP support is basically the same as
>   setting hard-affinity, performance may not be ideal;
> * until *phase 3*, vCPUs can't move between classes. This means,
>   for instance, that in the big.LITTLE world, Xen's scheduler can't move
>   a vCPU running on a big core to a LITTLE core (e.g., to try to save power).
>
> # Testing
>
> Testing requires an actual AMP/HMP system. On such a system, we at least
> want to:
>
> * create a VM **without** specifying `vcpuclass` in its config file, and
>   check that the default policy is correctly applied to all vCPUs;
> * create a VM **specifying** `vcpuclass` in its config file and check that
>   the classes are assigned to vCPUs appropriately;
> * create a VM **specifying** `vcpuclass` in its config file and check that
>   the various vCPUs are not running on any pCPU outside of their respective
>   classes.
>
> # Areas for improvement
>
> * Make it possible to test even on non-HMP systems. That could be done by
>   making it possible to provide Xen with fake CPU classes for the system
>   CPUs (e.g., with boot time parameters);
> * implement a way to view the class the vCPUs have been assigned (either as
>   part of the output of `xl vcpu-list`, or as a dedicated `xl` subcommand);
> * make it possible to dynamically change the class of vCPUs at runtime, with
>   `xl` (either via a new parameter to `vcpu-pin` subcommand, or via a new
>   subcommand).
>
> # Known issues
>
> *TBD*.
>
> # References
>
> * [Asymmetric Multi Processing](https://en.wikipedia.org/wiki/Asymmetric_multiprocessing)
> * [Heterogeneous Multi Processing](https://en.wikipedia.org/wiki/Heterogeneous_computing)
> * [ARM big.LITTLE](https://www.arm.com/products/processors/technologies/biglittleprocessing.php)
>
> # History
>
> ------------------------------------------------------------------------
> Date       Revision Version  Notes
> ---------- -------- -------- -------------------------------------------
> 2016-12-02 1                 RFC of design document
> ---------- -------- -------- -------------------------------------------

Hi all,

We are sending a branch[1] for comments on an initial implementation
of the above design document. Essentially it targets the ARM
big.LITTLE architecture. It would be great if you guys could comment
on the changes and provide some guidance for us to get it upstream.

We have tested it on an odroid xu4 [2] and we are able to boot guests
with mixed vcpu affinities (big and LITTLE).

We are more than happy to submit patches once we address the issues
and come up with a review-able version of this implementation.

Thanks!
A.

[1] https://github.com/HPSI/xen/tree/big.LITTLE
[2] using this cherry pick: 8d56205455a4a1e0233421d3ee98e3c7dee20bd2
from: https://github.com/bkrepo/xen.git
