[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC 00/41] qom-topo: Abstract Everything about CPU Topology



Hi maintainers,

Just a ping. Welcome your feedbacks!

We wonder if the current RFC is an appropriate attempt towards the final
hybrid topology.

Thanks,
Zhao

On Thu, Nov 30, 2023 at 10:41:22PM +0800, Zhao Liu wrote:
> Date: Thu, 30 Nov 2023 22:41:22 +0800
> From: Zhao Liu <zhao1.liu@xxxxxxxxxxxxxxx>
> Subject: [RFC 00/41] qom-topo: Abstract Everything about CPU Topology
> X-Mailer: git-send-email 2.34.1
> 
> From: Zhao Liu <zhao1.liu@xxxxxxxxx>
> 
> Hi list,
> 
> This series is our latest attempt after the previous RFC [1] about
> hybrid topology support, which is based on the commit 4705fc0c8511
> ("Merge tag 'pull-for-8.2-fixes-231123-1' of https://gitlab.com/
> stsquad/qemu into staging") with our previous cleanup (patches link:
> https://lore.kernel.org/all/20231127145611.925817-1-zhao1.liu@xxxxxxxxxxxxxxx/).
> 
> In the previous RFC, Daniel suggested [2] us to use the modern QOM
> approach to define CPU topology, and based on this way, defining hybrid
> topology through cli is natural. (Thanks Daniel!)
> 
> About why we chose -device other than -object, please see the chapter.3
> "History of QOM Topology".
> 
> In fact, S390x already implements heterogeneity at the CPU level with
> QOM CPUs, i.e., different CPUs have different entitlements [3]. However,
> for more thorough heterogeneity, i.e., heterogeneous cores, clusters,
> dies, caches and even more, we still need to go farther in the QOM
> direction.
> 
> With these background, we propose this series to implement QOM "smp"
> topology in QEMU, and it's also the first step towards the heterogeneous
> topology (including CPU & cache topology) for virtualization case.
> 
> The overall goal is to both use QOM "smp" topology to be compatible with
> current -smp behavior, and to take into account different architectural
> setups/requirements for CPU topology (even including different designs
> for CPU hotplug and possible_cpus[] implementation), and ultimately to
> extend QEMU's ability to define CPU topology via -device even without
> -smp.
> 
> The current remining issue, mainly related to PPC, as it chose to build
> the pssible_cpus[] list at core granularity. Please see chapter.5 "Open
> Questions" for more thoughts on this issue.
> 
> For other architectures that build possible_cpus[] at CPU granularity,
> the transition to QOM topology will be similar to what was done for i386
> in this patchset (please feel free to point out any issues I've missed).
> 
> There's a lot of work, and the devil is in the details.
> 
> 
> Welcome your feedbacks!
> 
> 
> 1. Summary about What We Did?
> =============================
> 
> This series implements the basics of QOM topology and supports QOM
> topology for the i386 as the example:
> 
> * Introduce the general topology device and abstract all CPU topology
>   levels to topology devices:
> 
>     - including "cpu", "cpu-core", "cpu-cluster", "cpu-die",
>       "cpu-socket", "cpu-book", "cpu-drawer", and a special topology
>       root "cpu-slot" to manage topology tree.
> 
> * Allow user to create "smp" CPU topology via "-device", for example:
> 
>     The topology with 8 CPUs:
> 
>     -accel kvm -cpu host \
>     -device cpu-socket,id=sock0 \
>     -device cpu-die,id=die0,parent=sock0 \
>     -device cpu-core,id=core0,parent=die0,nr-threads=2 \
>     -device cpu-core,id=core1,parent=die0,nr-threads=2,plugged-threads=1 \
>     -device cpu-core,id=core2,parent=die0,nr-threads=2,plugged-threads=2 \
>     -device cpu-core,id=core3,parent=die0,nr-threads=2 \
>     -device host-x86_64-cpu,socket-id=0,die-id=0,core-id=1,thread-id=1 \
> 
> * Build a topology tree under machine, for example:
> 
>     One of the above CPUs:
> 
>     {
>         "props": {
>             "core-id": 0,
>             "socket-id": 0,
>             "thread-id": 0
>         },
>         "qom-path": 
> "/machine/peripheral/cpu-slot/sock0/die0/core0/host-x86_64-cpu[0]",
>         "type": "host-x86_64-cpu",
>         "vcpus-count": 1
>     }
> 
> * Convert topology information configured via -smp into a QOM "smp"
>   topology tree (if the arch supports QOM topology).
> 
> 
> 2. What's the Problem?
> ======================
> 
> As computer architectures continue to innovate, the need for
> heterogeneous topology of virtual machine in QEMU is growing.
> 
> On the one hand, there is the need for heterogeneous CPU topology
> support. Heterogeneous CPU topology refers to systems that use more than
> one kind of processor or cores [4], the typical example is intel hybrid
> architecture with 2 core types [5], which will even have heterogeneous
> die topology [6]. And we can see that not only Intel, but also ARM and
> AMD's heterogeneous platforms will have the ability to support
> virtualization.
> 
> On the other hand, there is also a growing need for heterogeneous cache
> topology in QEMU. Not only will Intel's hybrid platforms introduce
> complex heterogeneous cache topology, but more platforms have aslo seen
> the strong demand for heterogeneous cache topology definition (e.g. ARM
> [7]). Although cache topology is strictly speaking a separate topology
> from CPU topology, in terms of the actual processor build process and
> the current QEMU's way to define cache topology (e.g., i386 [8] & ARM
> [9]), cache topology is and should be dependent on CPU topology to be
> defined.
> 
> With this background of increasing interest in heterogeneous computing,
> we need a flexible way (for accel) to define a wide variety of CPU
> topologies in QEMU.
> 
> Obviously, -smp is not enough, so here we propose a way to define CPU
> topology based on -device, in the hope that this will greatly expand the
> flexibility of QEMU's CPU topology definition.
> 
> 
> 3. History of QOM Topology
> ==========================
> 
> The intent of QOM topology is not to stop at QOM CPUs, but to abstract
> more CPU topology levels.
> 
> In fact, it's not a new term.
> 
> Back in 2014, Fan proposed [10] a hierarchical tree based on QOM node,
> QOM sockets, QOM cores and QOM cpu. Andreas also propsed [11] socket/
> core/thread QOM model to create hierarchical topology in place. From
> the discussion at that time, the hierarchy topology tree representation
> was what people (Igor said [12]) wanted. However, this work was not
> continued.
> 
> Then Bharata abstracted [13] the cpu-core to support core granularity
> hotplug in spapr. Until now, the only user of cpu-core is still spapr.
> 
> Cpu-cluster was introduced by Luc [14] to organize CPUs in different
> containers to support GDB for TCG case. Though this abstraction was
> descripted as: "mainly an internal QEMU representation and does not
> necessarily match with the notion of clusters on the real hardware",
> its name and function actually make it easy to correspond to the
> physical cluster/smp cluster (the difference, of course, is that TCG
> cluster collects CPUs directly, while the cluster as a CPU topology
> level collect cores, but this difference is not the impediment gap to
> converting "TCG " cluster to "general" cluster).
> 
> As CPU architectures evolve, QEMU has supported more topology levels
> (clusters, die, book and drawer) for virtualization case, while the
> existing cpu-core and cpu-cluster abstractions become fragmented.
> 
> And now, the need for defining hybrid topology has prompted us to
> rethink the QOM topology.
> 
> Daniel suggested [2] to use -object interfaces to define CPU topology,
> and we absorbed his design idea. But in practice we found that -device
> looked like a better approach to take advantage of the current QOM CPU
> device, cpu-core device and cpu-cluster device.
> 
> 
> 4. Design Overview
> ==================
> 
> 4.1. General Topology Device
> ============================
> 
> We introduce a new topology device as the basic abstraction, then all
> levels of the CPU topology are derived from this general topology device
> type.
> 
> This topology device is the basic unit for building the topology tree.
> Children topology devices are inserted into the queue of the parent, and
> the child<> property is created between the children and their parent.
> 
> As the root of the topology tree, we introduce a special topology device
> "cpu-slot". It is created by the machine at machine's initialization and
> collects the topology devices created by the user from the cli, and thus
> builds the topology tree.
> 
> The cpu-slot is also responsible for statistics on global topology
> information, and whenever there is a new topology child, the cpu-slot as
> root is notified to update topology informantion. In addition, different
> architectures have different requirements for topology (e.g., support
> for different levels), and such limitations/properties are applied to
> cpu-slot, which is checked when adding the new child topology unit in
> topology tree.
> 
> 
> 4.2. Derived QOM Topology Devices
> =================================
> 
> Based on the new general topology device type, we convert CPU, cpu-core
> and cpu-cluster from general devices to topology devices.
> 
> And we also abstract cpu-die, cpu-socket, cpu-book and cpu-drawer as
> topology devices.
> 
> 
> 4.3. New Device Category "DEVICE_CATEGORY_CPU_DEF"
> ==================================================
> 
> The topology devices can be divided into two general categories:
> 
> * One, as the basic topology components, should be created before board
>   initialization, to predefine the basic topology structure for the
>   system, and is used to initialize MachineState.possible_cpus at
>   machine's initialization. This category doesn't support hotplug, such
>   as:
> 
>     cpu-core (non-PPC core), cpu-cluster, cpu-die, cpu-socket, cpu-book
>     and cpu-drawer.
> 
>   Thus, we introduce the new device category "DEVICE_CATEGORY_CPU_DEF"
>   to mark these devices and create them from cli before board
>   initialization.
> 
> * The other are CPU and PPC core, which are the granularity of
>   MachineState.possible_cpus.
> 
>   They're created from MachineState.possible_cpus in place during
>   machine initialization or are plugged into MachineState.possible_cpus
>   through hotplug way.
> 
>   For these devices, they could be created from cli only after board
>   initialization.
> 
> 
> 4.4. User-child Interface to Build Child<> from Cli
> ===================================================
> 
> Topology device is bus-less device, and needs child<> to build topology
> hierarchical relationship like:
> 
> /machine/peripheral/cpu-slot/sock*/die*/core*/cpu*
> 
> Therefore, we introduce a new user-child interface to insert hooks into
> device_add path to get/specify object parent for topology devices.
> 
> If a topology device specify "parent" option in -device, it will be add
> to the corresponding topology parent with child<> property.
> 
> If no "parent" option, the topology device will have the default parent
> "cpu-slot". This ensures cpu-slot could collect all topology units to
> build complete topology tree.
> 
> 
> 5. Open Questions
> =================
> 
> There's a special case, user could define topology via -device without
> -smp (that's the future hybrid topology case!).
> 
> In the design of current QOM topology, the numbers of maximum CPUs and
> pre plugged CPUs could be collected during core devices realize.
> 
> For the (non-PPC) architectures which build possible_cpus[] at CPU
> granularity, the cores will be created before possible_cpus[]
> initialization and then CPU slot could know how many maximum CPUs will
> be supported to fill possible_cpus[].
> 
> But for PPC, the possible_cpus[] is at core granularity and PPC core
> could only be created after possible_cpus[] initialization, so that
> CPU slot cannot know the the numbers of maximum CPUs (PPC cores) and pre
> plugged CPUs (PPC cores). So for PPC, the "-smp" is necessary and cannot
> be omitted.
> 
> For PPC this potential impact, i.e., even though QOM topology is
> supported in PPC, it is not possible to omit -smp to create the topology
> only via -device as for i386, and since PPC does not currently support
> heterogeneous topology, this potential impact might be acceptable?
> 
> 
> 6. Future TODOs
> ===============
> 
> The current QOM topology RFC is only the very first step to introduce
> the most basic QOM support, and it tries to be as compatible as possible
> with existing SMP facilities.
> 
> The ultimate goal is to completely replace the current smp-related
> topology structures with cpu-slot.
> 
> There are many TODOs:
> 
> * Add unit tests.
> * Support QOM topology for all architectures.
> * Get rid of MachineState.smp and MachineClass.smp_props with cpu-slot.
> * Extend QOM topology to hybrid topology.
> * Introduce "-device-set" which is derived from Daniel's "-object-set"
>   idea [2] to create multiple duplicate devices.
> ...
> 
> 
> 7. Patch Summary
> ================
> 
> Patch  1- 3: Create DEVICE_CATEGORY_CPU_DEF devices before board
>              initialization.
> Patch  4- 7: Support child<> creation from cli.
> Ptach  8-12: Introduce general topology device.
> Patch 13-26: Abstract all topology levels to topology devices.
> Patch 27-34: Introduce cpu-slot to manage the CPU topology of machine.
> Patch 35-41: Convert i386's CPU creation & hotplug to be based on QOM
>              topology.
> 
> 
> 8. Reference
> ============
> 
> [1]:  Hybrid topology RFC:
>       https://mail.gnu.org/archive/html/qemu-devel/2023-02/msg03205.html
> [2]:  Daniel's suggestion about QOM topology:
>       https://mail.gnu.org/archive/html/qemu-devel/2023-02/msg03320.html
> [3]:  S390x topology document (by Nina):
>       https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg04842.html
> [4]:  Heterogeneous computing:
>       https://en.wikipedia.org/wiki/Heterogeneous_computing
> [5]:  12th gen’s Intel hybrid technology:
>       
> https://www.intel.com/content/www/us/en/support/articles/000091896/processors.html
> [6]:  Intel Meteor Lake (14th gen) architecture overview:
>       
> https://www.intel.com/content/www/us/en/content-details/788851/meteor-lake-architecture-overview.html
> [7]:  Need of ARM heterogeneous cache topology (by Yanan):
>       https://mail.gnu.org/archive/html/qemu-devel/2023-02/msg05139.html
> [8]:  Cache topology implementation for i386:
>       https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg08251.html
> [9]:  Cluster for ARM to define shared L2 cache and L3 tag (by Yanan):
>       
> https://lore.kernel.org/all/20211228092221.21068-1-wangyanan55@xxxxxxxxxx/
> [10]: [PATCH v1 0/4] prebuild cpu QOM tree /machine/node/socket/core
>       ->link-cpu (by Fan):
>       
> https://lore.kernel.org/all/cover.1395217538.git.chen.fan.fnst@xxxxxxxxxxxxxx/
> [11]: [PATCH RFC 0/4] target-i386: PC socket/core/thread modeling,
>       part 1 (by Andreas):
>       
> https://lore.kernel.org/all/1427131923-4670-1-git-send-email-afaerber@xxxxxxx/
> [12]: "Now we want to have similar QOM tree for introspection which
>       helps express topology as well" (by Igor):
>       
> https://lore.kernel.org/all/20150407170734.51faac90@igors-macbook-pro.local/
> [13]: [for-2.7 PATCH v3 06/15] cpu: Abstract CPU core type (by Bharata)
>       
> https://lore.kernel.org/all/1463024905-28401-7-git-send-email-bharata@xxxxxxxxxxxxxxxxxx/
> [14]: [PATCH v8 01/16] hw/cpu: introduce CPU clusters (by Luc):
>       
> https://lore.kernel.org/all/20181207090135.7651-2-luc.michel@xxxxxxxxxxxxx/
> 
> Thanks and Best Regards,
> Zhao
> 
> ---
> Zhao Liu (41):
>   qdev: Introduce new device category to cover basic topology device
>   qdev: Allow qdev_device_add() to add specific category device
>   system: Create base category devices from cli before board
>     initialization
>   qom/object: Introduce helper to resolve path from non-direct parent
>   qdev: Set device parent and id after setting properties
>   qdev: Introduce user-child interface to collect devices from -device
>   qdev: Introduce parent option in -device
>   hw/core/topo: Introduce CPU topology device abstraction
>   hw/core/topo: Support topology index for topology device
>   hw/core/topo: Add virtual method to update topology info for parent
>   hw/core/topo: Add virtual method to check topology child
>   hw/core/topo: Add helpers to traverse the CPU topology tree
>   hw/core/cpu: Convert CPU from general device to topology device
>   PPC/ppc-core: Offload core-id to PPC specific core abstarction
>   hw/cpu/core: Allow to configure plugged threads for cpu-core
>   PPC/ppc-core: Limit plugged-threads and nr-threads to be equal
>   hw/cpu/core: Convert cpu-core from general device to topology device
>   hw/cpu/cluster: Rename CPUClusterState to CPUCluster
>   hw/cpu/cluster: Wrap TCG related ops and props into CONFIG_TCG
>   hw/cpu/cluster: Descript cluster is not only used for TCG in comment
>   hw/cpu/cluster: Allow cpu-cluster to be created by -device
>   hw/cpu/cluster: Convert cpu-cluster from general device to topology
>     device
>   hw/cpu/die: Abstract cpu-die level as topology device
>   hw/cpu/socket: Abstract cpu-socket level as topology device
>   hw/cpu/book: Abstract cpu-book level as topology device
>   hw/cpu/drawer: Abstract cpu-drawer level as topology device
>   hw/core/slot: Introduce CPU slot as the root of CPU topology
>   hw/core/slot: Maintain the core queue in CPU slot
>   hw/core/slot: Statistics topology information in CPU slot
>   hw/core/slot: Check topology child to be added under CPU slot
>   hw/machine: Plug cpu-slot into machine to maintain topology tree
>   hw/machine: Build smp topology tree from -smp
>   hw/machine: Validate smp topology tree without -smp
>   hw/core/topo: Implement user-child to collect topology device from cli
>   hw/i386: Make x86_cpu_new() private in x86.c
>   hw/i386: Allow x86_cpu_new() to specify parent for new CPU
>   hw/i386: Allow i386 to create new CPUs from QOM topology
>   hw/i386: Wrap apic id and topology sub ids assigning as helpers
>   hw/i386: Add the interface to search parent for QOM topology
>   hw/i386: Support QOM topology
>   hw/i386: Cleanup non-QOM topology support
> 
>  MAINTAINERS                        |  16 +
>  accel/kvm/kvm-all.c                |   4 +-
>  gdbstub/system.c                   |   2 +-
>  hw/core/cpu-common.c               |  25 +-
>  hw/core/cpu-slot.c                 | 605 +++++++++++++++++++++++++++++
>  hw/core/cpu-topo.c                 | 399 +++++++++++++++++++
>  hw/core/machine-smp.c              |   9 +
>  hw/core/machine.c                  |  10 +
>  hw/core/meson.build                |   2 +
>  hw/cpu/book.c                      |  46 +++
>  hw/cpu/cluster.c                   |  50 ++-
>  hw/cpu/core.c                      |  72 ++--
>  hw/cpu/die.c                       |  46 +++
>  hw/cpu/drawer.c                    |  46 +++
>  hw/cpu/meson.build                 |   2 +-
>  hw/cpu/socket.c                    |  46 +++
>  hw/i386/x86.c                      | 319 ++++++++++-----
>  hw/net/virtio-net.c                |   2 +-
>  hw/ppc/meson.build                 |   1 +
>  hw/ppc/pnv.c                       |   6 +-
>  hw/ppc/pnv_core.c                  |  17 +-
>  hw/ppc/ppc_core.c                  | 102 +++++
>  hw/ppc/spapr.c                     |  28 +-
>  hw/ppc/spapr_cpu_core.c            |  19 +-
>  hw/usb/xen-usb.c                   |   3 +-
>  hw/xen/xen-legacy-backend.c        |   2 +-
>  include/hw/arm/armsse.h            |   2 +-
>  include/hw/arm/xlnx-versal.h       |   4 +-
>  include/hw/arm/xlnx-zynqmp.h       |   4 +-
>  include/hw/boards.h                |  13 +
>  include/hw/core/cpu-slot.h         | 108 +++++
>  include/hw/core/cpu-topo.h         | 111 ++++++
>  include/hw/core/cpu.h              |   8 +-
>  include/hw/cpu/book.h              |  38 ++
>  include/hw/cpu/cluster.h           |  51 ++-
>  include/hw/cpu/core.h              |  26 +-
>  include/hw/cpu/die.h               |  38 ++
>  include/hw/cpu/drawer.h            |  38 ++
>  include/hw/cpu/socket.h            |  38 ++
>  include/hw/i386/x86.h              |   5 +-
>  include/hw/ppc/pnv_core.h          |  11 +-
>  include/hw/ppc/ppc_core.h          |  58 +++
>  include/hw/ppc/spapr_cpu_core.h    |  12 +-
>  include/hw/qdev-core.h             |   1 +
>  include/hw/riscv/microchip_pfsoc.h |   4 +-
>  include/hw/riscv/sifive_u.h        |   4 +-
>  include/monitor/qdev.h             |   7 +-
>  include/monitor/user-child.h       |  57 +++
>  include/qom/object.h               |  26 ++
>  qom/object.c                       |  31 ++
>  system/meson.build                 |   1 +
>  system/qdev-monitor.c              | 141 ++++++-
>  system/user-child.c                |  72 ++++
>  system/vl.c                        |  53 ++-
>  target/i386/cpu.c                  |   4 +
>  tests/unit/meson.build             |   5 +-
>  56 files changed, 2607 insertions(+), 243 deletions(-)
>  create mode 100644 hw/core/cpu-slot.c
>  create mode 100644 hw/core/cpu-topo.c
>  create mode 100644 hw/cpu/book.c
>  create mode 100644 hw/cpu/die.c
>  create mode 100644 hw/cpu/drawer.c
>  create mode 100644 hw/cpu/socket.c
>  create mode 100644 hw/ppc/ppc_core.c
>  create mode 100644 include/hw/core/cpu-slot.h
>  create mode 100644 include/hw/core/cpu-topo.h
>  create mode 100644 include/hw/cpu/book.h
>  create mode 100644 include/hw/cpu/die.h
>  create mode 100644 include/hw/cpu/drawer.h
>  create mode 100644 include/hw/cpu/socket.h
>  create mode 100644 include/hw/ppc/ppc_core.h
>  create mode 100644 include/monitor/user-child.h
>  create mode 100644 system/user-child.c
> 
> -- 
> 2.34.1
> 
> 



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.