
[Xen-devel] [PATCH RFC v2 0/7] xen: vNUMA introduction



This series of patches introduces vNUMA topology awareness and
provides interfaces and data structures to enable vNUMA for 
PV domU guests.

vNUMA topology support must also be present in the PV guest kernel;
the corresponding Linux patches should be applied.

Introduction
-------------

vNUMA topology is exposed to the PV guest to improve performance when running
workloads on NUMA machines.
The Xen vNUMA implementation provides a way to create vNUMA-enabled guests on
NUMA/UMA machines and to map the vNUMA topology onto physical NUMA nodes in an
optimal way.

Xen vNUMA support

The current set of patches introduces a subop hypercall that is available to
enlightened PV guests with the vNUMA patches applied.
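
To make the shape of that interface concrete, here is a minimal sketch of how
an enlightened guest might query its vNUMA topology through the new subop.
The subop number, structure layout and field names below are illustrative
assumptions only, not the exact definitions added by this series (the public
interface changes are in xen/include/public/memory.h, see the diffstat).

/* Illustrative sketch only: the subop number, structure layout and field
 * names are assumptions, not the exact interface introduced by this series. */
#include <stdint.h>

#define XENMEM_get_vnuma_info   25   /* assumed memory_op subop number */

struct vnuma_topology_info {         /* assumed layout */
    uint16_t domid;                  /* calling domain                         */
    uint16_t nr_vnodes;              /* number of virtual NUMA nodes           */
    uint64_t vdistance;              /* guest buffer: nr_vnodes^2 distances    */
    uint64_t vcpu_to_vnode;          /* guest buffer: one vnode id per vcpu    */
    uint64_t vmemrange;              /* guest buffer: start/end pair per vnode */
};

/*
 * A PV guest fills in domid and the buffer pointers and issues something like
 *
 *     rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &info);
 *
 * then feeds the returned ranges, distances and vcpu mapping into its own
 * NUMA initialization (the effect is visible in the boot log below).
 */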

The domain structure was modified to hold the per-domain vNUMA topology for use
by other vNUMA-aware subsystems (e.g. ballooning).
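
As a rough picture, the per-domain state would look something like the sketch
below; the field names are assumptions based on the description above, not the
exact definitions from xen/include/xen/vnuma.h in this series.

/* Assumed shape of the per-domain vNUMA state; illustrative only. */
#include <stdint.h>

struct vnuma_info_sketch {
    unsigned int nr_vnodes;        /* number of virtual NUMA nodes          */
    unsigned int *vdistance;       /* nr_vnodes x nr_vnodes distance matrix */
    unsigned int *vcpu_to_vnode;   /* vcpu -> vnode assignment              */
    unsigned int *vnode_to_pnode;  /* vnode -> physical NUMA node mapping   */
    uint64_t *vmem_start;          /* per-vnode memory range start          */
    uint64_t *vmem_end;            /* per-vnode memory range end            */
};

/* struct domain then carries this topology so that other subsystems
 * (e.g. ballooning) can make vnode-aware decisions. */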

libxc

libxc provides interfaces to build PV guests with vNUMA support and, on NUMA
machines, performs the initial memory allocation on the physical NUMA nodes.
This is implemented by using the nodemap formed by automatic NUMA placement.
Details are in patch #3.
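
In outline, the allocation step looks like the sketch below.
xc_domain_populate_physmap_exact() and XENMEMF_exact_node() are existing
libxc/Xen interfaces; the vnode_to_pnode[] mapping and the page-by-page loop
are simplified assumptions about what patch #3 actually does (which presumably
batches allocations rather than going one page at a time).

/* Simplified sketch of vnode-aware allocation in the PV domain builder. */
#include <xenctrl.h>

static int populate_vnuma_sketch(xc_interface *xch, uint32_t domid,
                                 unsigned int nr_vnodes,
                                 const unsigned int *vnode_to_pnode,
                                 const uint64_t *pages_per_vnode,
                                 xen_pfn_t *pfns)
{
    uint64_t pfn = 0;

    for (unsigned int v = 0; v < nr_vnodes; v++) {
        /* vnode_to_pnode[] comes from the automatic NUMA placement nodemap. */
        unsigned int pnode = vnode_to_pnode[v];

        for (uint64_t i = 0; i < pages_per_vnode[v]; i++, pfn++) {
            int rc = xc_domain_populate_physmap_exact(
                xch, domid, 1 /* nr_extents */, 0 /* order */,
                XENMEMF_exact_node(pnode), &pfns[pfn]);
            if (rc)
                return rc;
        }
    }
    return 0;
}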

libxl

libxl provides a way to predefine the vNUMA topology in the VM config: the
number of vnodes, the memory arrangement, the vcpu-to-vnode assignment, and the
distance map (see the VM config example below).

PV guest

As of now, only PV guests can take advantage of the vNUMA functionality. The
vNUMA Linux patches should be applied and NUMA support should be compiled into
the kernel.
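
On the Linux side the idea is simply to feed the topology returned by the
hypercall into the kernel's regular x86 NUMA setup. A rough sketch is below:
numa_add_memblk(), numa_set_distance() and numa_nodes_parsed are the standard
x86 NUMA registration interfaces, while xen_get_vnuma_info() is a hypothetical
helper wrapping the new subop; its effect corresponds to the "vNUMA: memblk"
lines in the boot log further down.

/* Rough sketch of guest-side vNUMA initialization; xen_get_vnuma_info()
 * is a hypothetical wrapper around the new subop, the rest are the
 * standard x86 NUMA registration helpers. */
#include <linux/init.h>
#include <linux/errno.h>
#include <linux/nodemask.h>
#include <asm/numa.h>

static int __init xen_numa_init_sketch(void)
{
    unsigned int nr_vnodes, i, j;
    u64 *start, *end;            /* per-vnode memory ranges         */
    unsigned int *distance;      /* nr_vnodes * nr_vnodes distances */

    if (xen_get_vnuma_info(&nr_vnodes, &start, &end, &distance))
        return -EINVAL;

    for (i = 0; i < nr_vnodes; i++) {
        /* Shows up as "vNUMA: memblk[i] - start end" in the boot log. */
        numa_add_memblk(i, start[i], end[i]);
        node_set(i, numa_nodes_parsed);
    }

    for (i = 0; i < nr_vnodes; i++)
        for (j = 0; j < nr_vnodes; j++)
            numa_set_distance(i, j, distance[i * nr_vnodes + j]);

    return 0;
}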

Example of booting a vNUMA-enabled PV domU:

NUMA machine:
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
  1:       1        0        0
  2:       2        0        0
  3:       3        0        0
  4:       0        1        1
  5:       1        1        1
  6:       2        1        1
  7:       3        1        1
numa_info              :
node:    memsize    memfree    distances
   0:     17664      12243      10,20
   1:     16384      11929      20,10

VM config:

memory = 16384
vcpus = 8
name = "rcbig"
vnodes = 8
vnumamem = "2g, 2g, 2g, 2g, 2g, 2g, 2g, 2g"
vcpu_to_vnode ="5 6 7 4 3 2 1 0"


root@superpipe:~# xl list -n
Name                                        ID   Mem VCPUs  State   Time(s) NODE Affinity
Domain-0                                     0  4096     1     r-----     581.5 any node
r9                                           1  2048     1     -b----      19.9 0
rc9k1                                        2  2048     6     -b----      21.1 1
*rcbig                                       6 16384     8     -b----       4.9 any node

xl debug-keys u:
(XEN) Memory location of each domain:
(XEN) Domain 0 (total: 1048576):
(XEN)     Node 0: 510411
(XEN)     Node 1: 538165
(XEN) Domain 2 (total: 524288):
(XEN)     Node 0: 0
(XEN)     Node 1: 524288
(XEN) Domain 3 (total: 4194304):
(XEN)     Node 0: 2621440
(XEN)     Node 1: 1572864
(XEN)     Domain has 8 vnodes
(XEN)         pnode 0: vnodes: 0 (2048), 1 (2048), 2 (2048), 3 (2048), 4 (2048), 
(XEN)         pnode 1: vnodes: 5 (2048), 6 (2048), 7 (2048), 
(XEN)    Domain vcpu to vnode: 5 6 7 4 3 2 1 0 


PV Linux boot (domain 3):
[    0.000000] init_memory_mapping: [mem 0x00100000-0x37fffffff]
[    0.000000]  [mem 0x00100000-0x37fffffff] page 4k
[    0.000000] RAMDISK: [mem 0x01dd6000-0x0347dfff]
[    0.000000] vNUMA: memblk[0] - 0x0 0x80000000
[    0.000000] vNUMA: memblk[1] - 0x80000000 0x100000000
[    0.000000] vNUMA: memblk[2] - 0x100000000 0x180000000
[    0.000000] vNUMA: memblk[3] - 0x180000000 0x200000000
[    0.000000] vNUMA: memblk[4] - 0x200000000 0x280000000
[    0.000000] vNUMA: memblk[5] - 0x280000000 0x300000000
[    0.000000] vNUMA: memblk[6] - 0x300000000 0x380000000
[    0.000000] vNUMA: memblk[7] - 0x380000000 0x400000000
[    0.000000] NUMA: Initialized distance table, cnt=8
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x7fffffff]
[    0.000000]   NODE_DATA [mem 0x7ffd9000-0x7fffffff]
[    0.000000] Initmem setup node 1 [mem 0x80000000-0xffffffff]
[    0.000000]   NODE_DATA [mem 0xfffd9000-0xffffffff]
[    0.000000] Initmem setup node 2 [mem 0x100000000-0x17fffffff]
[    0.000000]   NODE_DATA [mem 0x17ffd9000-0x17fffffff]
[    0.000000] Initmem setup node 3 [mem 0x180000000-0x1ffffffff]
[    0.000000]   NODE_DATA [mem 0x1fffd9000-0x1ffffffff]
[    0.000000] Initmem setup node 4 [mem 0x200000000-0x27fffffff]
[    0.000000]   NODE_DATA [mem 0x27ffd9000-0x27fffffff]
[    0.000000] Initmem setup node 5 [mem 0x280000000-0x2ffffffff]
[    0.000000]   NODE_DATA [mem 0x2fffd9000-0x2ffffffff]
[    0.000000] Initmem setup node 6 [mem 0x300000000-0x37fffffff]
[    0.000000]   NODE_DATA [mem 0x37ffd9000-0x37fffffff]
[    0.000000] Initmem setup node 7 [mem 0x380000000-0x3ffffffff]
[    0.000000]   NODE_DATA [mem 0x3fdff7000-0x3fe01dfff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   [mem 0x100000000-0x3ffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009ffff]
[    0.000000]   node   0: [mem 0x00100000-0x7fffffff]
[    0.000000]   node   1: [mem 0x80000000-0xffffffff]
[    0.000000]   node   2: [mem 0x100000000-0x17fffffff]
[    0.000000]   node   3: [mem 0x180000000-0x1ffffffff]
[    0.000000]   node   4: [mem 0x200000000-0x27fffffff]
[    0.000000]   node   5: [mem 0x280000000-0x2ffffffff]
[    0.000000]   node   6: [mem 0x300000000-0x37fffffff]
[    0.000000]   node   7: [mem 0x380000000-0x3ffffffff]
[    0.000000] On node 0 totalpages: 524191
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 21 pages reserved
[    0.000000]   DMA zone: 3999 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 7112 pages used for memmap
[    0.000000]   DMA32 zone: 520192 pages, LIFO batch:31
[    0.000000] On node 1 totalpages: 524288
[    0.000000]   DMA32 zone: 7168 pages used for memmap
[    0.000000]   DMA32 zone: 524288 pages, LIFO batch:31
[    0.000000] On node 2 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 3 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 4 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 5 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 6 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 7 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[    0.000000] smpboot: Allowing 8 CPUs, 0 hotplug CPUs
[    0.000000] No local APIC present
[    0.000000] APIC: disable apic facility
[    0.000000] APIC: switched to apic NOOP
[    0.000000] nr_irqs_gsi: 16
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.4-unstable (preserve-AD)
[    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:8 nr_node_ids:8
[    0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007fc00000 s85120 r8192 d21376 u2097152
[    0.000000] pcpu-alloc: s85120 r8192 d21376 u2097152 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 [1] 1 [2] 2 [3] 3 [4] 4 [5] 5 [6] 6 [7] 7 
[    0.000000] Built 8 zonelists in Node order, mobility grouping on.  Total pages: 4136842

numactl within the running guest:
root@heatpipe:~# numactl --ha
available: 8 nodes (0-7)
node 0 cpus: 7
node 0 size: 2047 MB
node 0 free: 2001 MB
node 1 cpus: 6
node 1 size: 2048 MB
node 1 free: 2008 MB
node 2 cpus: 5
node 2 size: 2048 MB
node 2 free: 2010 MB
node 3 cpus: 4
node 3 size: 2048 MB
node 3 free: 2009 MB
node 4 cpus: 3
node 4 size: 2048 MB
node 4 free: 2009 MB
node 5 cpus: 0
node 5 size: 2048 MB
node 5 free: 1982 MB
node 6 cpus: 1
node 6 size: 2048 MB
node 6 free: 2008 MB
node 7 cpus: 2
node 7 size: 2048 MB
node 7 free: 1944 MB
node distances:
node   0   1   2   3   4   5   6   7 
  0:  10  20  20  20  20  20  20  20 
  1:  20  10  20  20  20  20  20  20 
  2:  20  20  10  20  20  20  20  20 
  3:  20  20  20  10  20  20  20  20 
  4:  20  20  20  20  10  20  20  20 
  5:  20  20  20  20  20  10  20  20 
  6:  20  20  20  20  20  20  10  20 
  7:  20  20  20  20  20  20  20  10

root@heatpipe:~# numastat -c

Per-node numastat info (in MBs):
                Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
                ------ ------ ------ ------ ------ ------ ------ ------ -----
Numa_Hit            37     43     35     42     43     97     45     58   401
Numa_Miss            0      0      0      0      0      0      0      0     0
Numa_Foreign         0      0      0      0      0      0      0      0     0
Interleave_Hit       7      7      7      7      7      7      7      7    56
Local_Node          28     34     26     33     34     97     36     49   336
Other_Node           9      9      9      9      9      0      9      9    65

The patchset applies to the latest Xen tree,
commit e008e9119d03852020b93e1d4da9a80ec1af9c75.
It is also available at http://git.gitorious.org/xenvnuma/xenvnuma.git

Elena Ufimtseva (7):
  Xen vNUMA for PV guests.
  Per-domain vNUMA initialization.
  vNUMA nodes allocation on NUMA nodes.
  vNUMA libxl supporting functionality.
  vNUMA VM config parsing functions
  xl.cfg documentation update for vNUMA.
  NUMA debug-key additional output for vNUMA

 docs/man/xl.cfg.pod.5        |   50 +++++++++++
 tools/libxc/xc_dom.h         |    9 ++
 tools/libxc/xc_dom_x86.c     |   77 ++++++++++++++--
 tools/libxc/xc_domain.c      |   57 ++++++++++++
 tools/libxc/xenctrl.h        |    9 ++
 tools/libxc/xg_private.h     |    1 +
 tools/libxl/libxl.c          |   19 ++++
 tools/libxl/libxl.h          |   20 ++++-
 tools/libxl/libxl_arch.h     |    5 ++
 tools/libxl/libxl_dom.c      |  105 +++++++++++++++++++++-
 tools/libxl/libxl_internal.h |    3 +
 tools/libxl/libxl_types.idl  |    5 +-
 tools/libxl/libxl_x86.c      |   86 ++++++++++++++++++
 tools/libxl/xl_cmdimpl.c     |  205 ++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/numa.c          |   23 ++++-
 xen/common/domain.c          |   25 +++++-
 xen/common/domctl.c          |   68 +++++++++++++-
 xen/common/memory.c          |   56 ++++++++++++
 xen/include/public/domctl.h  |   15 +++-
 xen/include/public/memory.h  |    9 +-
 xen/include/xen/domain.h     |   11 +++
 xen/include/xen/sched.h      |    1 +
 xen/include/xen/vnuma.h      |   27 ++++++
 23 files changed, 869 insertions(+), 17 deletions(-)
 create mode 100644 xen/include/xen/vnuma.h

-- 
1.7.10.4

