
[Xen-devel] arm64: Approach for DT based NUMA and issues



Hi,

   Below is a basic write-up on DT based NUMA feature support for the arm64
platform. I have attempted to add NUMA support, but I am facing the issues
below and would like to discuss them. Please let me know your comments.
I have yet to look at ACPI support.

DT based NUMA support for arm64 platform
========================================
For Xen to boot on a NUMA arm64 platform, it needs to parse the
CPU and memory nodes when booting via DT. Below I describe the
DT based booting mechanism and discuss the issues related to it.

1) Parsing CPU and Memory nodes:
---------------------------------------------------

The NUMA information for CPU and memory nodes is passed in the DT
using the numa-node-id u32 integer value. More information about the
NUMA binding is available in the Linux kernel at
Documentation/devicetree/bindings/numa.txt

Similar to the Linux kernel, the cpu and memory nodes of the DT are
parsed and the numa-node-id information is populated in the cpu_parsed
and memory_parsed nodemask_t masks, along the lines of the sketch below.
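
A minimal sketch of this parsing step, assuming the libfdt helpers
already used by xen/arch/arm/bootfdt.c; dt_parse_node_id() and
numa_record_node() are hypothetical names, not existing code:

#include <xen/errno.h>
#include <xen/numa.h>
#include <xen/nodemask.h>
#include <xen/libfdt/libfdt.h>

static nodemask_t cpu_parsed, memory_parsed;

/* Read the "numa-node-id" property of a flattened-DT node. */
static int __init dt_parse_node_id(const void *fdt, int node)
{
    const struct fdt_property *prop =
        fdt_get_property(fdt, node, "numa-node-id", NULL);

    if ( !prop )
        return -ENOENT;   /* node carries no NUMA annotation */

    return fdt32_to_cpu(*(const fdt32_t *)prop->data);
}

/* Called while scanning cpu/memory nodes in early_scan_node(). */
static void __init numa_record_node(const void *fdt, int node, bool_t is_cpu)
{
    int nid = dt_parse_node_id(fdt, node);

    if ( nid < 0 || nid >= MAX_NUMNODES )
        return;

    if ( is_cpu )
        node_set(nid, cpu_parsed);
    else
        node_set(nid, memory_parsed);
}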

When booting in UEFI mode, UEFI passes the memory information to Dom0
using the EFI memory descriptor table and deletes the memory nodes
from the host DT. However, to fetch the memory NUMA node ids, the
memory DT nodes should not be deleted by the EFI stub.

ISSUE: When the memory nodes are _NOT_ deleted by the EFI stub from the
host DT, Xen identifies them [xen/arch/arm/bootfdt.c, early_scan_node()]
and adds their memory ranges to the bootinfo.mem structure, thereby
adding duplicate entries, and initialization eventually fails.

Possible solution: While adding a new memory region to bootinfo.mem,
check for duplicate entries and back off if the entry is already present
from the UEFI memory info table, as in the sketch below.
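
A minimal sketch of that check, using the struct meminfo/struct membank
layout from xen/include/asm-arm/setup.h; the helper name is mine:

#include <asm/setup.h>

/* True if [start, start+size) overlaps a bank UEFI already recorded. */
static bool_t __init bootinfo_mem_is_duplicate(paddr_t start, paddr_t size)
{
    unsigned int i;

    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
    {
        const struct membank *bank = &bootinfo.mem.bank[i];

        if ( start < bank->start + bank->size &&
             bank->start < start + size )
            return 1;
    }

    return 0;
}

/* In early_scan_node(), before appending the DT range:
 *     if ( bootinfo_mem_is_duplicate(start, size) )
 *         return;    -- the EFI memory map already covers it
 */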

2) Parsing CPU nodes:
---------------------------------
The CPU nodes are parsed to extract the numa-node-id for each CPU, and
cpu_nodemask is populated.

The MPIDR register value is read for each CPU and cpu_to_node[] is
populated, along the lines of the sketch below.
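
A sketch of that mapping, assuming cpu_logical_map() and MPIDR_HWID_MASK
from the existing arm code; the dt_cpu_hwid[]/dt_cpu_nid[] arrays stand
for per-cpu-node data gathered during DT parsing and are hypothetical:

#include <xen/cpumask.h>
#include <xen/numa.h>
#include <asm/processor.h>
#include <asm/smp.h>

nodeid_t cpu_to_node[NR_CPUS];

static void __init numa_map_cpus(const uint64_t *dt_cpu_hwid,
                                 const nodeid_t *dt_cpu_nid,
                                 unsigned int nr_dt_cpus)
{
    unsigned int cpu, i;

    for ( cpu = 0; cpu < nr_cpu_ids; cpu++ )
    {
        /* Affinity bits of the hwid recorded at boot for this CPU. */
        uint64_t hwid = cpu_logical_map(cpu) & MPIDR_HWID_MASK;

        for ( i = 0; i < nr_dt_cpus; i++ )
            if ( dt_cpu_hwid[i] == hwid )
            {
                cpu_to_node[cpu] = dt_cpu_nid[i];
                break;
            }
    }
}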

3) Parsing Memory nodes:
--------------------------------------
For all the memory nodes in the flattened DT, the start address, size
and numa-node-id value are extracted and stored in node_memblk_range[],
which is of type struct node.

Each bootinfo.mem entry from UEFI is then verified against
node_memblk_range[], and NODE_DATA is populated with the start PFN,
end PFN and node id, roughly as sketched below.
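
A rough sketch of those two steps, assuming the same layout as the
existing x86 NUMA code (struct node, node_memblk_range[],
memblk_nodeid[], NODE_DATA() in xen/include/asm-x86/numa.h);
numa_add_memblk() and numa_set_node_data() are hypothetical names:

#include <xen/errno.h>
#include <xen/lib.h>
#include <xen/numa.h>
#include <asm/mm.h>

static struct node node_memblk_range[NR_NODE_MEMBLKS];
static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
static unsigned int num_node_memblks;

/* Record one DT memory bank and its node id. */
static int __init numa_add_memblk(nodeid_t nid, paddr_t start, paddr_t size)
{
    if ( num_node_memblks >= NR_NODE_MEMBLKS )
        return -ENOSPC;

    node_memblk_range[num_node_memblks].start = start;
    node_memblk_range[num_node_memblks].end = start + size;
    memblk_nodeid[num_node_memblks++] = nid;

    return 0;
}

/* Grow NODE_DATA(nid) to cover a verified bootinfo.mem range. */
static void __init numa_set_node_data(nodeid_t nid, paddr_t start, paddr_t end)
{
    struct node_data *nd = NODE_DATA(nid);
    unsigned long spfn = paddr_to_pfn(start);
    unsigned long epfn = paddr_to_pfn(end);
    unsigned long nstart, nend;

    if ( !nd->node_spanned_pages )
    {
        /* First range seen for this node. */
        nd->node_start_pfn = spfn;
        nd->node_spanned_pages = epfn - spfn;
        return;
    }

    nstart = min(nd->node_start_pfn, spfn);
    nend = max(nd->node_start_pfn + nd->node_spanned_pages, epfn);
    nd->node_start_pfn = nstart;
    nd->node_spanned_pages = nend - nstart;
}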

Populating memnodemap:

The memnodemap[] array is allocated from the heap and, using the
NODE_DATA structure, is populated with the node id for each page index
(see the sketch below).

This memnodemap info is used by the memory allocator to fetch the memory
node id for a given page by calling phys_to_nid().
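
Continuing the sketch, memnodemap[] can then be filled from the memblk
arrays above; memnode_shift and memnodemapsize mirror the x86 names, and
the granularity handling here is deliberately simplified:

static nodeid_t *memnodemap;          /* allocated from the heap */
static unsigned int memnode_shift;    /* log2 of the mapping granularity */
static unsigned long memnodemapsize;  /* number of entries */

static void __init populate_memnodemap(void)
{
    unsigned long idx;
    unsigned int i;

    for ( idx = 0; idx < memnodemapsize; idx++ )
        memnodemap[idx] = NUMA_NO_NODE;

    for ( i = 0; i < num_node_memblks; i++ )
    {
        unsigned long s = node_memblk_range[i].start >> memnode_shift;
        unsigned long e = node_memblk_range[i].end >> memnode_shift;

        for ( idx = s; idx < e; idx++ )
            memnodemap[idx] = memblk_nodeid[i];
    }
}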

ISSUE: phys_to_nid() is called by the memory allocator before
memnodemap[] is initialized.

Since memnodemap[] is allocated from the heap, the boot allocator must be
initialized first. But the boot allocator itself needs phys_to_nid(),
which is not available until memnodemap[] is initialized, so there is a
chicken-and-egg situation during initialization. To overcome this,
phys_to_nid() should rely on node_memblk_range[] to get the node id until
memnodemap[] is initialized, as in the sketch below.
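
A sketch of the proposed workaround, falling back to a linear scan of
node_memblk_range[] (from the sketches above) until memnodemap[] is
ready; in the real tree phys_to_nid() lives in the numa header, so this
is only illustrative:

static nodeid_t phys_to_nid(paddr_t addr)
{
    unsigned int i;

    /* Fast path once populate_memnodemap() has run. */
    if ( memnodemap )
        return memnodemap[addr >> memnode_shift];

    /* Early boot: linear scan of the DT-derived ranges. */
    for ( i = 0; i < num_node_memblks; i++ )
        if ( addr >= node_memblk_range[i].start &&
             addr < node_memblk_range[i].end )
            return memblk_nodeid[i];

    return 0; /* assume node 0 if nothing matches this early */
}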

4) Generating memory nodes for DOM0
---------------------------------------------------------
Linux kernel device drivers that use devm_kzalloc() try to allocate
memory from the local memory node, so Dom0 needs to have memory
allocated on all the available nodes of the system.

Ex: the SMMU driver of a device on node 1 tries to allocate memory
on node 1.

ISSUE:
 - Dom0's memory should be split across all the available memory nodes
   of the system, and the memory DT nodes should be generated accordingly.
 - The memory DT nodes generated by Xen for Dom0 should carry the
   numa-node-id property (see the sketch after this list).
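
To make the second point concrete, here is a sketch of a per-node memory
node generator for the Dom0 FDT, using the libfdt sequential-write calls
that make_memory_node() in xen/arch/arm/domain_build.c already uses;
make_numa_memory_node() itself is hypothetical and assumes
#address-cells = #size-cells = 2:

#include <xen/lib.h>
#include <xen/numa.h>
#include <xen/libfdt/libfdt.h>

static int __init make_numa_memory_node(void *fdt, nodeid_t nid,
                                        paddr_t start, paddr_t size)
{
    fdt32_t reg[4];
    char name[64];
    int res;

    snprintf(name, sizeof(name), "memory@%"PRIpaddr, start);

    res = fdt_begin_node(fdt, name);
    if ( res )
        return res;

    res = fdt_property_string(fdt, "device_type", "memory");
    if ( res )
        return res;

    /* 64-bit address and size, split into two cells each. */
    reg[0] = cpu_to_fdt32(start >> 32);
    reg[1] = cpu_to_fdt32(start & 0xffffffff);
    reg[2] = cpu_to_fdt32(size >> 32);
    reg[3] = cpu_to_fdt32(size & 0xffffffff);
    res = fdt_property(fdt, "reg", reg, sizeof(reg));
    if ( res )
        return res;

    /* The new part: tag the bank with its NUMA node. */
    res = fdt_property_u32(fdt, "numa-node-id", nid);
    if ( res )
        return res;

    return fdt_end_node(fdt);
}

Dom0 construction would then call this once per node that contributes a
share of Dom0's RAM, instead of emitting a single memory node.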

Regards
Vijay
