
Re: [Xen-devel] arm64: Approach for DT based NUMA and issues



On Mon, Nov 28, 2016 at 7:20 PM, Andre Przywara <andre.przywara@xxxxxxx> wrote:
> Hi Vijay,
>
> On 26/11/16 06:59, Vijay Kilari wrote:
>> Hi,
>>
>>    Below is a basic write-up on DT-based NUMA feature support for the
>> arm64 platform. I have attempted to get NUMA support working; however,
>> I face the issues below and would like to discuss them. Please let me
>> know your comments. ACPI support is yet to be looked at.
>>
>> DT based NUMA support for arm64 platform
>> ========================================
>> For Xen to boot on a NUMA arm64 platform, it needs to parse the
>> CPU and memory nodes when booting via DT. Here I would like to
>> discuss the DT-based booting mechanism and the issues related to it.
>>
>> 1) Parsing CPU and Memory nodes:
>> ---------------------------------------------------
>>
>> The NUMA information associated with CPUs and memory is passed in the
>> DT using the numa-node-id u32 integer value. More information about the
>> NUMA binding is available in the Linux kernel at
>> Documentation/devicetree/bindings/numa.txt.
>>
>> Similar to the Linux kernel, the cpu and memory nodes of the DT are
>> parsed and the numa-node-id information is populated in the cpu_parsed
>> and memory_parsed node_t masks (a sketch of the parsing follows below).
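
To illustrate, roughly what the parsing could look like (a sketch only:
the fdt_* calls are standard libfdt, nodeid_t/NUMA_NO_NODE/node_set() are
the existing Xen helpers, and the nodemask names follow the write-up; the
trailing lines are the per-node callback context):

static nodeid_t __init device_tree_get_nid(const void *fdt, int node)
{
    /* numa-node-id is a u32 per the kernel's DT NUMA binding */
    const fdt32_t *prop = fdt_getprop(fdt, node, "numa-node-id", NULL);

    if ( !prop )
        return NUMA_NO_NODE;

    return (nodeid_t)fdt32_to_cpu(*prop);
}

/* While scanning the flattened DT, for each cpu/memory node: */
nodeid_t nid = device_tree_get_nid(fdt, node);

if ( nid != NUMA_NO_NODE )
    node_set(nid, is_memory_node ? memory_parsed : cpu_parsed);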
>>
>> When booting in UEFI mode, UEFI passes the memory information to Dom0
>> using the EFI memory descriptor table and deletes the memory nodes
>> from the host DT. However, to fetch the memory NUMA node ids, the
>> memory DT nodes must not be deleted by the EFI stub.
>
> So is this what the Cavium UEFI firmware actually does today?
> I have been told that removing the DT memory nodes was the original idea
> when UEFI was architected for ARM, but it's not clear whether this is
> actually implemented. Also this may differ from platform to platform, I
> guess.
> I don't have easy access to a box, so can't check atm.

Please see the patch from Ard in the kernel. This change is required in
Xen's EFI code as well:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers/firmware/efi/arm-init.c?id=500899c2cc3e3f06140373b587a69d30650f2d9d

>
>> ISSUE: When the memory nodes are _NOT_ deleted from the host DT by the
>> EFI stub, Xen identifies them [xen/arch/arm/bootfdt.c, early_scan_node()]
>> and adds their memory ranges to the bootinfo.mem structure, thereby
>> creating duplicate entries, and eventually initialization fails.
>>
>> Possible solution: While adding a new memory region to bootinfo.mem,
>> check for duplicate entries and back off if the entry is already
>> available from the UEFI memory info table.
>
> So why do we iterate over DT nodes if we have populated via the UEFI
> memmap already? Can't we just have an order:
> 1) if UEFI memmap available: parse that, populate bootinfo.mem, ignore DT
> 2) if UEFI not available, parse DT memory nodes, populate bootinfo.mem

Yes, that could be done; I will have a look.
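
I.e. something like this in early_scan_node() (a sketch only:
device_tree_node_matches() and process_memory_node() are the existing
bootfdt.c helpers, while the UEFI check and the NUMA helper are
hypothetical names):

    else if ( device_tree_node_matches(fdt, node, "memory") )
    {
        /*
         * When booted via UEFI, bootinfo.mem has already been populated
         * from the EFI memory map: only pick up numa-node-id from the
         * DT node and skip the ranges, avoiding duplicate bootinfo.mem
         * entries.
         */
        if ( !booted_via_uefi() )      /* hypothetical predicate */
            process_memory_node(fdt, node, name,
                                address_cells, size_cells);

        dt_numa_process_memory_node(fdt, node); /* hypothetical helper */
    }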
>
> So to make this work with NUMA, we would add another chain for NUMA parsing:
> 1) if ACPI is available, use the SRAT table
> 2) if ACPI is not available, check the DT memory nodes
>
> This should work with all cases: pure DT, UEFI with DT, UEFI with ACPI
>
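In code that chain could look like the sketch below (the function names
are hypothetical; acpi_disabled is the existing Xen flag):

void __init numa_init(void)
{
    int ret = -ENODEV;

    if ( !acpi_disabled )
        ret = acpi_numa_init();   /* parse the ACPI SRAT table */

    if ( ret )
        ret = dt_numa_init();     /* fall back to DT memory/cpu nodes */

    if ( ret )
        numa_dummy_init();        /* single-node fallback */
}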
>>
>> 2) Parsing CPU nodes:
>> ---------------------------------
>> The CPU nodes are parsed to extract numa-node-id info for each cpu and
>> cpu_nodemask is populated.
>>
>> The MPIDR register value is read for each CPU and cpu_to_node[] is populated.
>
> So there is no issue here and that works as expected?

No issue. The MPIDR is already read on secondary CPU boot, from which the
cpu_to_node[] data is updated.
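
I.e. on the secondary boot path, something like this (a sketch; the
lookup helper is a hypothetical name, cpu_to_node[] as described above):

void __init numa_set_cpu_node(unsigned int cpu, register_t mpidr)
{
    /* Node id recorded for this MPIDR while parsing the DT cpu nodes */
    nodeid_t nid = numa_node_from_mpidr(mpidr);

    cpu_to_node[cpu] = (nid == NUMA_NO_NODE) ? 0 : nid;
}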

>
>> 3) Parsing Memory nodes:
>> --------------------------------------
>> For all the DT memory nodes in the flattened DT, the start address,
>> size and numa-node-id values are extracted and stored in
>> node_memblk_range[], which is of type struct node.
>>
>> Each bootinfo.mem entry from UEFI is verified against node_memblk_range[] and
>> NODE_DATA is populated with start PFN, end PFN and nodeid.
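
For reference, the structures here can mirror what Xen's x86 NUMA code
(xen/arch/x86/srat.c) already uses, roughly:

struct node {
    u64 start, end;   /* physical address range covered by this memblk */
};

static struct node node_memblk_range[NR_NODE_MEMBLKS];
static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
static int num_node_memblks;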
>>
>> Populating memnodemap:
>>
>> The memnodemap[] is allocated from the heap and, using the NODE_DATA
>> structure, populated with the nodeid for each page index.
>>
>> This memnodemap info is used by the memory allocator to fetch the
>> memory node id for a given page by calling phys_to_nid(), as sketched
>> below.
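
That lookup again mirrors the x86 side, where phys_to_nid() is just an
index into memnodemap[] by the physical address shifted by memnode_shift,
roughly:

extern nodeid_t *memnodemap;
extern unsigned int memnode_shift;

static inline nodeid_t phys_to_nid(paddr_t addr)
{
    return memnodemap[paddr_to_pdx(addr) >> memnode_shift];
}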
>>
>> ISSUE: phys_to_nid() is called by the memory allocator before
>> memnodemap[] is initialized.
>>
>> Since memnodemap[] is allocated from the heap, the boot allocator must
>> be initialized first. But the boot allocator needs phys_to_nid(), which
>> is not available until memnodemap[] is initialized. So there is a
>> circular dependency during initialization. To overcome this,
>> phys_to_nid() should rely on node_memblk_range[] to get the nodeid
>> until memnodemap[] is initialized.
>
> What about having an early boot fallback: like:
>
> nodeid_t phys_to_nid(paddr_t addr)
> {
>         if (!memnodemap)
>                 return 0;
>         ....
> }

The memory allocator has all the nodes' memory from bootinfo.mem, so it
fails when phys_to_nid() returns 0 for node 1 memory.
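
So instead of returning 0, the early fallback would have to do a linear
search of the memblks recorded while parsing the DT memory nodes (a
sketch, using the node_memblk_range[]/memblk_nodeid[] arrays from above):

nodeid_t phys_to_nid(paddr_t addr)
{
    int i;

    if ( !memnodemap )
    {
        /* Early boot: memnodemap[] is not set up yet */
        for ( i = 0; i < num_node_memblks; i++ )
            if ( addr >= node_memblk_range[i].start &&
                 addr < node_memblk_range[i].end )
                return memblk_nodeid[i];
        return 0;   /* no match: fall back to node 0 */
    }

    return memnodemap[paddr_to_pdx(addr) >> memnode_shift];
}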

>
> Cheers,
> Andre.
>
>> 4) Generating memory nodes for DOM0
>> ---------------------------------------------------------
>> Linux kernel device drivers that use devm_kzalloc() try to allocate
>> memory from the local memory node, so Dom0 needs to have memory
>> allocated on all the available nodes of the system.
>>
>> E.g. the SMMU driver of a device on node 1 tries to allocate memory
>> on node 1.
>>
>> ISSUE:
>>  - Dom0's memory should be split across all the available memory nodes
>>    of the system, and memory nodes should be generated accordingly.
>>  - The memory DT nodes generated by Xen for Dom0 should populate the
>>    numa-node-id information (see the sketch after this list).
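
For the generated DT, each memory node Xen creates for Dom0 would then
carry a numa-node-id property. A minimal sketch (the fdt_* sequential-
write calls are standard libfdt; the one-bank-per-node layout and the
2/2 address/size cells are assumptions):

static int make_numa_memory_node(void *fdt, nodeid_t nid,
                                 u64 start, u64 size)
{
    fdt32_t reg[4];
    char name[32];
    int res;

    snprintf(name, sizeof(name), "memory@%"PRIx64, start);

    res = fdt_begin_node(fdt, name);
    if ( res )
        return res;

    res = fdt_property_string(fdt, "device_type", "memory");
    if ( res )
        return res;

    /* #address-cells = <2>, #size-cells = <2> */
    reg[0] = cpu_to_fdt32(start >> 32);
    reg[1] = cpu_to_fdt32(start & 0xffffffff);
    reg[2] = cpu_to_fdt32(size >> 32);
    reg[3] = cpu_to_fdt32(size & 0xffffffff);
    res = fdt_property(fdt, "reg", reg, sizeof(reg));
    if ( res )
        return res;

    /* Tell Dom0 which NUMA node this range belongs to */
    res = fdt_property_cell(fdt, "numa-node-id", nid);
    if ( res )
        return res;

    return fdt_end_node(fdt);
}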
>>
>> Regards
>> Vijay
>>
