Re: [Xen-devel] arm64: Approach for DT based NUMA and issues
On 26/11/16 06:59, Vijay Kilari wrote:
> Hi,

Hi Vijay,

This mail is mixing two distinct problems:
 1) Making Xen NUMA-aware
 2) Making DOM0 NUMA-aware

As mentioned in another part of this thread, those problems should be
addressed one by one rather than together. I will focus on problem 1)
while answering this e-mail.

> Below is a basic write-up on DT based NUMA feature support for the
> arm64 platform. I have attempted to get NUMA support working; however,
> I face the issues below and would like to discuss them. Please let me
> know your comments. I have yet to look at ACPI support.
>
> DT based NUMA support for arm64 platform
> ========================================
>
> For Xen to boot on a NUMA arm64 platform, Xen needs to parse the CPU
> and memory nodes for the DT based booting mechanism. Here I would like
> to discuss the DT based booting mechanism and the issues related to it.
>
> 1) Parsing CPU and memory nodes:
> --------------------------------
> The NUMA information associated with CPUs and memory is passed in the
> DT using the numa-node-id u32 integer value. More information about the
> NUMA binding is available in the Linux kernel under
> Documentation/devicetree/bindings/numa.txt.
>
> Similar to the Linux kernel, the cpu and memory nodes of the DT are
> parsed and the numa-node-id information is populated in the cpu_parsed
> and memory_parsed node_t masks.
>
> When booting in UEFI mode, UEFI passes memory information to Dom0 using
> the EFI memory descriptor table and deletes the memory nodes from the
> host DT. However, to fetch the memory NUMA node id, the memory DT nodes
> should not be deleted by the EFI stub.
>
> ISSUE: when the memory nodes are _NOT_ deleted by the EFI stub from the
> host DT, Xen identifies them [xen/arch/arm/bootfdt.c, early_scan_node()]
> and adds their memory ranges to the bootinfo.mem structure, thereby
> adding duplicate entries, and initialization eventually fails.
>
> Possible solution: while adding a new memory region to bootinfo.mem,
> check for duplicate entries and back off if the entry is already
> available from the UEFI memory info table.

I think we should have a different approach. I actually like the
approach suggested by Andre in [1], which is: if the UEFI memory map
exists (i.e. bootinfo.mem is already filled), then the DT is only used
to get the NUMA node information.
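Roughly, the memory nodes could then be handled along the lines of the
untested sketch below. dt_numa_process_memory_node() and
numa_add_memblk() are made-up names for illustration; the libfdt
accessors and device_tree_get_reg() are the ones bootfdt.c already
uses.

    /*
     * Illustrative sketch only: handle a DT memory node when
     * bootinfo.mem has already been filled from the EFI memory map.
     * The ranges are *not* added to bootinfo.mem again; they are only
     * recorded together with their numa-node-id (numa_add_memblk() is
     * a hypothetical helper that would fill node_memblk_range[]).
     */
    static int __init dt_numa_process_memory_node(const void *fdt, int node,
                                                  u32 address_cells,
                                                  u32 size_cells)
    {
        const __be32 *cell;
        u64 start, size;
        nodeid_t nid = 0;
        int len, banks, i;

        /* numa-node-id is a u32, as per the Linux DT NUMA binding. */
        cell = fdt_getprop(fdt, node, "numa-node-id", &len);
        if ( cell && len >= (int)sizeof(u32) )
            nid = (nodeid_t)fdt32_to_cpu(*cell);

        cell = fdt_getprop(fdt, node, "reg", &len);
        if ( !cell )
            return -ENOENT;

        banks = len / ((address_cells + size_cells) * sizeof(u32));

        for ( i = 0; i < banks; i++ )
        {
            device_tree_get_reg(&cell, address_cells, size_cells,
                                &start, &size);
            /* Record (start, size, nid) only; bootinfo.mem is untouched. */
            numa_add_memblk(nid, start, size);
        }

        return 0;
    }

early_scan_node() would then call something like this instead of adding
the banks a second time whenever bootinfo.mem has already been
populated from the EFI memory map.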
> 2) Parsing CPU nodes:
> ---------------------
> The CPU nodes are parsed to extract the numa-node-id information for
> each CPU and the cpu_nodemask is populated. The MPIDR register value
> is read for each CPU and cpu_to_node[] is populated.

To emphasise here, cpu_to_node will be indexed using the Xen CPU id and
not the MPIDR. They can be different, and Xen does not have a clue
about the MPIDR except in very few places.

> 3) Parsing memory nodes:
> ------------------------
> For all the DT memory nodes in the flattened DT, the start address,
> size and numa-node-id value are extracted and stored in
> node_memblk_range[], which is of type struct node. Each bootinfo.mem
> entry from UEFI is verified against node_memblk_range[] and NODE_DATA
> is populated with the start PFN, end PFN and node id.
>
> Populating memnodemap: memnodemap[] is allocated from the heap and,
> using the NODE_DATA structure, memnodemap[] is populated with the node
> id for each page index. This memnodemap information is used by the
> memory allocator to fetch the memory node id for a given page by
> calling phys_to_nid().
>
> ISSUE: phys_to_nid() is called by the memory allocator before
> memnodemap[] is initialized. Since memnodemap[] is allocated from the
> heap, the boot allocator has to be initialized first; but the
> boot_allocator() needs phys_to_nid(), which is not available until
> memnodemap[] is initialized. So there is a deadlock situation during
> initialization. To overcome this, phys_to_nid() should rely on
> node_memblk_range[] to get the node id until memnodemap[] is
> initialized.

Looking at the code, boot_allocator() does not need phys_to_nid() until
the end. So it would be perfectly fine to use alloc_boot_pages() to
allocate memnodemap (see the sketch below, after point 4).

> 4) Generating memory nodes for DOM0
> -----------------------------------
> Linux kernel device drivers that use devm_zalloc() try to allocate
> memory from the local memory node, so Dom0 needs to have memory
> allocated on all the available nodes of the system. For example, the
> SMMU driver of a device on node 1 tries to allocate memory on node 1.
>
> ISSUE:
>  - Dom0's memory should be split across all the available memory nodes
>    of the system and the memory nodes should be generated accordingly.
>  - The memory DT node generated by Xen for Dom0 should populate the
>    numa-node-id information.

If you drop the numa-node-id property from every node, DOM0 will not
try to use NUMA. Is there any specific reason not to do that? Those
properties could be re-introduced later on when vNUMA is brought up.
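Coming back to point 3: below is a rough sketch of the fallback you
describe, combined with allocating memnodemap[] via alloc_boot_pages().
The names (memnodemap, memnode_shift, node_memblk_range, memblk_nodeid,
num_node_memblks) are borrowed from the x86 NUMA code and may well end
up different in an arm64 port; treat it as a sketch, not as the final
interface.

    /*
     * Sketch only: early boot fallback for phys_to_nid().  Until
     * memnodemap[] has been built (its backing store can come from
     * alloc_boot_pages() once the boot allocator is up), fall back to
     * a linear scan of the memory blocks parsed from the DT.
     */
    nodeid_t phys_to_nid(paddr_t addr)
    {
        int i;

        /* Fast path once memnodemap[] is available. */
        if ( memnodemap != NULL )
            return memnodemap[paddr_to_pdx(addr) >> memnode_shift];

        /* Early boot: scan the blocks recorded while parsing the DT. */
        for ( i = 0; i < num_node_memblks; i++ )
            if ( addr >= node_memblk_range[i].start &&
                 addr < node_memblk_range[i].end )
                return memblk_nodeid[i];

        /* Default to node 0 if the address is not covered. */
        return 0;
    }

With something like this, the boot allocator can call phys_to_nid()
safely before memnodemap[] exists, and memnodemap[] itself can be sized
and filled later from NODE_DATA as you describe.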
Regards,

[1] https://lists.xenproject.org/archives/html/xen-devel/2016-11/msg02499.html

--
Julien Grall