
Re: [Xen-devel] arm64: Approach for DT based NUMA and issues



On Tue, Nov 29, 2016 at 12:29 AM, Julien Grall <julien.grall@xxxxxxx> wrote:
>
>
> On 26/11/16 06:59, Vijay Kilari wrote:
>>
>> Hi,
>
>
> Hi Vijay,
>
> This mail is mixing two distinct problems:
>         1) Making Xen NUMA-aware
>         2) Make DOM0 NUMA-aware
>
> As mentioned in another part of this thread, those problems should be
> addressed one by one rather than together.
>
> I will focus on problem 1) while answering this e-mail.
>
>
>> Below is a basic write-up on DT based NUMA feature support for the arm64
>> platform. I have attempted to get NUMA support working; however, I face
>> the issues below and would like to discuss them. Please let me know your
>> comments. I have yet to look at ACPI support.
>>
>> DT based NUMA support for arm64 platform
>> ========================================
>> To boot Xen on a NUMA arm64 platform, Xen needs to parse the
>> CPU and memory nodes for the DT based booting mechanism. Here I would
>> like to discuss the DT based booting mechanism and the issues
>> related to it.
>>
>> 1) Parsing CPU and Memory nodes:
>> ---------------------------------------------------
>>
>> The NUMA information associated with CPU and memory nodes is passed in the
>> DT using the numa-node-id property, a u32 integer value. More information
>> about the NUMA binding is available in the Linux kernel at
>> Documentation/devicetree/bindings/numa.txt
>>
>> As in the Linux kernel, the cpu and memory nodes of the DT are parsed
>> and the numa-node-id information is populated in the cpu_parsed and
>> memory_parsed node_t masks.
>>
>> When booting in UEFI mode, UEFI passes memory information to Dom0
>> using EFI memory descriptor table and deletes the memory nodes
>> from the host DT. However to fetch the memory numa node id, memory DT
>> node should not be deleted by EFI stub.
>>
>> ISSUE: When the memory node is _NOT_ deleted by the EFI stub from the host
>> DT, Xen identifies the memory node [xen/arch/arm/bootfdt.c,
>> early_scan_node()] and adds its memory ranges to the bootinfo.mem
>> structure, thereby adding duplicate entries, and eventually
>> initialization fails.
>>
>> Possible Solution: While adding a new memory region to bootinfo.mem, check
>> for duplicate entries and back off if the entry is already available from
>> the UEFI mem info table.
>
>
> I think we should have a different approach. I actually like the approach
> suggested by Andre in [1], which is: if the UEFI memory map exists (i.e.
> bootinfo.mem is already filled), then the DT is only used to get NUMA node
> information.
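
A minimal sketch of the approach described above, under stated assumptions:
the types, MAX_BANKS, the mem_from_uefi flag and process_dt_memory() are
hypothetical stand-ins for illustration and are not the actual Xen
definitions. Every DT memory node is recorded for its NUMA information, but
it is added as a memory bank only when the UEFI memory map has not already
filled bootinfo.mem.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BANKS 8  /* hypothetical limit, for illustration only */

struct bank { uint64_t start, size; };
struct mem_info { unsigned int nr_banks; struct bank bank[MAX_BANKS]; };
struct numa_range { uint64_t start, size; uint32_t nid; };

static struct mem_info mem;                           /* stand-in for bootinfo.mem */
static struct numa_range node_memblk_range[MAX_BANKS]; /* NUMA info from the DT */
static unsigned int nr_ranges;

/*
 * Handle one DT memory node. mem_from_uefi is true when the memory banks
 * were already filled from the UEFI memory map.
 */
static int process_dt_memory(uint64_t start, uint64_t size, uint32_t nid,
                             bool mem_from_uefi)
{
    if (nr_ranges >= MAX_BANKS)
        return -1;

    /* Always remember which node this range belongs to. */
    node_memblk_range[nr_ranges++] = (struct numa_range){ start, size, nid };

    /* With a UEFI memory map, the DT is only a source of NUMA information. */
    if (mem_from_uefi)
        return 0;

    if (mem.nr_banks >= MAX_BANKS)
        return -1;

    mem.bank[mem.nr_banks++] = (struct bank){ start, size };
    return 0;
}
```

With this shape no duplicate check is needed on the bank list itself, since
the DT path never adds banks when UEFI already did.
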
>
>>
>> 2) Parsing CPU nodes:
>> ---------------------------------
>> The CPU nodes are parsed to extract numa-node-id info for each cpu and
>> cpu_nodemask is populated.
>>
>> The MPIDR register value is read for each CPU and cpu_to_node[] is
>> populated.
>
>
> To emphasize here, cpu_to_node will be indexed using the Xen CPUID and not
> the MPIDR. They can be different, and Xen does not have a clue of the MPIDR
> except in very few places.
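
To illustrate the point about indexing, here is a small sketch that keys
cpu_to_node[] by the logical (Xen) CPU id and uses the MPIDR only to find
that id. The cpu_hwid[] table, NR_CPUS value and set_cpu_node() helper are
assumptions made for this example and are not taken from the Xen tree.

```c
#include <stdint.h>

#define NR_CPUS 4  /* hypothetical, small value for illustration */

/* Hypothetical table: logical CPU id -> MPIDR affinity value. */
static uint64_t cpu_hwid[NR_CPUS];

/* Indexed by logical CPU id, never by MPIDR. */
static uint8_t cpu_to_node[NR_CPUS];

/*
 * Record the node of the CPU whose DT "reg" value is mpidr.
 * Returns the logical CPU id, or -1 if no CPU matches.
 */
static int set_cpu_node(uint64_t mpidr, uint8_t nid)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu_hwid[cpu] == mpidr) {
            cpu_to_node[cpu] = nid;
            return (int)cpu;
        }
    }
    return -1;
}
```
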
>
>>
>> 3) Parsing Memory nodes:
>> --------------------------------------
>> For all the DT memory nodes in the flattened DT, the start address, size
>> and numa-node-id value are extracted and stored in "node_memblk_range[]",
>> which is of type struct node.
>>
>> Each bootinfo.mem entry from UEFI is verified against node_memblk_range[]
>> and NODE_DATA is populated with start PFN, end PFN and nodeid.
>>
>> Populating memnodemap:
>>
>> The memnodemap[] is allocated from heap and using the NODE_DATA structure,
>> the memnodemap[] is populated with nodeid for each page index.
>>
>> This memnodemap info is used to fetch memory node id for a given page
>> by calling phys_to_nid() by memory allocator.
>>
>> ISSUE: phys_to_nid() is called by memory allocator before memnodemap[]
>> is initialized.
>>
>> Since memnodemap[] is allocated from the heap, the boot allocator must
>> be initialized first. However, boot_allocator() needs phys_to_nid(), which
>> is not available until memnodemap[] is initialized. So there is a deadlock
>> situation during initialization. To overcome this, phys_to_nid() should
>> rely on node_memblk_range[] to get the nodeid until memnodemap[] is
>> initialized.
>
>
> Looking at the code, boot_allocator() does not need phys_to_nid until the
> end. So it would be perfectly fine to use alloc_boot_pages to allocate
> memnodemap.
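
For reference, a sketch of the lookup being discussed, with the fallback to
node_memblk_range[] from the original write-up. This is an illustration
under assumptions, not the Xen implementation: MEMNODE_SHIFT, MEMNODE_SLOTS,
the layout of struct node, the example ranges and the static array (in place
of memory obtained from an allocator such as alloc_boot_pages) are all
hypothetical. If the boot allocator really does not call phys_to_nid() until
the end, the fallback branch would simply never be taken early on.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEMNODE_SHIFT 30   /* hypothetical: one map slot per 1 GiB */
#define MEMNODE_SLOTS 8
#define NUMA_NO_NODE  0xFF

struct node { uint64_t start, end; uint8_t nid; };  /* covers [start, end) */

/* Example ranges as they might be parsed from the DT memory nodes. */
static struct node node_memblk_range[] = {
    { 0x00000000ULL, 0x80000000ULL,  0 },
    { 0x80000000ULL, 0x100000000ULL, 1 },
};

static uint8_t memnodemap[MEMNODE_SLOTS];
static bool memnodemap_ready;

/* Slow path: linear scan of the parsed ranges. */
static uint8_t range_to_nid(uint64_t addr)
{
    unsigned int n = sizeof(node_memblk_range) / sizeof(node_memblk_range[0]);

    for (unsigned int i = 0; i < n; i++)
        if (addr >= node_memblk_range[i].start && addr < node_memblk_range[i].end)
            return node_memblk_range[i].nid;
    return NUMA_NO_NODE;
}

/* Fill the map with the node id of each slot's first address. */
static void populate_memnodemap(void)
{
    for (unsigned int i = 0; i < MEMNODE_SLOTS; i++)
        memnodemap[i] = range_to_nid((uint64_t)i << MEMNODE_SHIFT);
    memnodemap_ready = true;
}

/* Fast path once the map exists, range scan before that. */
static uint8_t phys_to_nid(uint64_t addr)
{
    uint64_t slot = addr >> MEMNODE_SHIFT;

    if (memnodemap_ready && slot < MEMNODE_SLOTS)
        return memnodemap[slot];
    return range_to_nid(addr);
}
```
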
>
>>
>> 4) Generating memory nodes for DOM0
>> ---------------------------------------------------------
>> Linux kernel device drivers that use devm_kzalloc() try to allocate
>> memory from the local memory node. So Dom0 needs to have memory allocated
>> on all the available nodes of the system.
>>
>> Ex: SMMU driver of device on node 1 tries to allocate memory
>> on node 1.
>>
>> ISSUE:
>>  - Dom0's memory should be split across all the available memory nodes
>>    of the system and memory nodes should be generated accordingly.
>>  - Memory DT node generated by Xen for Dom0 should populate numa-node-id
>>    information.
>
>
> If you drop numa-node-id property from every node, DOM0 will not try to use
> NUMA. Is there any specific reason to not do that?

If we drop numa-node-id from the memory node generated for dom0, then dom0
will assume all the memory is from node0. So eventually node1 device
initialization fails.

>
> Those properties could be re-introduced later on when vNUMA is brought
> up.
>
> Regards,
>
> [1]
> https://lists.xenproject.org/archives/html/xen-devel/2016-11/msg02499.html
>
> --
> Julien Grall
