[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/arm: Skip memory nodes if not enabled


  • To: Julien Grall <julien@xxxxxxx>, Leo Yan <leo.yan@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Michal Orzel <michal.orzel@xxxxxxx>
  • Date: Wed, 27 Sep 2023 14:49:23 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=xen.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=krWSpJxCG15kfLt9wnpPwPMoHEqbTjBN/EwRJsLeLiQ=; b=QCZ1mw+2JNMu3IL6T8+ypbnz/kXuiImaINgzzarndRDh98tDm3Gufr6b3tuB/iXc5aFKfWW+0O9BD3GcYFwu8F5KoHeQb2NoPgQ3Ue0/NVyQHY3bKVJGbx2Omkznpk9QcXDch26aVTww+O+BOWoibgIQzhLCDiVpjX7IG8lnJ4m/JTVtAAVHqcjLP9p1eNDLH6TMbYaYAqArIEm9OoHj07D4XlcJXYQ98A7HitvKwHm+1LOmsVxhdHJ4X7D5+RhTruGmMEVqTeWMY2UbUt/hybpBK/cpNfSHE4zrulm2serGyJP6a+pM3cwhqZFXqlcLBr79bI0UvsIFFEsRDr2Mtg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=aHjnPgu3NZNR30spJ9vuBjUs8w0FYxXSWwLtlGdLJM2Pi5XdkLBIwO9GhhkQPPgcPs2f6wTWt9gOBQkp8BFfD2uo17skIPJtdrFIfEGdfH/ZIs+z68iHGq1rOakFlvqCGUNhXpPHZJ5ivg8ELllvZAgvY3QKBhJvS2naoK2zuDnsrem18FdBzXoyjza/XeBw7+6WhhNd2g6uU2XijpuS2Ss6oEVaMO1IGN8MpdvqO6AptZe2rRKmZ9iHwOfnRqB5Vo3dvucFUL4BfB99KSfyC337mkYfNBxGMYaUm1oQTv3jrro+zSE4I2o1S1txoU5Oon1+x6T/a62avu1v0bRJeg==
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • Delivery-date: Wed, 27 Sep 2023 12:49:54 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Julien,

On 27/09/2023 13:01, Julien Grall wrote:
> 
> 
> Hi Michal,
> 
> On 26/09/2023 09:36, Michal Orzel wrote:
>> On 26/09/2023 07:33, Leo Yan wrote:
>>>
>>>
>>> During the Linux kernel booting, an error is reported by the Xen
>>> hypervisor:
>>>
>>>    (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc
>>>
>>> The kernel attempts to use an invalid memory frame number, which can be
>>> converted to: 0x1a02dc << PAGE_SHIFT, resulting in 0x1_a02d_c000.
>>>
>>> The invalid memory frame falls into a reserved memory node, which is
>>> defined in the device tree as below:
>>>
>>>    reserved-memory {
>>>            #address-cells = <0x02>;
>>>            #size-cells = <0x02>;
>>>            ranges;
>>>
>>>            ...
>>>
>>>            ethosn_reserved {
>>>                    compatible = "shared-dma-pool";
>>>                    reg = <0x01 0xa0000000 0x00 0x20000000>;
>>>                    status = "disabled";
>>>                    no-map;
>>>            };
>>>
>>>            ...
>>>    };
>>>
>>> Xen excludes all reserved memory regions from the frame management
>>> through the dt_unreserved_regions() function. On the other hand, the
>>> reserved memory nodes are passed to the Linux kernel. However, the Linux
>>> kernel ignores the 'ethosn_reserved' node since its status is
>>> "disabled". This leads to the Linux kernel to allocate pages from the
>>> reserved memory range.
>>>
>>> As a result, when the kernel passes the data structure residing in the
>>> frame 0x1_a02d_c000 in the Xen hypervisor, the hypervisor detects that
>>> it misses to manage the frame and reports the error.
>>>
>>> Essentially, this issue is caused by the Xen hypervisor which misses to
>>> handle the status for the memory nodes (for both the normal memory nodes
>>> and the reserved memory nodes) and transfers them to the Linux kernel.
>>>
>>> This patch introduces a function memory_node_is_available(). If it
>>> detects a memory node is not enabled, the hypervisor will not add the
>>> memory region into the memory lists. In the end, this avoids to generate
>>> the memory nodes from the memory lists sent to the kernel and the kernel
>>> cannot use the disabled memory nodes any longer.
>>
>> Interesting. So FWICS, we have 2 issues that have a common ground:
>> 1) If the reserved memory node has a status "disabled", it implies that this 
>> region
>> is no longer reserved and can be used which is not handled today by Xen and 
>> leads
>> to the above mentioned problem.
>>
>> 2) If the memory node has a status "disabled" it implies that it should not 
>> be used
>> which is not the case in current Xen. This means that at the moment, Xen 
>> would try
>> to use such a memory region which is incorrect.
>>
>> I think the commit msg should be more generic and focus on these two issues.
>> Summary:
>> Xen does not handle the status property of memory nodes and ends up using 
>> them.
>> Xen does not handle the status property of reserved memory nodes. If status 
>> is disabled
>> it means the reserved region shall no longer be treated as reserved. Xen 
>> passes the reserved
>> memory nodes untouched to dom0 fdt and creates a memory node to cover 
>> reserved regions.
>> Disabled reserved memory nodes are ignored by the guest which leaves us with 
>> the exposed
>> memory nodes. Guest can access such a region but it is excluded from the 
>> page management in Xen
>> which results in Xen being unable to obtain the corresponding MFN.
>>
>>>
>>> Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx>
>>> ---
>>>   xen/arch/arm/bootfdt.c | 16 ++++++++++++++++
>>>   1 file changed, 16 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
>>> index 2673ad17a1..b46dde06a9 100644
>>> --- a/xen/arch/arm/bootfdt.c
>>> +++ b/xen/arch/arm/bootfdt.c
>>> @@ -206,11 +206,27 @@ int __init device_tree_for_each_node(const void *fdt, 
>>> int node,
>>>       return 0;
>>>   }
>>>
>>> +static bool __init memory_node_is_available(const void *fdt, unsigned long 
>>> node)
>> This function is not strictly related to memory node so it would be better 
>> to call it e.g. device_tree_node_is_available.
> 
> +1.
> 
>>> +{
>>> +    const char *status = fdt_getprop(fdt, node, "status", NULL);
>>> +
>>> +    if (!status)
>> white spaces between brackets and condition
>> if ( !status )
>>
>>> +        return true;
>>> +
>>> +    if (!strcmp(status, "ok") || !strcmp(status, "okay"))
>> white spaces between brackets and condition
>> if ( !strcmp(status, "ok") || !strcmp(status, "okay") )
>>
>>> +        return true;
>>> +
>>> +    return false;
>>> +}
>>> +
>>>   static int __init process_memory_node(const void *fdt, int node,
>>>                                         const char *name, int depth,
>>>                                         u32 address_cells, u32 size_cells,
>>>                                         void *data)
>>>   {
>>> +    if (!memory_node_is_available(fdt, node))
>>> +        return 0;
>> I would move this check to device_tree_get_meminfo()
> 
> I am ok with that. But the commit message would need to gain a paragraph
> explaining that we will now support "status" for static memory/heap.
> 
>>> +
>>>       return device_tree_get_meminfo(fdt, node, "reg", address_cells, 
>>> size_cells,
>>>                                      data, MEMBANK_DEFAULT);
>>>   }
>>> --
>>> 2.39.2
>>>
>>>
>>
>> Also, I think it would be nice to add ASSERT(bootinfo.mem.nr_banks); e.g. in 
>> boot_fdt_info().
>> Otherwise the panic from Xen when there is no memory bank:
>> The frametable cannot cover the physical region ...
>> is not really meaningful for normal users.
>>
>> This is just my suggestion (@Julien ?)
> 
> I think a check for the number of banks makes sense. But I would prefer
> if the check also happens in production. So, something like:
> 
> if ( !bootinfo.mem.nr_banks )
>    panic(...);
> 
> We already have one in the setup_mm() for arm32. So we need another one
> for the arm64 version. The other solution is to consolidate it in one
> place you suggested.
> 
> I have a slightly preference for having it in setup_mm() even if this is
> duplicated.
Either way is fine. The advantage of placing the check in boot_fdt_info() is
that we can have a single check instead of duplicated and we do the check before
the "first" use which happens to be the banks sorting. The advantage of 
setup_mm()
is that it is the one dealing with memory banks and is called after 
early_print_info()
so user can see some additional info.

@Leo, will you send a patch for that? Don't feel obliged as it is not strictly 
related
to your patch in which case I can handle it.

~Michal



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.