
Re: RFC: arm64: Handling reserved memory nodes


  • To: Leo Yan <leo.yan@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>
  • From: Michal Orzel <michal.orzel@xxxxxxx>
  • Date: Wed, 20 Sep 2023 12:31:38 +0200
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, "Henry Wang" <Henry.Wang@xxxxxxx>, Penny Zheng <penny.zheng@xxxxxxx>
  • Delivery-date: Wed, 20 Sep 2023 10:32:29 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hello,

On 20/09/2023 12:03, Leo Yan wrote:
> 
> 
> Hi Julien,
> 
> On Mon, Sep 18, 2023 at 08:26:21PM +0100, Julien Grall wrote:
> 
> [...]
> 
>> ... from my understanding, reserved-memory is just normal memory that is
>> set aside for a specific purpose. So Xen has to create a 'memory' node *and*
>> a 'reserved-memory' region.
> 
> To be clear, Xen passes the 'reserved-memory' regions as normal memory
> nodes, see [1].
> 
>> With that, the kernel is supposed to exclude all the 'reserved-memory' from
>> normal usage unless the node contains the property 'reusable'.
>> This was clearer before the binding was converted to YAML in [1].
> 
> The Linux kernel reserves pages for the memory ranges in the 'reserved-memory'
> node, no matter whether the 'no-map' property is set for a range or not (see
> the function memmap_init_reserved_pages() -> __SetPageReserved() in the Linux
> kernel).
> 
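For reference, here is a simplified sketch (not verbatim kernel code; details
vary across kernel versions) of the path mentioned above, i.e. how those
ranges end up with reserved struct pages whether or not 'no-map' is set:

/*
 * Simplified sketch: walk every memblock.reserved range (which includes the
 * /reserved-memory carveouts) and mark the backing struct pages as
 * PG_reserved, so the buddy allocator never hands them out.
 */
static void __init memmap_init_reserved_pages(void)
{
    phys_addr_t start, end;
    u64 i;

    for_each_reserved_mem_range(i, &start, &end)
        reserve_bootmem_region(start, end); /* __SetPageReserved() per page */
}
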
> If a reserved memory range has the 'no-map' property set, the memory
> region will not be mapped in the kernel's linear address space.  This
> avoids data corruption caused by speculative fetches through a cacheable
> mapping while the same memory region is in use by devices
> (e.g. for DMA transfers).
> 
> [...]
> 
>>> Here the problem is that these reserved memory regions are passed as normal
>>> memory nodes to the Dom0 kernel, so the Dom0 kernel allocates pages from
>>> these reserved memory regions.  Apparently, this might lead to conflicts,
>>> e.g. the reserved memory is used by the Dom0 kernel while at the same time
>>> the memory is used for another purpose (e.g. by an MCU in the system).
>>
>> See above. I think this is correct to pass both 'memory' and
>> 'reserved-memory'. Now, it is possible that Xen may not create the
>> device-tree correctly.
> 
> Agreed that Xen currently creates the DT binding for the 'reserved-memory'
> node incorrectly; more specifically, the reserved memory nodes are wrongly
> passed as normal memory nodes (again, see [1]).
> 
>> I would suggest looking at how Linux is populating the memory and whether it
>> actually skips the regions.
> 
> The Linux kernel reserves the corresponding pages for all reserved
> memory regions, which means the kernel page allocator (buddy
> algorithm) doesn't allocate these pages at all.
> 
> With the 'no-map' property, the memory range will not be mapped into the
> kernel's linear address space.
> 
>>> Here I am a bit confused by "Xen doesn't have the capability to know
>>> the memory attribute".  I looked into the file arch/arm/guest_walk.c;
>>> IIUC, it walks through the stage-1 page tables for the virtual
>>> machine and gets the permissions for the mapping, so we can also get
>>> the mapping attribute, right?
>>
>> Most of the time, Xen will use the HW to translate the guest virtual address
>> to an intermediate physical address. Looking at the specification, it
>> looks like PAR_EL1 will contain the memory attribute, which I didn't
>> know.
>>
>> We would then need to read MAIR_EL1 to find the attribute and also the
>> memory attribute in stage-2 to figure out the final memory attribute.
> 
>> This is feasible, but the Xen ABI mandates that regions passed to Xen have
>> specific memory attributes (see the comment at the top of
>> xen/include/public/arch-arm.h).
> 
> If you refer to the comment "All memory which is shared with other
> entities in the system ... which is mapped as Normal Inner Write-Back
> Outer Write-Back Inner-Shareable", I don't think it's relevant to the
> current issue.  I will explain in detail below.
> 
>> Anyway, in your case, the buffer Linux is using is on the stack. So the
>> region must have been mapped with the proper attributes.
> 
> I think you may misunderstand the issue.  I would like to divide the
> issue into two parts:
> 
> - The first question is about how to pass the reserved memory node from the
>   Xen hypervisor to the Dom0 Linux kernel.  Currently, the Xen hypervisor
>   converts the reserved memory ranges and adds them into the normal memory
>   node.
> 
>   The Xen hypervisor should keep the reserved memory node and pass it to the
>   Dom0 Linux kernel.  With this change, the Dom0 kernel will only allocate
>   pages from the normal memory node, and the data in these pages can be
>   shared by the Xen hypervisor and the Dom0 Linux kernel.
> 
> - The second question is about the memory attribute for the reserved memory
>   node.  Note that the reserved memory ranges are not necessarily _shared_
>   between the Xen hypervisor and the Dom0 Linux kernel.  I think in most
>   cases the reserved memory will be ioremapped by drivers (for stage-1),
>   and the Xen hypervisor should map the P2M with the attribute
>   p2m_mmio_direct_c; or we could differentiate based on the properties,
>   e.g. map a 'no-map' memory range in the P2M with p2m_mmio_direct_c and a
>   'reusable' memory range with the attribute p2m_ram_rw (a rough
>   illustration follows below this list).
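
As a rough illustration only (not a proposed patch): map_regions_p2mt() is the
helper Xen's dom0 build code already uses to establish P2M mappings, but the
exact signature and the surrounding helper names may differ between Xen
versions, so treat the details below as a sketch:

/*
 * Sketch: map a reserved range 1:1 into the dom0 P2M, picking the stage-2
 * type from the reserved-memory node's properties ('reusable' vs 'no-map').
 * The wrapper name and the 'reusable' flag are illustrative only.
 */
static int map_reserved_range(struct domain *d, paddr_t base, paddr_t size,
                              bool reusable)
{
    p2m_type_t p2mt = reusable ? p2m_ram_rw : p2m_mmio_direct_c;

    return map_regions_p2mt(d, gaddr_to_gfn(base), PFN_DOWN(size),
                            maddr_to_mfn(base), p2mt);
}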
> 
> To simplify the discussion, I think we can first finalize the fix for the
> first question and put the second question on hold.  After we fix the
> first one, we can come back to the second issue.
> 
>>> Another question is about the attribute for MMIO regions. For mapping MMIO
>>> regions, prepare_dtb_hwdom() sets the attribute 'p2m_mmio_direct_c'
>>> for stage 2, but in the Linux kernel the MMIO attribute can
>>> be one of the variants below:
>>>
>>> - ioremap(): Device type with nGnRE;
>>> - ioremap_np(): Device type with nGnRnE (strongly-ordered);
>>> - ioremap_wc(): Normal non-cacheable.
>>
>> The stage-2 memory attribute is used to restrict the final memory attribute.
>> In this case, p2m_mmio_direct_c allows the domain to set pretty much any
>> memory attribute.
> 
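To make the "restrict" part concrete, based on the Arm architecture's baseline
rule that the combined stage-1/stage-2 memory type is the more restrictive of
the two (attribute names per the ioremap variants listed above;
p2m_mmio_direct_c gives Normal Write-Back at stage-2):

    dom0 stage-1 mapping              Xen stage-2        resulting type
    ioremap()    -> Device-nGnRE      Normal WB          Device-nGnRE
    ioremap_np() -> Device-nGnRnE     Normal WB          Device-nGnRnE
    ioremap_wc() -> Normal NC         Normal WB          Normal Non-cacheable
    normal RAM   -> Normal WB         Normal WB          Normal WB

So with p2m_mmio_direct_c at stage-2, whatever attribute dom0 chooses at
stage-1 is effectively the one that takes effect.
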
> Thanks for the confirmation.  If so, I think the Xen hypervisor should
> follow the same approach and map the reserved regions with the attribute
> p2m_mmio_direct_c.
> 
>>> If the Xen hypervisor can handle these MMIO types in stage 2, then we should
>>> be able to use the same approach to map the stage-2 tables for the reserved
>>> memory.  A difference for the reserved memory is that it can be mapped as
>>> cacheable normal memory.
>>
>> I am a bit confused. I read this as you think the region is not mapped in
>> the P2M (aka stage-2 page-tables for Arm). But from the logs you provided,
>> the regions are already mapped (you have an MFN in hand).
> 
> You are right.  The reserved memory regions have been mapped in P2M.
> 
>> So to me the error is most likely in how we create the Device-Tree.
> 
> Yeah, let's first focus on the DT binding for the reserved memory nodes.
> 
>>> The DT binding is something like this (I tweaked it a bit for readability):
>>
>> Just to confirm this is the host device tree, right? If so...
> 
> Yes.
> 
>>>     memory@20000000 {
>>>             #address-cells = <0x02>;
>>>             #size-cells = <0x02>;
>>>             device_type = "memory";
>>>             reg = <0x00 0x20000000 0x00 0xa0000000>,
>>>                        <0x01 0xa0000000 0x01 0x60000000>;
>>>     };
>>
>> ... you can see the reserved-regions are described in the normal memory. In
>> fact...
>>
>>>
>>>
>>>     reserved-memory {
>>>             #address-cells = <0x02>;
>>>             #size-cells = <0x02>;
>>>             ranges;
>>>
>>>             reserved_mem1 {
>>>                     reg = <0x00 0x20000000 0x00 0x00010000>;
>>>                     no-map;
>>>             };
>>>
>>>             reserved_mem2 {
>>>                     reg = <0x00 0x40000000 0x00 0x20000000>;
>>>                     no-map;
>>>             };
>>>
>>>             reserved_mem3 {
>>>                     reg = <0x01 0xa0000000 0x00 0x20000000>;
>>>                     no-map;
>>>             };
>>
>> ... no-map should tell the kernel to not use the memory at all. So I am a
>> bit puzzled why it is trying to use it.
> 
> No, 'no-map' doesn't mean the Linux kernel doesn't use it.  Quoting from
> the kernel documentation
> Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml:
> 'no-map' means the kernel "must not create a virtual mapping of the
> region". The reserved memory regions are still "under the control of the
> device driver using the region".
> 
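As a hedged sketch of what "under the control of the device driver" typically
looks like in practice (of_reserved_mem_lookup() and ioremap_wc() are existing
kernel APIs, but the wrapper below and the choice of mapping attribute are
purely illustrative):

#include <linux/io.h>
#include <linux/of.h>
#include <linux/of_reserved_mem.h>

/*
 * Illustrative only: a driver looks up its 'no-map' carveout node and maps it
 * itself, since the kernel deliberately created no linear mapping for the
 * range; the driver picks the attributes it needs (write-combine here).
 */
static void __iomem *map_nomap_carveout(struct device_node *np)
{
    struct reserved_mem *rmem = of_reserved_mem_lookup(np);

    if (!rmem)
        return NULL;

    return ioremap_wc(rmem->base, rmem->size);
}
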
>> I would suggest checking whether somehow Linux doesn't understand the
>> reserved-memory nodes we wrote.
> 
> Could you confirm that Xen does write reserved memory nodes?  Or does Xen
> convert the reserved memory nodes to normal memory nodes as I described
> above? :)

Xen passes the /reserved-memory node unchanged from the host device tree to
the dom0 FDT. Apart from that, it creates an additional memory node covering
the reserved ranges.
Take a look at this example run (based on QEMU):

Host dt:
memory@40000000 {
    reg = <0x00 0x40000000 0x01 0x00>;
    device_type = "memory";
};

reserved-memory {
    #size-cells = <0x02>;
    #address-cells = <0x02>;
    ranges;

    test@50000000 {
        reg = <0x00 0x50000000 0x00 0x10000000>;
        no-map;
    };
};

Xen:
(XEN) MODULE[0]: 000000004ac00000 - 000000004ad65000 Xen
(XEN) MODULE[1]: 000000004ae00000 - 000000004ae03000 Device Tree
(XEN) MODULE[2]: 0000000042c00000 - 000000004aa8ea8b Ramdisk
(XEN) MODULE[3]: 0000000040400000 - 0000000042b30000 Kernel
(XEN)  RESVD[0]: 0000000050000000 - 000000005fffffff
...
(XEN) BANK[0] 0x000000c0000000-0x00000100000000 (1024MB)

Linux dom0:
[    0.000000] OF: reserved mem: 0x0000000050000000..0x000000005fffffff (262144 KiB) nomap non-reusable test@50000000

cat /proc/iomem:
50000000-5fffffff : reserved
c0000000-ffffffff : System RAM

dtc from Linux dom0:

memory@c0000000 {
    device_type = "memory";
    reg = <0x00 0xc0000000 0x00 0x40000000>;
};

memory@50000000 {
    device_type = "memory";
    reg = <0x00 0x50000000 0x00 0x10000000>;
};

reserved-memory {
    #address-cells = <0x02>;
    #size-cells = <0x02>;
    ranges;

    test@50000000 {
        reg = <0x00 0x50000000 0x00 0x10000000>;
        no-map;
    };
};


~Michal



 

