
Re: RFC: arm64: Handling reserved memory nodes





On 20/09/2023 11:03, Leo Yan wrote:
On Mon, Sep 18, 2023 at 08:26:21PM +0100, Julien Grall wrote:

[...]

... from my understanding, reserved-memory regions are just normal memory that
is set aside for a specific purpose. So Xen has to create a 'memory' node *and*
a 'reserved-memory' region.

To be clear, Xen passes the 'reserved-memory' regions as normal memory
nodes, see [1].

The memory nodes need to be explicitly written because they are excluded in handle_node(). If a node is not excluded, then it should be created in the dom0 Device-Tree.

AFAICT, the 'reserved-memory' node is not excluded and therefore should be copied to the dom0 DT.

[...]

Here the problem is that these reserved memory regions are passed as normal
memory nodes to the Dom0 kernel, so the Dom0 kernel allocates pages from
these reserved memory regions.  Clearly, this might lead to a conflict,
e.g. the reserved memory is used by the Dom0 kernel while, at the same time,
the memory is used for another purpose (e.g. by an MCU in the system).

See above. I think it is correct to pass both 'memory' and
'reserved-memory'. Now, it is possible that Xen may not create the
device-tree correctly.

Agreed that Xen currently creates the DT binding for the 'reserved-memory'
node wrongly; more specifically, the reserved memory nodes are wrongly passed
as normal memory nodes (again, see [1]).

See above. You could dump the dom0 Device-Tree to confirm that 'reserved-memory' is created.
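One way to do that check from inside dom0 is to walk the flattened tree that Linux exposes under /sys/firmware/devicetree/base (also reachable as /proc/device-tree). The following is a minimal sketch, not something taken from this thread: the helper names read_cells() and list_reserved_memory() are made up for illustration, and the default path assumes a Linux dom0 that exposes the device tree through sysfs.

```python
import os
import struct


def read_cells(path):
    """Read a device-tree property file as a list of big-endian 32-bit cells."""
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // 4
    return list(struct.unpack(">%dI" % count, data[: count * 4]))


def list_reserved_memory(root="/sys/firmware/devicetree/base"):
    """Return {child name: {"no-map": bool, "reg": [cells]}} for the
    'reserved-memory' node, or None if that node does not exist at all."""
    rmem = os.path.join(root, "reserved-memory")
    if not os.path.isdir(rmem):
        return None
    found = {}
    for name in sorted(os.listdir(rmem)):
        node = os.path.join(rmem, name)
        if not os.path.isdir(node):
            continue  # plain files here are properties such as 'ranges'
        reg_path = os.path.join(node, "reg")
        found[name] = {
            "no-map": os.path.exists(os.path.join(node, "no-map")),
            "reg": read_cells(reg_path) if os.path.exists(reg_path) else [],
        }
    return found
```

Calling list_reserved_memory() in dom0 and getting None back would suggest the node was not copied; a dictionary whose entries show "no-map": True would suggest it was.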


I would suggest looking at how Linux is populating the memory and whether it
actually skipped the regions.

The Linux kernel reserves the corresponding pages for all reserved
memory regions, which means the kernel page management (buddy
algorithm) doesn't allocate these pages at all.

With the 'no-map' property, the memory range will not be mapped into the
kernel's linear mapping.

Here I am a bit confused by "Xen doesn't have the capability to know
the memory attribute".  I looked into the file arch/arm/guest_walk.c;
IIUC, it walks through the stage-1 page tables for the virtual
machine and gets the permission for the mapping, so we can also get to
know the mapping attribute, right?

Most of the time, Xen will use the HW to translate the guest virtual address
to an intermediate physical address. Looking at the specification, it
looks like PAR_EL1 will contain the memory attribute, which I didn't
know.

We would then need to read MAIR_EL1 to find the attribute and also the
memory attribute in the stage-2 to figure out the final memory attribute.
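As a rough illustration of the decoding step, here is a small sketch of how an 8-bit attribute field in the MAIR_EL1 format can be extracted and turned into a readable memory type. It is written from my reading of the Armv8-A encoding, it ignores newer encodings (e.g. the tagged-memory ones), and the register value used in the example is invented for illustration rather than read from a real system.

```python
DEVICE_TYPES = {0b00: "nGnRnE", 0b01: "nGnRE", 0b10: "nGRE", 0b11: "GRE"}


def mair_attr(mair, index):
    """Extract the 8-bit Attr<index> field (index 0..7) from a MAIR value."""
    return (mair >> (8 * index)) & 0xFF


def _normal_nibble(nibble):
    """Decode one 4-bit Inner or Outer cacheability field of Normal memory."""
    if nibble == 0b0100:
        return "Non-cacheable"
    policy = "Write-Back" if nibble & 0b0100 else "Write-Through"
    transient = "" if nibble & 0b1000 else " Transient"
    return policy + transient


def decode_attr(attr):
    """Turn a MAIR-format attribute byte into a human-readable string."""
    outer, inner = attr >> 4, attr & 0xF
    if outer == 0:
        if inner & 0b0011:
            return "unpredictable"
        return "Device-" + DEVICE_TYPES[inner >> 2]
    if inner == 0:
        return "unpredictable"
    return "Normal Outer %s, Inner %s" % (_normal_nibble(outer),
                                          _normal_nibble(inner))


# Hypothetical register value: Attr0=0x00, Attr1=0x04, Attr2=0x44, Attr3=0xFF.
EXAMPLE_MAIR = 0xFF440400
for idx in range(4):
    print(idx, decode_attr(mair_attr(EXAMPLE_MAIR, idx)))
```

This only covers the stage-1 side; the stage-2 attribute still has to be taken into account to get the final memory type.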

This is feasible but the Xen ABI mandates that regions passed to Xen have
specific memory attributes (see the comment at the top of
xen/include/public/arch-arm.h).

If you refer to the comment "All memory which is shared with other
entities in the system ... which is mapped as Normal Inner Write-Back
Outer Write-Back Inner-Shareable", I don't think it's relevant to the
current issue.  I will explain in detail below.

It is relevant if you intend to allocate a hypercall buffer in a non-reusable reserved-region.


Anyway, in your case, Linux is using a buffer that is on the stack. So the
region must have been mapped with the proper attribute.

I think you may have misunderstood the issue.  I would like to divide the
issue into two parts:

- The first question is about how to pass the reserved memory node from the
   Xen hypervisor to the Dom0 Linux kernel.  Currently, the Xen hypervisor
   converts the reserved memory ranges and adds them to the normal memory node.

   Xen hypervisor should keep the reserved memory node and pass it to
   Dom0 Linux kernel.  With this change, the Dom0 kernel will only
   allocate pages from normal memory node and the data in these pages
   can be shared by Xen hypervisor and Dom0 Linux kernel.

This should be the case. See above.

[...]

Another question is about the attribute for MMIO regions. For mapping MMIO
regions, prepare_dtb_hwdom() sets the attribute 'p2m_mmio_direct_c'
for stage 2, but in the Linux kernel the MMIO attribute can
be one of the variants below:

- ioremap(): device type with nGnRE;
- ioremap_np(): device type with nGnRnE (strongly-ordered);
- ioremap_wc(): normal non-cacheable.

The stage-2 memory attribute is used to restrict the final memory attribute.
In this case, p2m_mmio_direct_c allows the domain to set pretty much any
memory attribute.
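To make the "restrict" part concrete, here is a simplified sketch of how the stage-1 and stage-2 memory types combine on Arm when stage-2 forced write-back (FWB) is not in use: the more restrictive of the two wins. The type names and the combine() helper are illustrative only, the model ignores shareability and the inner/outer split, and treating the stage-2 entry as Normal Write-Back in the example is an assumption made for illustration.

```python
# Ordered from most to least restrictive.
DEVICE_ORDER = ["Device-nGnRnE", "Device-nGnRE", "Device-nGRE", "Device-GRE"]
NORMAL_ORDER = ["Normal-NC", "Normal-WT", "Normal-WB"]


def combine(stage1, stage2):
    """Return the resulting memory type for a stage-1/stage-2 pair."""
    s1_dev = stage1 in DEVICE_ORDER
    s2_dev = stage2 in DEVICE_ORDER
    if s1_dev and s2_dev:
        # Both Device: the stricter Device type wins.
        return DEVICE_ORDER[min(DEVICE_ORDER.index(stage1),
                                DEVICE_ORDER.index(stage2))]
    if s1_dev:
        return stage1  # Device always beats Normal
    if s2_dev:
        return stage2
    # Both Normal: the less cacheable type wins.
    return NORMAL_ORDER[min(NORMAL_ORDER.index(stage1),
                            NORMAL_ORDER.index(stage2))]


# With a permissive (Normal Write-Back) stage-2 entry, the guest's stage-1
# choice passes through unchanged for each of the ioremap*() variants above.
for guest in ("Device-nGnRE", "Device-nGnRnE", "Normal-NC", "Normal-WB"):
    print(guest, "->", combine(guest, "Normal-WB"))
```

Under this model a permissive stage-2 type never changes the guest's choice, whereas a Device stage-2 type would force every guest mapping of the region to be Device.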

Thanks for the confirmation.  If so, I think the Xen hypervisor should
follow the same approach and map the reserved regions with the attribute
p2m_mmio_direct_c.

If the Xen hypervisor can handle these MMIO types in stage 2, then we should
be able to map the stage-2 tables for the reserved memory in the same way.  A
difference for the reserved memory is that it can be mapped as normal
cacheable memory.

I am a bit confused. I read this as you think the region is not mapped in
the P2M (aka stage-2 page-tables for Arm). But from the logs you provided,
the regions are already mapped (you have an MFN in hand).

You are right.  The reserved memory regions have been mapped in P2M.

So to me the error is most likely in how we create the Device-Tree.

Yeah, let's first focus on the DT binding for reserved memory nodes.

The DT binding is something like this (I tweaked it a bit for readability):

Just to confirm this is the host device tree, right? If so...

Yes.

        memory@20000000 {
                #address-cells = <0x02>;
                #size-cells = <0x02>;
                device_type = "memory";
                reg = <0x00 0x20000000 0x00 0xa0000000>,
                        <0x01 0xa0000000 0x01 0x60000000>;
        };

... you can see the reserved-regions are described in the normal memory. In
fact...



        reserved-memory {
                #address-cells = <0x02>;
                #size-cells = <0x02>;
                ranges;

                reserved_mem1 {
                        reg = <0x00 0x20000000 0x00 0x00010000>;
                        no-map;
                };

                reserved_mem2 {
                        reg = <0x00 0x40000000 0x00 0x20000000>;
                        no-map;
                };

                reserved_mem3 {
                        reg = <0x01 0xa0000000 0x00 0x20000000>;
                        no-map;
                };

... no-map should tell the kernel to not use the memory at all. So I am a
bit puzzled why it is trying to use it.

No, 'no-map' doesn't mean the Linux kernel doesn't use it, I quote from
the kernel documentation

I am under the impression that we have a different meaning for 'using' here. I am referring to the fact that when 'no-map' is specified, then the kernel cannot use the region for other purposes (e.g. stack).

So the fact that the stack seems to reside in a reserved-region implies that Linux didn't detect the 'no-map'.
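For reference, the arithmetic behind this can be sketched using the 'reg' values from the host device tree quoted above (two address cells and two size cells). The helper names and the example stack address are hypothetical; the point is only to show which ranges the kernel should be left with once the 'no-map' regions are carved out, and how to test whether an address falls inside one of them.

```python
def parse_reg(cells, addr_cells=2, size_cells=2):
    """Turn a flat list of 32-bit 'reg' cells into (start, end) ranges."""
    stride = addr_cells + size_cells
    ranges = []
    for i in range(0, len(cells), stride):
        base = size = 0
        for c in cells[i:i + addr_cells]:
            base = (base << 32) | c
        for c in cells[i + addr_cells:i + stride]:
            size = (size << 32) | c
        ranges.append((base, base + size))
    return ranges


def subtract(banks, holes):
    """Remove every hole from every bank; return the sorted leftovers."""
    result = []
    for bank in banks:
        pieces = [bank]
        for hs, he in holes:
            remaining = []
            for ps, pe in pieces:
                if he <= ps or hs >= pe:      # no overlap with this piece
                    remaining.append((ps, pe))
                    continue
                if ps < hs:                   # part below the hole survives
                    remaining.append((ps, hs))
                if he < pe:                   # part above the hole survives
                    remaining.append((he, pe))
            pieces = remaining
        result.extend(pieces)
    return sorted(result)


def in_any(addr, ranges):
    """True if addr lies inside one of the half-open (start, end) ranges."""
    return any(start <= addr < end for start, end in ranges)


# Values copied from the host device tree quoted in this thread.
memory = parse_reg([0x00, 0x20000000, 0x00, 0xa0000000,
                    0x01, 0xa0000000, 0x01, 0x60000000])
reserved = (parse_reg([0x00, 0x20000000, 0x00, 0x00010000]) +
            parse_reg([0x00, 0x40000000, 0x00, 0x20000000]) +
            parse_reg([0x01, 0xa0000000, 0x00, 0x20000000]))
usable = subtract(memory, reserved)

for start, end in usable:
    print("usable: 0x%x-0x%x" % (start, end - 1))

# Hypothetical physical address of a stack buffer, for illustration only.
stack_pa = 0x40001000
print("stack in a no-map region:", in_any(stack_pa, reserved))
```

If the physical address of the buffer reported in the logs lands in one of the reserved ranges rather than in the usable ones, that would be consistent with the kernel not having seen the 'no-map' regions.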

Cheers,

--
Julien Grall



 

