
Re: [PATCH V2 2/3] xen/arm: Add handling of extended regions for Dom0

To: Julien Grall <julien@xxxxxxx>
From: Oleksandr <olekstysh@xxxxxxxxx>
Date: Fri, 17 Sep 2021 22:51:20 +0300
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Wei Chen <Wei.Chen@xxxxxxx>
Delivery-date: Fri, 17 Sep 2021 19:51:41 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


On 17.09.21 18:48, Julien Grall wrote:

Hi Oleksandr,


Hi Julien


On 10/09/2021 23:18, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>

The extended region (safe range) is a region of guest physical
address space which is unused and could be safely used to create
grant/foreign mappings instead of wasting real RAM pages from
the domain memory for establishing these mappings.

The extended regions are chosen at the domain creation time and
advertised to it via "reg" property under hypervisor node in
the guest device-tree. As region 0 is reserved for grant table
space (always present), the indexes for extended regions are 1...N.
If extended regions could not be allocated for some reason,
Xen doesn't fail and behaves as usual, so only inserts region 0.

Please note the following limitations:
- The extended region feature is only supported for 64-bit domain.
- The ACPI case is not covered.

I understand the ACPI is not covered because we would need to create a new binding. But I am not sure I understand why 32-bit domain is not supported. Can you explain it?

The 32-bit domain is not supported in order to simplify things at the beginning. It is a little bit difficult to get everything working at the start. As I understand from the discussion at [1], we can afford that simplification. However, I should have mentioned that the 32-bit domain is not supported "for now".

***

As Dom0 is direct mapped domain on Arm (e.g. MFN == GFN)
the algorithm to choose extended regions for it is different
in comparison with the algorithm for non-direct mapped DomU.
What is more, the extended regions should be chosen differently
depending on whether the IOMMU is enabled or not.

Provide RAM not assigned to Dom0 if IOMMU is disabled or memory
holes found in host device-tree if otherwise.
For the case when the IOMMU is disabled, this will only work if dom0 cannot allocate memory outside of the original range. This is currently the case... but I think this should be spelled out in at least the commit message.


Agree, will update commit description.

Make sure that
extended regions are 2MB-aligned and located within maximum possible
addressable physical memory range. The maximum number of extended
regions is 128.


Please explain how this limit was chosen.

Well, I decided not to introduce a new data struct etc. to represent extended regions, but to reuse the existing struct meminfo used for memory/reserved-memory, which, as I thought, fitted perfectly. So that limit comes from NR_MEM_BANKS, which is 128.

Suggested-by: Julien Grall <jgrall@xxxxxxxxxx>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
---
Changes since RFC:
    - update patch description
    - drop unneeded "extended-region" DT property
---
 xen/arch/arm/domain_build.c | 226 +++++++++++++++++++++++++++++++++++++++++++-
  1 file changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 206038d..070ec27 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -724,6 +724,196 @@ static int __init make_memory_node(const struct domain *d,
      return res;
  }
+static int __init add_ext_regions(unsigned long s, unsigned long e, void *data)
+{
+    struct meminfo *ext_regions = data;
+    paddr_t start, size;
+
+    if ( ext_regions->nr_banks >= ARRAY_SIZE(ext_regions->bank) )
+        return 0;
+
+    /* Both start and size of the extended region should be 2MB aligned */
+    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);
+    if ( start > e )
+        return 0;
+
+    size = (e - start + 1) & ~(SZ_2M - 1);
+    if ( !size )
+        return 0;
+
+    ext_regions->bank[ext_regions->nr_banks].start = start;
+    ext_regions->bank[ext_regions->nr_banks].size = size;
+    ext_regions->nr_banks ++;
+
+    return 0;
+}
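
The rounding in add_ext_regions() can be checked in isolation. The sketch below is a standalone illustration, not part of the patch: it rounds the start up and the size down to 2MB, as the function above does. The names align_region_2m and struct region are hypothetical, uint64_t stands in for paddr_t, and SZ_2M is redefined locally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (2ULL * 1024 * 1024)

/* Hypothetical result type: start address and size in bytes. */
struct region {
    uint64_t start;
    uint64_t size;
};

/*
 * Shrink the inclusive range [s, e] so that the start is 2MB aligned and
 * the size is a multiple of 2MB. Returns false when no aligned 2MB block
 * fits. Assumes s is far enough below UINT64_MAX that rounding up does
 * not overflow.
 */
static bool align_region_2m(uint64_t s, uint64_t e, struct region *out)
{
    uint64_t start, size;

    assert(s <= e);

    start = (s + SZ_2M - 1) & ~(SZ_2M - 1);   /* round the start up */
    if ( start > e )
        return false;

    size = (e - start + 1) & ~(SZ_2M - 1);    /* round the size down */
    if ( size == 0 )
        return false;

    out->start = start;
    out->size = size;

    return true;
}
```

For example, the inclusive range 0x40100000-0x407fffff becomes start 0x40200000 with size 0x600000, while a range such as 0x40000001-0x403ffffe contains no full aligned 2MB block and is dropped.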
+
+/*
+ * The extended regions will be prevalidated by the memory hotplug path
+ * in Linux which requires for any added address range to be within maximum
+ * possible addressable physical memory range for which the linear mapping
+ * could be created.
+ * For 48-bit VA space size the maximum addressable range are:
When I read "maximum", I understand an upper limit. But below, you are providing a range. So should you drop "maximum"?


yes, it is a little bit confusing.



Also, this is tailored to Linux using 48-bit VA. How about other limits?

These limits are calculated at [2]. Sorry, I haven't investigated yet what the values would be for other CONFIG_ARM64_VA_BITS_XXX. Also, it looks like some configs depend on 16K/64K pages...

I will try to investigate and provide limits later on.

+ * 0x40000000 - 0x80003fffffff
+ */
+#define EXT_REGION_START   0x40000000ULL
I am probably missing something here.... There are platforms out there with memory starting at 0 (IIRC ZynqMP is one example). So wouldn't this potentially rule out the extended region on such platforms?

From my understanding the extended region cannot be in the 0...0x40000000 range. If these platforms have memory above the first GB, I believe the extended region(s) can be allocated for them.
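
To illustrate that point: a bank starting at 0 is not excluded as a whole, only the part below EXT_REGION_START is. The sketch below is a standalone illustration under the assumption that candidate ranges are clipped to the [EXT_REGION_START, end] window before they reach add_ext_regions(). ext_window_end and clamp_to_window are hypothetical helper names, and ipa_bits stands in for p2m_ipa_bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXT_REGION_START 0x40000000ULL
#define EXT_REGION_END   0x80003fffffffULL

/* Upper bound of the window: the smaller of the IPA limit and EXT_REGION_END. */
static uint64_t ext_window_end(unsigned int ipa_bits)
{
    uint64_t ipa_max;

    assert(ipa_bits > 0 && ipa_bits < 64);

    ipa_max = (1ULL << ipa_bits) - 1;

    return ipa_max < EXT_REGION_END ? ipa_max : EXT_REGION_END;
}

/*
 * Clip the inclusive range [*s, *e] to the window. Returns false when the
 * range lies entirely outside of it.
 */
static bool clamp_to_window(uint64_t *s, uint64_t *e, unsigned int ipa_bits)
{
    uint64_t win_end = ext_window_end(ipa_bits);

    if ( *e < EXT_REGION_START || *s > win_end )
        return false;

    if ( *s < EXT_REGION_START )
        *s = EXT_REGION_START;
    if ( *e > win_end )
        *e = win_end;

    return true;
}
```

With 40 IPA bits, a free range 0x0-0x7fffffff is clipped to 0x40000000-0x7fffffff, and a range that lies entirely within the first GB yields nothing.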

+#define EXT_REGION_END 0x80003fffffffULL
+
+static int __init find_unallocated_memory(const struct kernel_info *kinfo,
+                                          struct meminfo *ext_regions)
+{
+    const struct meminfo *assign_mem = &kinfo->mem;
+    struct rangeset *unalloc_mem;
+    paddr_t start, end;
+    unsigned int i;
+    int res;
We technically already know which range of memory is unused. This is pretty much any region in the freelist of the page allocator. So how about walking the freelist instead?

OK, I will investigate the page allocator code (right now I have no understanding of how to do that). BTW, I have just grepped "freelist" through the code and all page context related appearances are in x86 code only.

The advantage is we don't need to worry about modifying the function when adding a new memory type.

One disadvantage is this will not cover *all* the unused memory as this is doing. But I think this is an acceptable downside.

+
+    dt_dprintk("Find unallocated memory for extended regions\n");
+
+    unalloc_mem = rangeset_new(NULL, NULL, 0);
+    if ( !unalloc_mem )
+        return -ENOMEM;
+
+    /* Start with all available RAM */
+    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
+    {
+        start = bootinfo.mem.bank[i].start;
+        end = bootinfo.mem.bank[i].start + bootinfo.mem.bank[i].size - 1;
+        res = rangeset_add_range(unalloc_mem, start, end);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove RAM assigned to Dom0 */
+    for ( i = 0; i < assign_mem->nr_banks; i++ )
+    {
+        start = assign_mem->bank[i].start;
+        end = assign_mem->bank[i].start + assign_mem->bank[i].size - 1;
+        res = rangeset_remove_range(unalloc_mem, start, end);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove reserved-memory regions */
+    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
+    {
+        start = bootinfo.reserved_mem.bank[i].start;
+        end = bootinfo.reserved_mem.bank[i].start +
+            bootinfo.reserved_mem.bank[i].size - 1;
+        res = rangeset_remove_range(unalloc_mem, start, end);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove grant table region */
+    start = kinfo->gnttab_start;
+    end = kinfo->gnttab_start + kinfo->gnttab_size - 1;
+    res = rangeset_remove_range(unalloc_mem, start, end);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        goto out;
+    }
+
+    start = EXT_REGION_START;
+    end = min((1ULL << p2m_ipa_bits) - 1, EXT_REGION_END);
+    res = rangeset_report_ranges(unalloc_mem, start, end,
+                                 add_ext_regions, ext_regions);
+    if ( res )
+        ext_regions->nr_banks = 0;
+    else if ( !ext_regions->nr_banks )
+        res = -ENOENT;
+
+out:
+    rangeset_destroy(unalloc_mem);
+
+    return res;
+}
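
The function above follows a subtract-from-everything pattern: start from all host RAM, remove each range that is already in use, and report what is left. The sketch below shows the same idea on a small fixed-size list of inclusive ranges. It is a standalone illustration and not Xen's rangeset implementation (for one thing, it does not merge adjacent ranges); struct range_list, range_list_add and range_list_remove are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 16

struct range {
    uint64_t s, e;                  /* inclusive bounds */
};

struct range_list {
    struct range r[MAX_RANGES];
    size_t nr;
};

/* Append [s, e] to the list. Returns -1 when the list is full. */
static int range_list_add(struct range_list *l, uint64_t s, uint64_t e)
{
    assert(s <= e);

    if ( l->nr >= MAX_RANGES )
        return -1;

    l->r[l->nr].s = s;
    l->r[l->nr].e = e;
    l->nr++;

    return 0;
}

/* Remove [s, e] from every range in the list, splitting a range if needed. */
static int range_list_remove(struct range_list *l, uint64_t s, uint64_t e)
{
    struct range_list out = { .nr = 0 };
    size_t i;

    assert(s <= e);

    for ( i = 0; i < l->nr; i++ )
    {
        struct range cur = l->r[i];

        if ( cur.e < s || cur.s > e )
        {
            /* No overlap: keep the range unchanged. */
            if ( range_list_add(&out, cur.s, cur.e) )
                return -1;
            continue;
        }

        /* Keep whatever sticks out on the left and on the right. */
        if ( cur.s < s && range_list_add(&out, cur.s, s - 1) )
            return -1;
        if ( cur.e > e && range_list_add(&out, e + 1, cur.e) )
            return -1;
    }

    *l = out;

    return 0;
}
```

Adding 0x40000000-0xbfffffff as host RAM and then removing 0x60000000-0x7fffffff as assigned memory leaves two candidate ranges, 0x40000000-0x5fffffff and 0x80000000-0xbfffffff.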
+
+static int __init find_memory_holes(const struct kernel_info *kinfo,
+                                    struct meminfo *ext_regions)
+{
+    struct dt_device_node *np;
+    struct rangeset *mem_holes;
+    paddr_t start, end;
+    unsigned int i;
+    int res;
+
+    dt_dprintk("Find memory holes for extended regions\n");
+
+    mem_holes = rangeset_new(NULL, NULL, 0);
+    if ( !mem_holes )
+        return -ENOMEM;
+
+    /* Start with maximum possible addressable physical memory range */
+    start = EXT_REGION_START;
+    end = min((1ULL << p2m_ipa_bits) - 1, EXT_REGION_END);
+    res = rangeset_add_range(mem_holes, start, end);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to add: %#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        goto out;
+    }
+
+    /* Remove all regions described by "reg" property (MMIO, RAM, etc) */

Well... The loop below is not going to handle all the regions described in the property "reg". Instead, it will cover a subset of "reg" where the memory is addressable.

As I understand, we are only interested in the subset of "reg" where the memory is addressable.

You will also need to cover "ranges" that will describe the BARs for the PCI devices.

Good point. Could you please clarify how to recognize whether it is a PCI device as long as PCI support is not merged? Or should I just find any device nodes with a non-empty "ranges" property and retrieve the addresses?

+    dt_for_each_device_node( dt_host, np )
+    {
+        unsigned int naddr;
+        u64 addr, size;
+
+        naddr = dt_number_of_address(np);
+
+        for ( i = 0; i < naddr; i++ )
+        {
+            res = dt_device_get_address(np, i, &addr, &size);
+            if ( res )
+            {
+                printk(XENLOG_ERR "Unable to retrieve address %u for %s\n",
+                       i, dt_node_full_name(np));
+                goto out;
+            }
+
+            start = addr & PAGE_MASK;
+            end = PAGE_ALIGN(addr + size) - 1;
+            res = rangeset_remove_range(mem_holes, start, end);
+            if ( res )
+            {
+                printk(XENLOG_ERR "Failed to remove: %#"PRIx64"->%#"PRIx64"\n",
+                       start, end);
+                goto out;
+            }
+        }
+    }
+
+    start = EXT_REGION_START;
+    end = min((1ULL << p2m_ipa_bits) - 1, EXT_REGION_END);
+    res = rangeset_report_ranges(mem_holes, start, end,
+                                 add_ext_regions, ext_regions);
+    if ( res )
+        ext_regions->nr_banks = 0;
+    else if ( !ext_regions->nr_banks )
+        res = -ENOENT;
+
+out:
+    rangeset_destroy(mem_holes);
+
+    return res;
+}
+
  static int __init make_hypervisor_node(struct domain *d,
                                         const struct kernel_info *kinfo,
                                         int addrcells, int sizecells)
@@ -731,11 +921,13 @@ static int __init make_hypervisor_node(struct domain *d,
      const char compat[] =
          "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
          "xen,xen";
-    __be32 reg[4];
+    __be32 reg[(NR_MEM_BANKS + 1) * 4];

This is a fairly large allocation on the stack. Could we move to a dynamic allocation?


Of course, will do.
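
For a sense of scale, the size of the buffer can be worked out directly. The sketch below is only an illustration: be32_t is a stand-in for __be32, reg_buffer_bytes is a hypothetical helper, and the 4 cells per entry come from the array declaration above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_MEM_BANKS 128    /* value quoted earlier in this thread */

typedef uint32_t be32_t;    /* stand-in for Xen's __be32 */

/* Bytes needed for nr_regions "reg" entries of cells_per_entry cells each. */
static size_t reg_buffer_bytes(size_t nr_regions, size_t cells_per_entry)
{
    assert(cells_per_entry > 0);

    return nr_regions * cells_per_entry * sizeof(be32_t);
}
```

With NR_MEM_BANKS at 128, reg_buffer_bytes(NR_MEM_BANKS + 1, 4) gives 129 * 4 * 4 = 2064 bytes, compared with 16 bytes for the original reg[4].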

      gic_interrupt_t intr;
      __be32 *cells;
      int res;
      void *fdt = kinfo->fdt;
+    struct meminfo *ext_regions;
+    unsigned int i;

      dt_dprintk("Create hypervisor node\n");

@@ -757,12 +949,42 @@ static int __init make_hypervisor_node(struct domain *d,
      if ( res )
          return res;

+    ext_regions = xzalloc(struct meminfo);
+    if ( !ext_regions )
+        return -ENOMEM;
+
+    if ( is_32bit_domain(d) )
+        printk(XENLOG_WARNING "The extended region is only supported for 64-bit guest\n");
+    else
+    {
+        if ( !is_iommu_enabled(d) )
+            res = find_unallocated_memory(kinfo, ext_regions);
+        else
+            res = find_memory_holes(kinfo, ext_regions);
+
+        if ( res )
+            printk(XENLOG_WARNING "Failed to allocate extended regions\n");
+    }
+
      /* reg 0 is grant table space */
      cells = &reg[0];
      dt_child_set_range(&cells, addrcells, sizecells,
                         kinfo->gnttab_start, kinfo->gnttab_size);
+    /* reg 1...N are extended regions */
+    for ( i = 0; i < ext_regions->nr_banks; i++ )
+    {
+        u64 start = ext_regions->bank[i].start;
+        u64 size = ext_regions->bank[i].size;
+
+        dt_dprintk("Extended region %d: %#"PRIx64"->%#"PRIx64"\n",
+                   i, start, start + size);
+
+        dt_child_set_range(&cells, addrcells, sizecells, start, size);
+    }
+    xfree(ext_regions);
+
      res = fdt_property(fdt, "reg", reg,
-                       dt_cells_to_size(addrcells + sizecells));
+                       dt_cells_to_size(addrcells + sizecells) * (i + 1));
      if ( res )
          return res;


Cheers,

[1]https://lore.kernel.org/xen-devel/cb1c8fd4-a4c5-c18e-c8db-f8e317d95526@xxxxxxx/

[2]https://elixir.bootlin.com/linux/v5.15-rc1/source/arch/arm64/mm/mmu.c#L1448



Thank you.


--
Regards,

Oleksandr Tyshchenko
