[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Minios-devel] [PATCH 29/40] arm64: init the memory system before console/xenbus/gic



Hi Shijie,

On 03/11/17 03:12, Huang Shijie wrote:
The mappings for console/xenbus/gic need several pages to setup
the page tables.

So we init the memory system before initiating the console/xenbus/gic,
and then we can allocate the pages which are needed by
the console/xenbus/gic mapping.

we use the 2M memory block for the memory.

Why 2MB and not 1GB mapping when possible? Also, what if the guest is using less than 2MB of memory?


Change-Id: I0d075478c107cf680d1a20989d57c0fcd727ab29
Jira: ENTOS-247
Signed-off-by: Huang Shijie <shijie.huang@xxxxxxx>
---
  arch/arm/mm.c         | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++
  arch/arm/setup.c      |  5 +++
  include/arm/arch_mm.h |  1 +
  mm.c                  |  5 ++-
  4 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm.c b/arch/arm/mm.c
index edb734f..fb7886c 100644
--- a/arch/arm/mm.c
+++ b/arch/arm/mm.c
@@ -24,6 +24,102 @@ unsigned long allocate_ondemand(unsigned long n, unsigned 
long alignment)
      BUG();
  }
+#if defined(__aarch64__)
+
+#include <arm64/pagetable.h>
+
+extern lpae_t boot_l1_pgtable[512];
+
+static inline void set_pgt_entry(lpae_t *ptr, lpae_t val)
+{
+    *ptr = val;
+    dsb(ishst);
+    isb();
+}
+
+static void build_pud(lpae_t *pgd, unsigned long vaddr, unsigned long vend,

Is vend included or excluded from the range mapped?

+                      paddr_t phys, long mem_type,

Why the mem_type is long?

+                      paddr_t (*new_page)(void), int level)
+{
+    lpae_t *pud;
+
+    pud = (lpae_t *)to_virt((*pgd) & ~ATTR_MASK_L) + l2_pgt_idx(vaddr);
+    do {
+        if (level == 2)
+             set_pgt_entry(pud, (phys & L2_MASK) | mem_type | L2_BLOCK);
+        vaddr += L2_SIZE;
+        phys += L2_SIZE;
+        pud++;

I am not sure to understand why you handle the third level of page-table in a separate patch (see #30). This is making more difficult to find whether something is missing.

However, I am not fully convinced the way you implement the page-tables support is the right way to go. Indeed you limit yourself in using the same level for all the range of the mapping. This is preventing a range of optimization such as using 1GB mapping whenever is possible and switch to 2MB when it is not.

Furthermore, the LPAE format has been nicely design. There are no need to implement each level in a separate helper. A loop would be sufficient. I can live with that implementation thought.

Lastly, this code only works when adding entry. We will need to remove entries. See unmap_frames used by munmap.

+    } while (vaddr < vend);
+}
+
+/* Setup the page table when the MMU is enabled */

I am not sure to understand this comment.

+void build_pagetable(unsigned long vaddr, unsigned long start_pfn,

Why is that exposed outside mm.c?

+                     unsigned long max_pfn, long mem_type,
+                     paddr_t (*new_page)(void), int level)
+{
+    paddr_t p_start;
+    unsigned long v_end, next;
+    lpae_t *pgd;
+
+    v_end = vaddr + max_pfn * PAGE_SIZE;
+    p_start = PFN_PHYS(start_pfn);
+
+    /* The boot_l1_pgtable can span 512G address space */
+    pgd = &boot_l1_pgtable[l1_pgt_idx(vaddr)];
+
+    do {
+        next = (vaddr + L1_SIZE);
+        if (next > v_end)
+            next = v_end;
+
+        if ((*pgd) == L1_INVAL) {
+            set_pgt_entry(pgd, (new_page()) | PT_PT);

What if new_page() fails?

+        }
+
+        build_pud(pgd, vaddr, next, p_start, mem_type, new_page, level);
+        p_start += next - vaddr;
+        vaddr = next;
+        pgd++;
+    } while (vaddr != v_end);
+}

The code you implemented for page-table looks far too simple. For instance, I am surprised there are no TLB maintenance anywhere.

+
+static unsigned long first_free_pfn;

Likely this variable and get_new_page needs more explanation in the code.

+static paddr_t get_new_page(void)

I think it would be better if you had early (or something similar) in the name. To make clear that is only used for early boot.

+{
+    paddr_t new_page;
+
+    memset(pfn_to_virt(first_free_pfn), 0, PAGE_SIZE);

This is quite confusing, get_new_page() allocates page-table that
will be used to map the RAM. So how do you bootstrap it?

+    dsb(ishst);
+
+    new_page = PFN_PHYS(first_free_pfn);
+    first_free_pfn++;

What if you exhaust in memory? AFAICT, the user will see a failure like a data abort. This will not be very obvious to track.

+    return new_page;
+}
+
+/*
+ * This function will setup the page table for the memory system.
+ *
+ * Note: We get the page for page table from the first free PFN,

s/get/allocate/

+ *       the get_new_page() will increase the @first_free_pfn.

But this comment would have been better on top of get_new_page/first_free_pfn.

+ */
+void init_pagetable(unsigned long *start_pfn, unsigned long base_pfn,
+                    unsigned long max_pfn)
+{
+    first_free_pfn = *start_pfn;
+
+    build_pagetable((unsigned long)pfn_to_virt(base_pfn), base_pfn,
+                   max_pfn, BLOCK_DEF_ATTR, get_new_page, 2);
+    *start_pfn = first_free_pfn;
+}
+
+#else
+void init_pagetable(unsigned long *start_pfn, unsigned long base_pfn,
+                    unsigned long max_pfn)
+{
+}
+#endif
+
  void arch_init_mm(unsigned long *start_pfn_p, unsigned long *max_pfn_p)
  {
      int memory;
@@ -74,6 +170,8 @@ void arch_init_mm(unsigned long *start_pfn_p, unsigned long 
*max_pfn_p)
      heap_len = mem_size - (PFN_PHYS(*start_pfn_p) - mem_base);
      *max_pfn_p = *start_pfn_p + PFN_DOWN(heap_len);
+ init_pagetable(start_pfn_p, PHYS_PFN(mem_base), PHYS_PFN(mem_size));
+
      printk("Using pages %lu to %lu as free space for heap.\n", *start_pfn_p, 
*max_pfn_p);
/* The device tree is probably in memory that we're about to hand over to the page
diff --git a/arch/arm/setup.c b/arch/arm/setup.c
index bde30c6..61daca3 100644
--- a/arch/arm/setup.c
+++ b/arch/arm/setup.c
@@ -41,6 +41,11 @@ void arch_init(void *dtb_pointer, paddr_t physical_offset)
      /* Map shared_info page */
      HYPERVISOR_shared_info = map_shared_info(NULL);
+#if defined(__aarch64__)

Carve out from arch_init_mm what is necessary but don't call init_mm() early just for aarch64. See what x86 did with arch_mm_preinit().

+    /* We need to init the memory, and then init the console/xenbus/gic */
+    init_mm();
+#endif
+
      get_console(NULL);
      get_xenbus(NULL);
diff --git a/include/arm/arch_mm.h b/include/arm/arch_mm.h
index 6dfa61e..235e914 100644
--- a/include/arm/arch_mm.h
+++ b/include/arm/arch_mm.h
@@ -1,6 +1,7 @@
  #ifndef _ARCH_MM_H_
  #define _ARCH_MM_H_
+typedef uint64_t lpae_t;
  typedef uint64_t paddr_t;
extern char _text, _etext, _erodata, _edata, _end, __bss_start;
diff --git a/mm.c b/mm.c
index 74565d1..b090a0a 100644
--- a/mm.c
+++ b/mm.c
@@ -384,12 +384,14 @@ void *sbrk(ptrdiff_t increment)
  #endif
-
+static int mm_inited;
  void init_mm(void)
  {
unsigned long start_pfn, max_pfn; + if (mm_inited)

See my comment above.

+        return;
      printk("MM: Init\n");
get_max_pages();
@@ -407,6 +409,7 @@ void init_mm(void)
  #ifdef CONFIG_BALLOON
      nr_mem_pages = max_pfn;
  #endif
+    mm_inited = 1;
  }
void fini_mm(void)


Cheers,

--
Julien Grall

_______________________________________________
Minios-devel mailing list
Minios-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/cgi-bin/mailman/listinfo/minios-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.