[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/8] x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END



Hi David,

On 01/02/2020 00:32, David Woodhouse wrote:
From: David Woodhouse <dwmw@xxxxxxxxxxxx>

Bad pages are identified by get_platform_badpages() and with badpage=
on the command line.

The boot allocator currently automatically elides these from the regions
passed to it with init_boot_pages(). The xenheap is then initialised
with the pages which are still marked as free by the boot allocator when
end_boot_allocator() is called.

However, any memory above HYPERVISOR_VIRT_END is passed directly to
init_domheap_pages() later in __start_xen(), and the bad page list is
not consulted.

Fix this by marking those pages as PGC_broken in the frametable at the
time end_boot_allocator() runs, and then making init_heap_pages() skip
over any pages which are so marked.

Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
---
  xen/common/page_alloc.c | 82 +++++++++++++++++++++++++++++++++++++++--
  1 file changed, 79 insertions(+), 3 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 919a270587..3cf478311b 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1758,6 +1758,18 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
      return 0;
  }
+static unsigned long contig_avail_pages(struct page_info *pg, unsigned long max_pages)
+{
+    unsigned long i;
+
+    for ( i = 0 ; i < max_pages; i++)
+    {
+        if ( pg[i].count_info & PGC_broken )
+            break;
+    }
+    return i;
+}
+
  /*
   * Hand the specified arbitrary page range to the specified heap zone
   * checking the node_id of the previous page.  If they differ and the
@@ -1799,18 +1811,23 @@ static void init_heap_pages(
      {
          unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
+ /* If the (first) page is already marked broken, don't add it. */
+        if ( pg[i].count_info & PGC_broken )
+            continue;
+
          if ( unlikely(!avail[nid]) )
          {
+            unsigned long contig_nr_pages = contig_avail_pages(pg + i, 
nr_pages);
              unsigned long s = mfn_x(page_to_mfn(pg + i));
-            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 
1));
+            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + i + 
contig_nr_pages - 1), 1));
              bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
                              !(s & ((1UL << MAX_ORDER) - 1)) &&
                              (find_first_set_bit(e) <= find_first_set_bit(s));
              unsigned long n;
- n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
+            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), 
contig_nr_pages,
                                 &use_tail);
-            BUG_ON(i + n > nr_pages);
+            BUG_ON(n > contig_nr_pages);
              if ( n && !use_tail )
              {
                  i += n - 1;
@@ -1846,6 +1863,63 @@ static unsigned long avail_heap_pages(
      return free_pages;
  }
+static void mark_bad_pages(void)
+{
+    unsigned long bad_spfn, bad_epfn;

I would like to avoid adding more 'pfn' in the code. Per the current terminology (see include/xen/mm.h), this would be bad_smfn, bad_emfn.

Ideally, this should also use mfn_t, but I can live without it for now.

+    const char *p;
+    struct page_info *pg;
+#ifdef CONFIG_X86
+    const struct platform_bad_page *badpage;
+    unsigned int i, j, array_size;
+
+    badpage = get_platform_badpages(&array_size);
+    if ( badpage )
+    {
+        for ( i = 0; i < array_size; i++ )
+        {
+            for ( j = 0; j < 1UL << badpage->order; j++ )
+            {
+                if ( mfn_valid(_mfn(badpage->mfn + j)) )
+                {
+                    pg = mfn_to_page(_mfn(badpage->mfn + j));
+                    pg->count_info |= PGC_broken;
+                    page_list_add_tail(pg, &page_broken_list);
+                }
+            }
+        }
+    }

You are missing the PV shim part here.

But, AFAICT, the only difference with the implementation in init_boot_pages() is how we treat the page.

Could you we introduce a common helper taking a callback? The callback would be applied on every range range.

+#endif
+
+    /* Check new pages against the bad-page list. */
+    p = opt_badpage;
+    while ( *p != '\0' )
+    {
+        bad_spfn = simple_strtoul(p, &p, 0);
+        bad_epfn = bad_spfn;
+
+        if ( *p == '-' )
+        {
+            p++;
+            bad_epfn = simple_strtoul(p, &p, 0);
+            if ( bad_epfn < bad_spfn )
+                bad_epfn = bad_spfn;
+        }
+
+        if ( *p == ',' )
+            p++;
+        else if ( *p != '\0' )
+            break;
+
+        while ( mfn_valid(_mfn(bad_spfn)) && bad_spfn < bad_epfn )
+        {
+            pg = mfn_to_page(_mfn(bad_spfn));
+            pg->count_info |= PGC_broken;
+            page_list_add_tail(pg, &page_broken_list);
+            bad_spfn++;
+        }
+    }
+}
+
  void __init end_boot_allocator(void)
  {
      unsigned int i;
@@ -1870,6 +1944,8 @@ void __init end_boot_allocator(void)
      }
      nr_bootmem_regions = 0;
+ mark_bad_pages();
+
      if ( !dma_bitsize && (num_online_nodes() > 1) )
          dma_bitsize = arch_get_dma_bitsize();

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.