[PATCH 1/2] tools/tests/native: Test for Xen Panic after memory offlining
Add a test case for a heap corruption bug caused by reserving
offlined pages from partially healthy buddies without an alignment check.

For example, due to bad RAM, an MCE could cause pages to be marked as
offline. When a buddy containing offlined pages is freed, those pages
are moved to dedicated isolated page lists.

reserve_offlined_page() lacks alignment checks and may grow adjacent
healthy spans into unaligned buddies that violate the fundamental buddy
invariant: buddies of a given order must be aligned to their size.
Consider a valid order-2 buddy (4 pages) with this layout:
+---------------+-----------------+-----------------+----------------+
| head page     | tail page 1     | tail page 2     | tail page 3    |
+---------------+-----------------+-----------------+----------------+
When the head page is offlined, reserve_offlined_page() merges the
remaining tail pages into an unaligned buddy:
+---------------+-----------------+-----------------+----------------+
| offlined page | head page with a tail page        | single page    |
+---------------+-----------------+-----------------+----------------+
This leads to a Xen panic, demonstrated by the test case:
1. When a single page is allocated from this buddy, MFN 7 is allocated:
  MFN 4           MFN 5             MFN 6             MFN 7
+---------------+-----------------+-----------------+----------------+
| offlined page | head page         tail page       | allocated page |
|               | Unaligned buddies are             |                |
|               | an invariant violation!           |                |
+---------------+-----------------+-----------------+----------------+
2. When MFN 7 is freed, the predecessor merge in free_heap_pages()
kicks in, merging MFN 7 with its naturally aligned predecessor MFN 6:
  MFN 4           MFN 5             MFN 6             MFN 7
+---------------+-----------------+-----------------+
| offlined page | head page         tail page       |
|               | Unaligned buddies are             |
|               | an invariant violation!           |
+---------------+-----------------+-----------------+----------------+
                                  | head page         tail page      |
                                  +-----------------+----------------+
As shown, MFN 6 is double-freed. It is in two buddies:
- As the tail page of the unaligned order-1 buddy starting at MFN 5.
- As the head page of the aligned order-1 buddy starting at MFN 6.
3. The next allocations would allocate MFN 7 again, and MFN 6 as well.
   Due to the double-free, after the first allocation, MFN 6 remains on
   the free list even though its PGC_state is set to in-use.
  MFN 4           MFN 5             MFN 6             MFN 7
+---------------+-----------------+-----------------+
| offlined page | head page         tail page       |
|               | Unaligned buddies are             |
|               | an invariant violation!           |
+---------------+-----------------+-----------------+----------------+
                                  | in-use page     | in-use page    |
                                  +-----------------+----------------+
4. When the next page from this buddy is allocated, get_free_page()
   returns the buddy head MFN 5. If the allocation is for order-0,
   alloc_heap_pages() splits page 6; otherwise, it keeps the buddy.
   Either way, the allocator checks the pages' PGC_state values and
   expects them not to be in-use. Because MFN 6 is already in-use,
   Xen panics (example panic log):
    pg[0] MFN 842adc c=0x4000000000000000 o=0 v=0 t=0
    Xen BUG at common/page_alloc.c:1324

I reproduced this while running intensive NUMA claim tests combined
with page offlining. The test case in this series demonstrates the
cascading corruption that leads to the panic, without having to
intentionally crash a Xen instance to test for the bug.

With the test case, the final lines of the output are:
$ make -C tools/tests/native test TARGETS=offline-unaligned|grep -v ' xen/'
| The buddy #5 is not aligned to order-1!
| <0>pg[0] MFN 00006 c=0x8000000000000001 o=1213 v=0 t=0
| xen/common/page_alloc.c:1324: WE INVOKED a XEN BUG in alloc_heap_pages()
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@xxxxxxxxxx>
---
tools/tests/native/offline-unaligned.c | 171 +++++++++++++++++++++++++
1 file changed, 171 insertions(+)
create mode 100644 tools/tests/native/offline-unaligned.c
diff --git a/tools/tests/native/offline-unaligned.c b/tools/tests/native/offline-unaligned.c
new file mode 100644
index 000000000000..1186b1763bef
--- /dev/null
+++ b/tools/tests/native/offline-unaligned.c
@@ -0,0 +1,171 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Test that offlining a predecessor of pages that must not be merged
+ * into an unaligned buddy is handled correctly. Specifically, verify
+ * that the Xen page allocator does not merge such unaligned buddies
+ * back onto the free list, which can produce a chain of events that
+ * leads to a Xen panic after a few allocations and frees.
+ *
+ * This test reproduces the scenario in isolation by offlining a page
+ * with an even MFN that has more than two following tail pages.
+ *
+ * If an unaligned buddy is returned to the free list, a sequence of
+ * allocations and a subsequent free can corrupt the free list state
+ * so that a later allocation triggers BUG() and crashes the instance.
+ * The test checks the free list behavior and, if the bug is present,
+ * confirms the resulting BUG().
+ *
+ * Copyright (C) 2026 Cloud Software Group
+ */
+#include "harness/common.h"
+
+/* test_bss_start must be first in the BSS segment */
+void __aligned(PAGE_SIZE) *test_bss_start;
+
+/* Include xen/mm.h so we can wrap page_list_del() to assert the corruption. */
+#define TEST_WRAP_XEN_INCLUDE_XEN_MM_H
+#include "harness/mm-wrapper.h"
+
+static bool expect_free_list_corruption;
+
+/*
+ * Wrap page_list_del() to not fail the test by virtue of the prepared
+ * free list state but continue the test like a running Xen instance
+ * would in many cases. Assert and expect the corruption, and continue.
+ */
+static inline void wrap_page_list_del(struct page_info *page,
+                                      struct page_list_head *head)
+{
+    printf("page_list_del: page MFN %lu, order %u\n",
+           mfn_x(page_to_mfn(page)), PFN_ORDER(page));
+
+    if ( expect_free_list_corruption )
+        EXPECT_FAIL_BEGIN();
+    CHECK(page->list.next && page->list.prev, "The free list is corrupt now!");
+    if ( expect_free_list_corruption )
+        EXPECT_FAIL_END(1);
+
+    if ( page->list.next && page->list.prev )
+        page_list_del(page, head);
+}
+#define page_list_del(page, head) wrap_page_list_del(page, head)
+
+/*
+ * Include the main test library that sets up scenarios, asserts
+ * allocator state, and provides the definitions and shims needed
+ * to call the Xen page allocator code in this test program.
+ */
+#define TEST_ENABLE_XC_DOMAIN_C
+#include "harness/native.h"
+
+/* Verify the behavior of buddy merging after offlining a page */
+static void test_unaligned_buddy_merge(int start_mfn)
+{
+    struct page_info *pg = frame_table + start_mfn;
+    uint32_t status = 0;
+
+    /*
+     * Prepare a valid order-2 buddy (4 pages) with this layout:
+     * +---------------+-----------------+-----------------+----------------+
+     * | head page     | tail page 1     | tail page 2     | tail page 3    |
+     * +---------------+-----------------+-----------------+----------------+
+     */
+    test_page_list_add_buddy(pg, order2); /* Seed the heap with this buddy */
+
+    /* Act */
+    offline_page(page_to_mfn(pg), 0, &status);
+    CHECK(status & PG_OFFLINE_OFFLINED, "Page should be offlined");
+
+    /*
+     * The correct free list state after offlining the head page of the buddy
+     * is that the healthy pages are merged back onto the free list as a
+     * single page and a size-aligned buddy of the remaining pages:
+     * +---------------+-----------------+-----------------+----------------+
+     * | offlined page | single page     | head page with a tail page       |
+     * +---------------+-----------------+-----------------+----------------+
+     */
+    EXPECT_FAIL_BEGIN();
+    /*
+     * Due to a bug in reserve_offlined_page(), we get an unaligned buddy:
+     * +---------------+-----------------+-----------------+----------------+
+     * | offlined page | head page with a tail page        | single page    |
+     * +---------------+-----------------+-----------------+----------------+
+     */
+    CHECK(page_aligned(pg + 1), "The buddy #%lu is not aligned to order-%d",
+          mfn_x(page_to_mfn(pg + 1)), PFN_ORDER(pg + 1));
+    EXPECT_FAIL_END(1);
+
+    /* Allocate and free a page to trigger buddy merging on free. */
+
+    /*
+     * After allocating and freeing MFN 7, we get a double-freed MFN 6 due
+     * to aligned predecessor merging in free_heap_pages():
+     *
+     *   MFN 4           MFN 5             MFN 6             MFN 7
+     * +---------------+-----------------+-----------------+
+     * | offlined page | head page         tail page       |
+     * |               | Unaligned buddies are             |
+     * |               | an invariant violation!           |
+     * +---------------+-----------------+-----------------+----------------+
+     *                                   | head page         tail page      |
+     *                                   +-----------------+----------------+
+     */
+    expect_free_list_corruption = true;
+    free_domheap_pages(alloc_domheap_pages(dom1, order0, 0), order0);
+
+    /*
+     * At this point, the free list is already corrupt. In free_heap_pages(),
+     * the tail of the unaligned buddy was added to the free list a 2nd time
+     * as the page of an overlapping aligned buddy. This is per design of the
+     * algorithm: These pages are free and thus the merging occurs as expected.
+     *
+     * The next allocation allocates the tail of the unaligned buddy, which
+     * is now, due to the merge, also the head of the new aligned buddy.
+     */
+    CHECK((pg = alloc_domheap_pages(dom1, order1, 0)), "Alloc the order-1 pg");
+
+    /* Inspect the predecessor (pg is the tail of the unaligned buddy) */
+    EXPECT_FAIL_BEGIN();
+    /*
+     * After allocating two more pages, MFN 6 is free AND in-use:
+     *
+     *   MFN 4           MFN 5             MFN 6             MFN 7
+     * +---------------+-----------------+-----------------+
+     * | offlined page | head page         tail page       |
+     * +---------------+-----------------+-----------------+----------------+
+     *                                   | in-use page       in-use page    |
+     *                                   +-----------------+----------------+
+     */
+    CHECK(page_aligned(pg - 1), "The buddy #%lu is not aligned to order-%d!",
+          mfn_x(page_to_mfn(pg - 1)), PFN_ORDER(pg - 1));
+    EXPECT_FAIL_END(1);
+
+    /* Allocate the remaining page; a clean heap should not hit BUG(). */
+    testcase_assert_expect_to_hit_bug = true;
+    /*
+     * As described above, if pg is the tail of an unaligned order-1 buddy,
+     * the unaligned buddy is still on the free list and this allocation
+     * removes it from there. alloc_heap_pages() then expects the pages of
+     * the buddy to have a reference count of zero, but the already allocated
+     * page is returned as the tail of the unaligned buddy, causing the BUG().
+     *
+     *   MFN 4           MFN 5             MFN 6             MFN 7
+     * +---------------+-----------------+-----------------+
+     * | offlined page | head page         tail page       | <- panics Xen
+     * +---------------+-----------------+-----------------+----------------+
+     *                                   | in-use page       in-use page    |
+     *                                   +-----------------+----------------+
+     */
+    alloc_domheap_pages(dom1, order0, 0); /* Triggers BUG() */
+}
+
+int main(int argc, char *argv[])
+{
+    if ( !parse_args(argc, argv, "Test not growing unaligned buddies") )
+        return EXIT_FAILURE;
+
+    init_page_alloc_tests();
+    RUN_TESTCASE("TUBM", test_unaligned_buddy_merge, 4);
+
+    return test_complete();
+}
--
2.39.5