
[Xen-devel] [PATCH 5/7] xen/setup: Transfer MFNs from non-RAM E820 entries and gaps to E820 RAM



When the Xen hypervisor boots a PV kernel it hands it two pieces
of information: nr_pages and a made-up E820 map.

The nr_pages value defines the range of PFNs, from zero to nr_pages,
that have a valid Machine Frame Number (MFN) underneath them. The
E820 map mirrors that (modulo the VGA hole):
BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 00000000000a0000 (usable)
 Xen: 00000000000a0000 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000080800000 (usable)

The fun comes when a PV guest is run with the host's E820 map - that
can be either the initial domain or a PCI PV guest - where the E820
looks like a normal machine's:

BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 000000000009e000 (usable)
 Xen: 000000000009ec00 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000020000000 (usable)
 Xen: 0000000020000000 - 0000000020200000 (reserved)
 Xen: 0000000020200000 - 0000000040000000 (usable)
 Xen: 0000000040000000 - 0000000040200000 (reserved)
 Xen: 0000000040200000 - 00000000bad80000 (usable)
 Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
..
With that map, overlaying nr_pages directly on the E820 does not
work, as there are gaps and non-RAM regions that won't be used
by the memory allocator. The 'xen_release_chunk' helper deals with
that by punching holes in the P2M (PFN to MFN lookup tree) for those
regions and reports:

Freeing  20000-20200 pfn range: 512 pages freed
Freeing  40000-40200 pfn range: 512 pages freed
Freeing  bad80-badf4 pfn range: 116 pages freed
Freeing  badf6-bae7f pfn range: 137 pages freed
Freeing  bb000-100000 pfn range: 282624 pages freed
Released 283999 pages of unused memory

Those 283999 pages are subtracted from nr_pages and returned
to the hypervisor. The end result is that the initial domain
boots with 1GB less memory, as nr_pages has been reduced by
the number of pages residing within the PCI hole. It can balloon up
to that amount if desired using 'xl mem-set 0 8092', but the balloon
driver is not always compiled in for the initial domain.

The 'xen_exchange_chunk' helper solves this by transferring the
MFNs that would have been freed to the E820_RAM entries that
lie past nr_pages, using the early_set_phys_to_machine
mechanism, which allows the P2M tree to allocate new leaves during
early bootup.

It does that by copying the MFNs into the E820_RAM regions that have
not been used and setting the old PFNs to INVALID_P2M_ENTRY.

The end result is that the kernel can now boot with the full
nr_pages without having to subtract the 283999 pages.

We will now get:

-Released 283999 pages of unused memory
+Exchanged 283999 pages
.. snip..
-Memory: 6487732k/9208688k available (5817k kernel code, 1136060k absent, 1584896k reserved, 2900k data, 692k init)
+Memory: 6503888k/8072692k available (5817k kernel code, 1136060k absent, 432744k reserved, 2900k data, 692k init)

which is more in line with classic XenOLinux.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
---
 arch/x86/xen/setup.c |   85 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 1ba8dff..2a12143 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -120,12 +120,89 @@ static unsigned long __init xen_release_chunk(unsigned long start,
        return len;
 }
 
+static unsigned long __init xen_exchange_chunk(unsigned long start_pfn,
+       unsigned long end_pfn, unsigned long nr_pages, unsigned long exchanged,
+       unsigned long *pages_left, const struct e820entry *list,
+       size_t map_size)
+{
+       const struct e820entry *entry;
+       unsigned int i;
+       unsigned long credits = (end_pfn - start_pfn) + *pages_left;
+       unsigned long done = 0;
+
+       for (i = 0, entry = list; i < map_size; i++, entry++) {
+               unsigned long s_pfn;
+               unsigned long e_pfn;
+               unsigned long pfn;
+               unsigned long dest_pfn;
+               long nr;
+
+               if (credits == 0)
+                       break;
+
+               if (entry->type != E820_RAM)
+                       continue;
+
+               e_pfn = PFN_UP(entry->addr + entry->size);
+
+               /* We only care about E820 _after_ the xen_start_info->nr_pages */
+               if (e_pfn <= nr_pages)
+                       continue;
+
+               s_pfn = PFN_DOWN(entry->addr);
+               /* If the E820 falls within the nr_pages, we want to start
+                * at the nr_pages PFN (plus whatever we already had exchanged)
+                * If that would mean going past the E820 entry, skip it
+                */
+               if (s_pfn <= nr_pages) {
+                       nr = e_pfn - exchanged - nr_pages;
+                       dest_pfn = nr_pages + exchanged;
+               } else {
+                       nr = e_pfn - exchanged - s_pfn;
+                       dest_pfn = s_pfn + exchanged;
+               }
+               /* If we had filled this E820_RAM entry, go to the next one. */
+               if (nr <= 0)
+                       continue;
+
+               pr_debug("[%lx->%lx] (starting at %lx and have space for %ld pages) will move %ld pages from [%lx->%lx]\n",
+                        s_pfn, e_pfn, dest_pfn, nr, credits, start_pfn, end_pfn);
+
+               for (pfn = start_pfn; pfn < start_pfn + nr; pfn++) {
+                       unsigned long mfn = pfn_to_mfn(pfn);
+
+                       if (mfn == INVALID_P2M_ENTRY || mfn_to_pfn(mfn) != pfn)
+                               break;
+
+                       if (!early_set_phys_to_machine(dest_pfn, mfn))
+                               break;
+
+                       /* You would think we should do HYPERVISOR_update_va_mapping
+                        * but we don't need to as the hypervisor only sets up the
+                        * initial pagetables up to nr_pages, and we stick the MFNs
+                        * past that.
+                        */
+                       __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+                       ++dest_pfn;
+                       ++done;
+                       if (--credits == 0)
+                               break;
+               }
+       }
+       if (done)
+               printk(KERN_INFO "Transferred from %lx->%lx range %ld pages\n",
+                      start_pfn, end_pfn, done);
+       /* How many left on the next iteration */
+       *pages_left = credits;
+       return done;
+}
 static unsigned long __init xen_set_identity_and_release(
        const struct e820entry *list, size_t map_size, unsigned long nr_pages)
 {
        phys_addr_t start = 0;
        unsigned long released = 0;
        unsigned long identity = 0;
+       unsigned long exchanged = 0;
+       unsigned long credits = 0;
        const struct e820entry *entry;
        int i;
 
@@ -151,17 +228,19 @@ static unsigned long __init xen_set_identity_and_release(
                                end_pfn = PFN_UP(entry->addr);
 
                        if (start_pfn < end_pfn) {
-                               if (start_pfn < nr_pages)
+                               exchanged += xen_exchange_chunk(start_pfn, end_pfn, nr_pages,
+                                               exchanged, &credits, list, map_size);
+                               if (start_pfn < nr_pages) {
                                        released += xen_release_chunk(
                                                start_pfn, min(end_pfn, nr_pages));
-
+                               }
                                identity += set_phys_range_identity(
                                        start_pfn, end_pfn);
                        }
                        start = end;
                }
        }
-
+       printk(KERN_INFO "Exchanged %lu pages\n", exchanged);
        printk(KERN_INFO "Released %lu pages of unused memory\n", released);
        printk(KERN_INFO "Set %ld page(s) to 1-1 mapping\n", identity);
 
-- 
1.7.7.5


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

