John,
If you have a better design, please share it with us and I will be happy
to work with you. :-) I agree that xc_hvm_build.c does not have to be
modified if memory.c is smart enough to scan all of the page_array
information. One concern, though, is that sometimes the Xen tools really
do want to create mappings at 4KB granularity instead of using large
pages. That requires extra information to be passed from the tools
(e.g., xc_hvm_build.c) to memory.c.
-Wei
Wei,
I have been hacking at this, too, since I
am interested in trying 1GB pages to see what they can do. After I dug myself
into a hole, I restarted from the beginning and am trying a different approach
than modifying xc_hvm_build.c: modify populate_physmap()
to opportunistically allocate large pages, if possible. I just thought
I'd mention it.
John Byrne
I implemented a preliminary version of HAP large page support. My
testing showed that 32-bit PAE and 64-bit both worked well. I also saw a
decent performance improvement for certain benchmarks.
So before I go too far, I am sending this patch to the community for
review and comments. This patch is against xen-unstable changeset 16281.
I will redo it after collecting all the ideas.
Thanks,
-Wei
============
DESIGN IDEAS:
1. Large page requests
- xc_hvm_build.c requests large pages (2MB for now) when starting
guests.
- memory.c handles the large page requests. If it cannot satisfy a
request, it falls back to 4KB pages.
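
A minimal sketch of the fallback described in item 1, for illustration only. It is not code from the patch: struct toy_heap, toy_alloc() and populate_extent() are hypothetical names, and the toy allocator merely stands in for the real hypervisor allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ORDER_4K 0   /* one 4KB page */
#define ORDER_2M 9   /* 512 contiguous 4KB pages = 2MB */

/* Hypothetical stand-in for the hypervisor heap. */
struct toy_heap {
    unsigned long free_pages;   /* 4KB pages still available */
    bool contiguous_ok;         /* can a 2MB contiguous run be found? */
};

/* Allocate 2^order pages; orders above 0 also need contiguous memory. */
static bool toy_alloc(struct toy_heap *h, int order)
{
    unsigned long n = 1UL << order;

    if (h->free_pages < n)
        return false;
    if (order > 0 && !h->contiguous_ok)
        return false;
    h->free_pages -= n;
    return true;
}

/*
 * Populate one 2MB guest extent: try a single order-9 allocation first,
 * then fall back to 512 order-0 allocations.
 * Returns the order actually used, or -1 if memory ran out.
 * (A real implementation would also free the pages already taken when
 * the fallback fails part way; that is omitted here.)
 */
static int populate_extent(struct toy_heap *h)
{
    assert(h != NULL);

    if (toy_alloc(h, ORDER_2M))
        return ORDER_2M;

    for (unsigned long i = 0; i < (1UL << ORDER_2M); i++)
        if (!toy_alloc(h, ORDER_4K))
            return -1;

    return ORDER_4K;
}
```

The returned order is what the caller would then hand to the P2M code, so that the table is built with or without the PSE bit accordingly.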
2. P2M table
- The P2M table code takes the page size order as a parameter; it builds
the P2M table (setting the PSE bit, etc.) according to the page size.
- Other related functions (such as p2m_audit()) handle the table based
on page size too.
- Page split/merge
** A large page will be split into 4KB pages in the P2M table if needed.
For instance, if set_p2m_entry() is handling a 4KB page but finds the
PSE/PRESENT bits set, it will split the large page into 4KB pages.
** There is NO merging of 4KB pages back into a large page. Since large
pages are only set up at the very beginning, in guest_physmap_add(),
this is OK for now.
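
The split step can be sketched roughly as follows. This is a stand-alone illustration under simplifying assumptions, not the patch's set_p2m_entry() logic: pte_t, split_large_page() and the masks are hypothetical, and the NX and PAT bits are ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_PRESENT    (1ULL << 0)
#define PTE_PSE        (1ULL << 7)             /* set in an L2 entry: 2MB page */
#define PTE_FLAGS_MASK 0xFFFULL                /* low 12 bits hold the flags */
#define ADDR_2M_MASK   0x000FFFFFFFE00000ULL   /* frame address of a 2MB page */
#define L1_ENTRIES     512

/*
 * Fill l1[] with 512 4KB entries that cover the same machine frames as
 * the 2MB entry l2e, keeping its flags except PSE. The caller would then
 * replace the L2 entry with a pointer to this new L1 table.
 */
static void split_large_page(pte_t l2e, pte_t l1[L1_ENTRIES])
{
    assert((l2e & PTE_PRESENT) && (l2e & PTE_PSE));

    uint64_t base_mfn = (l2e & ADDR_2M_MASK) >> 12;   /* first 4KB frame */
    pte_t flags = (l2e & PTE_FLAGS_MASK) & ~PTE_PSE;

    for (unsigned int i = 0; i < L1_ENTRIES; i++)
        l1[i] = ((base_mfn + i) << 12) | flags;
}
```

After the split, a single 4KB entry can be changed without touching the other 511 mappings.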
3. HAP
- To access the PSE bit, the L2 pages of the P2M table are installed in
the linear mapping at SH_LINEAR_PT_VIRT_START. We borrow this address
space since it was not used.
4. gfn_to_mfn translation (P2M)
- gfn_to_mfn_foreign() traverses the P2M table and handles address
translation correctly based on the PSE bit.
- gfn_to_mfn_current() accesses SH_LINEAR_PT_VIRT_START to check the PSE
bit. If it is set, we handle the translation as a large page. Otherwise,
it falls back to the normal RO_MPT_VIRT_START address space to access
the P2M L1 pages.
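
The PSE check during translation can be illustrated with a toy two-level lookup. This is only a sketch: toy_gfn_to_mfn() and the masks are hypothetical, the L2 entry and L1 table are passed in directly rather than read through the linear mappings, and INVALID_MFN here is a local placeholder value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_PRESENT  (1ULL << 0)
#define PTE_PSE      (1ULL << 7)
#define ADDR_4K_MASK 0x000FFFFFFFFFF000ULL
#define ADDR_2M_MASK 0x000FFFFFFFE00000ULL
#define INVALID_MFN  (~0ULL)

/*
 * l2e is the L2 entry covering gfn; l1 is the L1 table that entry points
 * to (unused, and may be NULL, when l2e maps a 2MB page).
 */
static uint64_t toy_gfn_to_mfn(pte_t l2e, const pte_t *l1, uint64_t gfn)
{
    if (!(l2e & PTE_PRESENT))
        return INVALID_MFN;

    if (l2e & PTE_PSE) {
        /* Large page: base frame plus the offset within the 2MB region. */
        uint64_t base_mfn = (l2e & ADDR_2M_MASK) >> 12;
        return base_mfn + (gfn & 511);
    }

    /* Normal case: look up the 4KB entry in the L1 table. */
    assert(l1 != NULL);
    pte_t l1e = l1[gfn & 511];
    if (!(l1e & PTE_PRESENT))
        return INVALID_MFN;
    return (l1e & ADDR_4K_MASK) >> 12;
}
```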
5. M2P translation
- Same as before: M2P translation still happens at the 4KB level.
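
To illustrate what keeping M2P at the 4KB level implies when a large page is mapped, here is a small sketch. m2p_set_range() and the flat array are hypothetical and only model the idea that every 4KB frame inside the large page still gets its own M2P entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Record the reverse mapping for 2^order frames starting at mfn, one
 * entry per 4KB frame, even when the forward (P2M) mapping is a single
 * large page entry.
 */
static void m2p_set_range(uint64_t *m2p, uint64_t mfn, uint64_t gfn,
                          unsigned int order)
{
    assert(m2p != NULL);

    for (uint64_t i = 0; i < (1ULL << order); i++)
        m2p[mfn + i] = gfn + i;
}
```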
AREAS NEEDING COMMENTS:
1. Large page for 32-bit mode
- 32-bit mode uses 4MB for large pages. This is very annoying for
xc_hvm_build.c; I don't want to create another 4MB page_array for it.
- Because of this, this area has not been tested very well. I expect
changes soon.
2. Shadow paging
- This implementation will affect shadow mode, especially in
xc_hvm_build.c and memory.c.
- Where and how should we avoid affecting shadow mode?
3. Turn it on/off
- Do we want to be able to turn this feature on/off through an option
(kernel option or anything else)?
4. Other missing areas?
===========