I have been hacking at this too, since I am interested in trying 1GB pages to see what they can do. After digging myself into a hole, I restarted from the beginning and am now trying a different approach from modifying xc_hvm_build.c: modifying populate_physmap() to opportunistically allocate large pages where possible. I just thought I'd mention it.
John Byrne

I implemented a preliminary version of HAP large page support. My testing showed that 32-bit PAE and 64-bit guests worked well. I also saw a decent performance improvement on certain benchmarks.
So before I go too far, I am sending this patch to the community for review and comments. The patch applies to xen-unstable changeset 16281. I will redo it after collecting all ideas.
1. Large page requests
- xc_hvm_build.c requests large pages (2MB for now) when starting guests.
- memory.c handles large page requests. If it cannot satisfy a request, it falls back to 4KB pages.
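
To illustrate the fallback described above, here is a minimal sketch, not the code from the patch. Every name in it (populate_range, toy_alloc, struct toy_heap, the ORDER_* constants) is hypothetical, and the allocator is a toy that hands out a limited number of 2MB extents. It only shows the shape of "try order 9 first, then fall back to order 0 for that chunk":

```c
#include <stdbool.h>
#include <stddef.h>

#define ORDER_4K 0u
#define ORDER_2M 9u /* 2MB / 4KB = 512 = 2^9 pages */

/* Hypothetical allocator hook: true if 2^order contiguous pages were obtained. */
typedef bool (*alloc_fn)(unsigned int order, void *ctx);

struct populate_stats {
    unsigned long large_extents; /* number of 2MB extents used */
    unsigned long small_pages;   /* number of 4KB pages used */
};

/* Toy heap for illustration: limited 2MB extents, unlimited 4KB pages. */
struct toy_heap {
    unsigned long large_left;
};

static bool toy_alloc(unsigned int order, void *ctx)
{
    struct toy_heap *h = ctx;

    if (order == ORDER_4K)
        return true;
    if (h->large_left == 0)
        return false;
    h->large_left--;
    return true;
}

/*
 * Populate nr_pages 4KB frames. At each 2MB-aligned position with at least
 * 2MB left, try a large extent first; on failure, fill that chunk with 4KB
 * pages. Returns the number of 4KB frames populated.
 */
static unsigned long populate_range(unsigned long nr_pages, alloc_fn alloc,
                                    void *ctx, struct populate_stats *st)
{
    const unsigned long large = 1ul << ORDER_2M;
    unsigned long done = 0;

    st->large_extents = 0;
    st->small_pages = 0;

    while (done < nr_pages) {
        if (nr_pages - done >= large && (done & (large - 1)) == 0 &&
            alloc(ORDER_2M, ctx)) {
            st->large_extents++;
            done += large;
            continue;
        }
        if (!alloc(ORDER_4K, ctx))
            break; /* out of memory even for 4KB pages */
        st->small_pages++;
        done++;
    }
    return done;
}
```

With a heap that has a single 2MB extent available, a 4MB request in this sketch ends up as one large extent plus 512 small pages.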
2. P2M table
- The P2M code takes the page size order as a parameter and builds the P2M table (setting the PSE bit, etc.) according to the page size.
- Other related functions (such as p2m_audit()) handle the table based on page size too.
- Page split/merge
** A large page will be split into 4KB pages in the P2M table if needed. For instance, if set_p2m_entry() is handling a 4KB page but finds the PSE and PRESENT bits set, it splits the large page into 4KB pages.
** There is NO merge from 4KB pages back to a large page. Since large pages are only used at the very beginning, in guest_physmap_add(), this is OK for now.
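
As a rough sketch of the split step (not the patch itself), the following toy model treats a page-table entry as (frame number << 12) | flags. The function name toy_split_large_entry and its parameters are hypothetical. The bit positions for PRESENT (bit 0), RW (bit 1) and PSE (bit 7) follow the x86 page-table format, but the model ignores high flag bits such as NX and the PAT bit of large entries:

```c
#include <stdint.h>

#define PAGE_SHIFT        12
#define ENTRIES_PER_TABLE 512
#define PTE_PRESENT       (1ull << 0)
#define PTE_RW            (1ull << 1)
#define PTE_PSE           (1ull << 7)
#define FLAGS_MASK        0xfffull /* low 12 bits; toy model only */

/*
 * If *l2e maps a present 2MB page, fill l1_table with 512 4KB entries that
 * cover the same frames with the same flags (minus PSE), then point *l2e at
 * the new L1 table (frame l1_table_mfn).
 * Returns 1 if a split happened, 0 if the entry was not a present large page.
 */
static int toy_split_large_entry(uint64_t *l2e,
                                 uint64_t l1_table[ENTRIES_PER_TABLE],
                                 uint64_t l1_table_mfn)
{
    uint64_t e = *l2e;
    uint64_t base_mfn, flags;
    unsigned int i;

    if ((e & (PTE_PRESENT | PTE_PSE)) != (PTE_PRESENT | PTE_PSE))
        return 0;

    base_mfn = e >> PAGE_SHIFT;
    flags = (e & FLAGS_MASK) & ~PTE_PSE;

    for (i = 0; i < ENTRIES_PER_TABLE; i++)
        l1_table[i] = ((base_mfn + i) << PAGE_SHIFT) | flags;

    *l2e = (l1_table_mfn << PAGE_SHIFT) | flags;
    return 1;
}
```

After the split, the L2 entry no longer has PSE set, so a second call on the same entry does nothing. That matches the one-way (split only, no merge) behavior described above.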
3. HAP
- To access the PSE bit, the L2 pages of the P2M table are installed in the linear mapping at SH_LINEAR_PT_VIRT_START. We borrow this address space since it was not being used.
4. gfn_to_mfn translation (P2M)
- gfn_to_mfn_foreign() traverses the P2M table and handles address translation based on the PSE bit.
- gfn_to_mfn_current() accesses SH_LINEAR_PT_VIRT_START to check the PSE bit. If it is set, we handle the translation using the large page. Otherwise, it falls back to the normal RO_MPT_VIRT_START address space to access the P2M L1 pages.
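
The PSE check in the translation path can be sketched as below. This is a toy two-level lookup over plain arrays, not the real gfn_to_mfn_current() or gfn_to_mfn_foreign(). The name toy_gfn_to_mfn is hypothetical, and so is the l1_pool layout, in which the frame number stored in a non-PSE L2 entry simply selects a 512-entry slice of the pool:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT        12
#define ENTRIES_PER_TABLE 512
#define PTE_PRESENT       (1ull << 0)
#define PTE_PSE           (1ull << 7)
#define INVALID_MFN       (~0ull)

/*
 * Translate gfn using an L2 table of l2_len entries.
 * - PSE set:   the L2 entry maps a 2MB page, so the MFN is the base frame
 *              plus the low 9 bits of the gfn.
 * - PSE clear: the L2 entry selects an L1 table (toy: a slice of l1_pool),
 *              and the L1 entry holds the 4KB frame.
 */
static uint64_t toy_gfn_to_mfn(uint64_t gfn, const uint64_t *l2, size_t l2_len,
                               const uint64_t *l1_pool)
{
    uint64_t l2_idx = gfn / ENTRIES_PER_TABLE;
    uint64_t l1_idx = gfn % ENTRIES_PER_TABLE;
    uint64_t l2e, l1e;

    if (l2_idx >= l2_len)
        return INVALID_MFN;

    l2e = l2[l2_idx];
    if (!(l2e & PTE_PRESENT))
        return INVALID_MFN;

    if (l2e & PTE_PSE)
        return (l2e >> PAGE_SHIFT) + l1_idx; /* large page path */

    l1e = l1_pool[(l2e >> PAGE_SHIFT) * ENTRIES_PER_TABLE + l1_idx];
    if (!(l1e & PTE_PRESENT))
        return INVALID_MFN;
    return l1e >> PAGE_SHIFT; /* normal 4KB path */
}
```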
5. M2P translation
- Same as before: M2P translation still happens at the 4KB level.
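
In other words, even when a 2MB extent is mapped with a single P2M entry, the M2P side still gets one entry per 4KB frame. A minimal sketch of that idea, assuming a flat array as the M2P table and a hypothetical helper name toy_set_m2p_range:

```c
#include <stdint.h>

/*
 * Record the reverse mapping for an extent of 2^order frames, one M2P entry
 * per 4KB frame: m2p[mfn + i] = gfn + i.
 */
static void toy_set_m2p_range(uint64_t *m2p, uint64_t mfn, uint64_t gfn,
                              unsigned int order)
{
    uint64_t i;

    for (i = 0; i < (1ull << order); i++)
        m2p[mfn + i] = gfn + i;
}
```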
1. Large page for 32bit mode
- 32-bit mode uses 4MB large pages. This is very annoying for xc_hvm_build.c. I don't want to create another 4MB page_array for it.
- Because of this, this area has not been tested very well. I expect changes soon.
2. Shadow paging
- This implementation will affect shadow mode, especially in xc_hvm_build.c and memory.c.
- Where and how to avoid affecting shadow?
3. Turn it on/off
- Do we want to be able to turn this feature on/off through an option (a kernel option or something else)?
4. Other missing areas?
