
Re: [Xen-devel] Converting heap page_infos to contiguous virtual



On 13/07/2016 21:57, Boris Ostrovsky wrote:
> On 07/13/2016 04:34 PM, Andrew Cooper wrote:
>> On 13/07/2016 21:17, Boris Ostrovsky wrote:
>>> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>>>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>>>> page-by-page).
>>>>>
>>>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>>>     pg = page_list_remove_head(&heap(node, zone, order));
>>>>>
>>>>> and then
>>>>>
>>>>>     mfn_t mfn = _mfn(page_to_mfn(pg));
>>>>>     char *va = mfn_to_virt(mfn_x(mfn));
>>>>>     memset(va, 0, 4096 * (1 << order));
>>>>>
>>>>>
>>>>> Would it be valid to do this?
>>>> In principle, yes.  The frame_table is in order.
>>>>
>>>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>>>> need to map_domain_page() to get a mapping.
>>> Right, but that would mean going page-by-page, which I want to avoid.
>>>
>>> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
>>> imply that it maps this big a range contiguously (modulo PDX hole)?
>> Your maths is correct, and yet you will end up with problems if you
>> trust it.
>>
>> That is the magic mode for the idle and monitor pagetables.  In the
>> context of a 64bit PV guest, the cutoff is at 5TB, at which point you
>> venture into the virtual address space reserved for guest kernel use. 
>> (It is rather depressing that the 64bit PV guest ABI is the factor
>> limiting Xen's maximum RAM usage.)
> I don't know whether it would make any difference, but the pages that I am
> talking about are not in use by any guest; they are free. (This question
> is for the scrubbing rewrite that I am working on, which apparently you
> figured out, judging by what you are saying below.)

Being free is not relevant.  It depends whether current is a 64bit PV
guest or not.  Even in the idle loop, we don't context switch away from
current's pagetables.

Realistically, you must at all times use map_domain_page() (or something
along those lines), as the 5TB limit with 64bit PV guests turns into a
3.5TB limit depending on CONFIG_BIGMEM.

>
>
>>>>>  Do I need to account for the PDX hole?
>>>> Jan is probably the best person to ask about this, but I am fairly sure
>>>> there are lurking dragons here.
>>>>
>>>> PDX compression is used to reduce the size of the frametable when there
>>>> are large unused ranges of mfns.  Without paying attention to the PDX
>>>> shift, you don't know where the discontinuities lie.
>>>>
>>>> However, because the PDX shift is an aligned power of two, there are
>>>> likely to be struct page_info*'s in the frame_table which don't point at
>>>> real RAM, and won't have a virtual mapping even in the directmap.
>>> So I would be OK with finding which mfn of my range points to the
>>> beginning of the hole and breaking the mfn range into two sections ---
>>> one below the hole and one above. The hope is that both ranges can be
>>> mapped contiguously --- something that I don't know to be true.
>> If you have a struct page_info * in your hand, and know from the E820
>> where the next non-RAM boundary is, I think you should be safe to clear
>> memory over a contiguous range of the directmap.  There shouldn't be any
>> discontinuities over that range.
> OK, I'll look at this, thanks.
>
>
>> Be aware of memory_guard(), though, which does shoot holes in the
>> directmap.  However, only allocated pages should be guarded, so you
>> should never be in the position of scrubbing pages with a missing
>> mapping in the directmap.  For RAM above the 5TB boundary,
>> map_domain_page() will DTRT, but we might want to see about making a
>> variant which can make mappings spanning more than 4k.
> Maybe have it at least attempt to map a larger range, i.e. not try to
> cover all corner cases. Something like map_domain_page_fast(order).

map_domain_page() is fast if it can use the directmap.  (However, it
doesn't use the directmap on a debug build, to test the highmem logic.)

I expect the exceedingly common case for RAM above the 5TB (or 3.5TB)
boundary will be for it to already align on a 1GB boundary, at which
point 2M or 1G superpages will work just fine for a temporary mapping.
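
Structurally, the caller could then end up looking something like this
(again just a sketch; mfn_range_in_directmap() is a hypothetical
predicate for "the whole range is covered by the directmap", and
clear_pages_mapped() is the per-page fallback sketched earlier):

    static void clear_heap_pages(struct page_info *pg, unsigned int order)
    {
        unsigned long mfn = page_to_mfn(pg);

        if ( mfn_range_in_directmap(mfn, 1UL << order) )
            /* Fast path: one contiguous memset() via the directmap. */
            memset(mfn_to_virt(mfn), 0, PAGE_SIZE << order);
        else
            /* Fall back to mapping and clearing page by page. */
            clear_pages_mapped(pg, order);
    }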

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

