
Re: [Xen-devel] Converting heap page_infos to contiguous virtual



On 13/07/2016 21:57, Boris Ostrovsky wrote:
> On 07/13/2016 04:34 PM, Andrew Cooper wrote:
>> On 13/07/2016 21:17, Boris Ostrovsky wrote:
>>> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>>>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>>>> page-by-page).
>>>>>
>>>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>>>     pg = page_list_remove_head(&heap(node, zone, order));
>>>>>
>>>>> and then
>>>>>
>>>>>     mfn_t mfn = _mfn(page_to_mfn(pg));
>>>>>     char *va = mfn_to_virt(mfn_x(mfn));
>>>>>     memset(va, 0, 4096 * (1 << order));
>>>>>
>>>>>
>>>>> Would it be valid to do this?
>>>> In principle, yes.  The frame_table is in order.
>>>>
>>>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>>>> need to map_domain_page() to get a mapping.
>>> Right, but that would mean going page-by-page, which I want to avoid.
>>>
>>> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
>>> imply that it maps this big a range contiguously (modulo PDX hole)?
>> Your maths is correct, and yet you will end up with problems if you
>> trust it.
>>
>> That is the magic mode for the idle and monitor pagetables.  In the
>> context of a 64bit PV guest, the cutoff is at 5TB, at which point you
>> venture into the virtual address space reserved for guest kernel use. 
>> (It is rather depressing that the 64bit PV guest ABI is the factor
>> limiting Xen's maximum RAM usage.)
> I don't know whether it would make any difference, but the pages that I am
> talking about are not in use by any guest; they are free. (This question
> is for the scrubbing rewrite that I am working on, which apparently you
> figured out, judging by what you are saying below.)

Being free is not relevant.  It depends whether current is a 64bit PV
guest or not.  Even in the idle loop, we don't context switch away from
current's pagetables.

Realistically, you must at all times use map_domain_page() (or something
along those lines), as the 5TB limit with 64bit PV guests turns into a
3.5TB limit depending on CONFIG_BIGMEM.

>
>
>>>>>  Do I need to account for the PDX hole?
>>>> Jan is probably the best person to ask about this, but I am fairly sure
>>>> there are lurking dragons here.
>>>>
>>>> PDX compression is used to reduce the size of the frametable when there
>>>> are large unused ranges of mfns.  Without paying attention to the PDX
>>>> shift, you don't know where the discontinuities lie.
>>>>
>>>> However, because the PDX shift is an aligned power of two, there are
>>>> likely to be struct page_info*'s in the frame_table which don't point at
>>>> real RAM, and won't have a virtual mapping even in the directmap.
>>> So I would be OK with finding which mfn of my range points to the
>>> beginning of the hole and breaking the mfn range into two sections ---
>>> one below the hole and one above. The hope is that both ranges can be
>>> mapped contiguously --- something that I don't know to be true.
>> If you have a struct page_info * in your hand, and know from the E820
>> where the next non-RAM boundary is, I think you should be safe to clear
>> memory over a contiguous range of the directmap.  There shouldn't be any
>> discontinuities over that range.
> OK, I'll look at this, thanks.
>
>
>> Be aware of memory_guard(), though, which does shoot holes in the
>> directmap.  However, only allocated pages should be guarded, so you
>> should never be in the position of scrubbing pages with a missing
>> mapping in the directmap.  For RAM above the 5TB boundary,
>> map_domain_page() will DTRT, but we might want to see about making a
>> variant which can make mappings spanning more than 4k.
> Maybe have it at least attempt to map a larger range, i.e. not try to
> cover all corner cases. Something like map_domain_page_fast(order).

map_domain_page() is fast if it can use the directmap.  (However, it
doesn't use the directmap on a debug build, to test the highmem logic.)

I expect the exceedingly common case for RAM above the 5TB (or 3.5TB)
boundary will be for it to already align on a 1GB boundary, at which
point 2M or 1G superpages will work just fine for a temporary mapping.
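
Structurally, the caller could then end up looking something like this
(again just a sketch; mfn_range_in_directmap() is a hypothetical
predicate for "the whole range is covered by the directmap", and
clear_pages_mapped() is the per-page fallback sketched earlier):

    static void clear_heap_pages(struct page_info *pg, unsigned int order)
    {
        unsigned long mfn = page_to_mfn(pg);

        if ( mfn_range_in_directmap(mfn, 1UL << order) )
            /* Fast path: one contiguous memset() via the directmap. */
            memset(mfn_to_virt(mfn), 0, PAGE_SIZE << order);
        else
            /* Fall back to mapping and clearing page by page. */
            clear_pages_mapped(pg, order);
    }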

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

