[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e



On 30.11.2020 13:13, Hongyan Xia wrote:
> On Tue, 2020-08-18 at 18:16 +0200, Jan Beulich wrote:
>> On 18.08.2020 15:08, Julien Grall wrote:
>>> On 18/08/2020 12:30, Jan Beulich wrote:
>>>> On 18.08.2020 12:13, Julien Grall wrote:
>>>>> On 18/08/2020 09:49, Jan Beulich wrote:
>>>>>> On 13.08.2020 19:22, Julien Grall wrote:
>>>>>>> On 13/08/2020 17:08, Hongyan Xia wrote:
>>>>>>>> On Fri, 2020-08-07 at 16:05 +0200, Jan Beulich wrote:
>>>>>>>>> On 27.07.2020 16:21, Hongyan Xia wrote:
>>>>>>>>>> From: Wei Liu <wei.liu2@xxxxxxxxxx>
>>>>>>>>>>
>>>>>>>>>> Rewrite those functions to use the new APIs. Modify
>>>>>>>>>> its callers to
>>>>>>>>>> unmap
>>>>>>>>>> the pointer returned. Since alloc_xen_pagetable_new()
>>>>>>>>>> is almost
>>>>>>>>>> never
>>>>>>>>>> useful unless accompanied by page clearing and a
>>>>>>>>>> mapping, introduce
>>>>>>>>>> a
>>>>>>>>>> helper alloc_map_clear_xen_pt() for this sequence.
>>>>>>>>>>
>>>>>>>>>> Note that the change of virt_to_xen_l1e() also
>>>>>>>>>> requires
>>>>>>>>>> vmap_to_mfn() to
>>>>>>>>>> unmap the page, which requires domain_page.h header
>>>>>>>>>> in vmap.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
>>>>>>>>>> Signed-off-by: Hongyan Xia <hongyxia@xxxxxxxxxx>
>>>>>>>>>> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>> Changed in v8:
>>>>>>>>>> - s/virtual address/linear address/.
>>>>>>>>>> - BUG_ON() on NULL return in vmap_to_mfn().
>>>>>>>>>
>>>>>>>>> The justification for this should be recorded in the
>>>>>>>>> description. In
>>>>>>>>
>>>>>>>> Will do.
>>>>>>>>
>>>>>>>>> reply to v7 I did even suggest how to easily address
>>>>>>>>> the issue you
>>>>>>>>> did notice with large pages, as well as alternative
>>>>>>>>> behavior for
>>>>>>>>> vmap_to_mfn().
>>>>>>>>
>>>>>>>> One thing about adding SMALL_PAGES is that vmap is common
>>>>>>>> code and I am
>>>>>>>> not sure if the Arm side is happy with it.
>>>>>>>
>>>>>>> At the moment, Arm is only using small mapping but I plan
>>>>>>> to change that soon because we have regions that can be
>>>>>>> fairly big.
>>>>>>>
>>>>>>> Regardless that, the issue with vmap_to_mfn() is rather x86
>>>>>>> specific. So I don't particularly like the idea to expose
>>>>>>> such trick in common code.
>>>>>>>
>>>>>>> Even on x86, I think this is not the right approach. Such
>>>>>>> band-aid will impact the performance as, assuming
>>>>>>> superpages are used, it will take longer to map and add
>>>>>>> pressure on the TLBs.
>>>>>>>
>>>>>>> I am aware that superpages will be useful for LiveUpdate,
>>>>>>> but is there any use cases in upstream?
>>>>>>
>>>>>> Superpage use by vmalloc() is purely occasional: You'd have
>>>>>> to vmalloc()
>>>>>> 2Mb or more _and_ the page-wise allocation ought to return
>>>>>> 512
>>>>>> consecutive pages in the right order. Getting 512 consecutive
>>>>>> pages is
>>>>>> possible in practice, but with the page allocator allocating
>>>>>> top-down it
>>>>>> is very unlikely for them to be returned in increasing-sorted 
>>>>>> order.
>>>>>
>>>>> So your assumption here is vmap_to_mfn() can only be called on
>>>>> vmalloc-ed() area. While this may be the case in Xen today, the
>>>>> name clearly suggest it can be called on all vmap-ed region.
>>>>
>>>> No, I don't make this assumption, and I did spell this out in an
>>>> earlier
>>>> reply to Hongyan: Parties using vmap() on a sufficiently large
>>>> address
>>>> range with consecutive MFNs simply have to be aware that they may
>>>> not
>>>> call vmap_to_mfn().
>>>
>>> You make it sounds easy to be aware, however there are two
>>> implementations of vmap_to_mfn() (one per arch). Even looking at
>>> the x86 version, it is not obvious there is a restriction.
>>
>> I didn't mean to make it sound like this - I agree it's not an
>> obvious
>> restriction.
>>
>>> So I am a bit concerned of the "misuse" of the function. This could
>>> possibly be documented. Although, I am not entirely happy to
>>> restrict the use because of x86.
>>
>> Unless the underlying issue gets fixed, we need _some_ form of bodge.
>> I'm not really happy with the BUG_ON() as proposed by Hongyan. You're
>> not really happy with my first proposed alternative, and you didn't
>> comment on the 2nd one (still kept in context below). Not sure what
>> to do: Throw dice?
> 
> Actually I did not propose the BUG_ON() fix. I was just in favor of it
> when Jan presented it as a choice in v7.
> 
> The reason I am in favor of it is that even without it, the existing
> x86 code already BUG_ON() when vmap has a superpage anyway, so it's not
> like other alternatives behave any differently for superpages. I am
> also not sure about returning INVALID_MFN because if virt_to_xen_l1e()
> really returns NULL, then we are calling vmap_to_mfn() on a non-vmap
> address (not even populated) which frankly deserves at least ASSERT().
> So, I don't think BUG_ON() is a bad idea for now before vmap_to_mfn()
> supports superpages.
> 
> Of course, we could use MAP_SMALL_PAGES to avoid the whole problem, but
> Arm may not be happy. After a quick chat with Julien, how about having
> ARCH_VMAP_FLAGS and only small pages for x86?

Possibly, albeit this will then make it look less like a bodge and
more like we would want to keep it this way. How difficult would it
be to actually make the thing work with superpages? Is it more than
simply
(pseudocode, potentially needed locking omitted):

vmap_to_mfn(va) {
    pl1e = virt_to_xen_l1e(va);
    if ( pl1e )
        return l1e_get_mfn(*pl1e);
    pl2e = virt_to_xen_l2e(va);
    if ( pl2e )
        return l2e_get_mfn(*pl2e) + suitable_bits(va);
    return l3e_get_mfn(*virt_to_xen_l3e(va)) + suitable_bits(va);
}

(assuming virt_to_xen_l<N>e() would be returning NULL in such a case)?
Not very efficient, but not needed anywhere anyway - the sole user of
the construct is domain_page_map_to_mfn(), which maps only individual
pages. (An even better option would be to avoid the recurring walk of
the higher levels by using only virt_to_xen_l3e() and then doing the
remaining steps "by hand". This would at once allow to avoid the here
unwanted / unneeded checking for whether page tables need allocating.)

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.