[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v7] x86/p2m: use large pages for MMIO mappings

On 09/02/16 08:42, Jan Beulich wrote:
>>>> On 08.02.16 at 19:04, <george.dunlap@xxxxxxxxxx> wrote:
>> First -- and this isn't a blocker, but just a question (and sorry if
>> it's been answered before) -- why have the "0 means I did it all, <nr
>> means I did it partially"?  Why not just return the number of entries
>> actually handled?
> Because I view zero as the conventional "success" indicator. No
> other deeper reason.

For certain classes of functionality, yes.  But for other classes --
where the caller will request N but the OS / hypervisor / whatever may
do some other number fewer than N, then the convention is to always pass
the number of items completed.  read() and write() system calls are like
this, as are most of the XENMEM operations (e.g.,

Since this domctl looks a lot like some of those XENMEM operations,
wouldn't it make more sense to follow that convention, and return the
number of extents actually allocated?

>>> @@ -949,14 +965,21 @@ static int set_typed_p2m_entry(struct do
>>>  static int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>>>                                   mfn_t mfn)
>>>  {
>>> -    return set_typed_p2m_entry(d, gfn, mfn, p2m_map_foreign,
>>> +    return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign,
>>>                                 p2m_get_hostp2m(d)->default_access);
>>>  }
>>>  int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
>>> -                       p2m_access_t access)
>>> +                       unsigned int order, p2m_access_t access)
>>>  {
>>> -    return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
>>> +    if ( order &&
>>> +         rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>>> +                                 mfn_x(mfn) + (1UL << order) - 1) &&
>>> +         !rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
>>> +                                  mfn_x(mfn) + (1UL << order) - 1) )
>>> +        return order;
>> What is this about?  Isn't set_mmio_p2m_entry() now supposed to have a
>> calling convention like clear_mmio_p2m_entry(), where it returns
>> recommended_order+1 if you should retry with order, and this value
>> (recommended_order) is guaranteed to be strictly less than what was
>> passed in?
>> Did you mean to return PAGE_ORDER_4k+1 here, perhaps?  (Or (order-9)+1?)
> No. This is catering for callers of both {,ept_}p2m_type_to_flags()
> not easily being in the position of requesting further page splitting.
> Hence rather than complicate the code there, it is here where we
> make sure that the r/o enforcement won't be missed.

I wasn't asking what the check is about; I'm asking about the return
value.  Did you really mean that if someone passed in order 9, that it
would say "Please try with order 8 instead"?  I'm asking because I would
have expected that to be "Please try with order 0 instead".

(Although my question wasn't clear because I thought it was returning
the *same* value, not one less.  A brief comment like, "Request trying
again with order-1" would be helpful if we end up retaining this return

I'm still becoming somewhat familiar with the p2m code, but it looks to
me like ept_set_entry() at least doesn't tolerate orders that aren't
multiples of EPT_TABLE_ORDER.

This conditional will only actually do anything under the following

a. the requested gfn order is 9 (since we're not doing 1G pages yet)

b. at order 9, the mfn range partially overlaps mmio_ro_ranges

c. the p2m entry for that gfn is order 9 (or larger)

(Re c: i.e., if the order in the p2m entry is already 0, then
set_typed_p2m_entry() will already request a 4K page, making this entire
conditional redundant.)

Let's also suppose for simplicity that:

d. at order 8, the mfn range does *not* partially overlap mmio_ro_ranges

Under those conditions, it looks to me like the following will happen:
1. map_mmio_regions() will call set_mmio_p2m_entry() with order 9
2. set_mmio_p2m_entry() will return 9 (requesting using order 8)
3. map_mmio_regions() will call set_mmio_p2m_entry() with order 8
4. set_mmio_p2m_entry() will call set_typed_p2m_entry() with order 8
5. set_typed_p2m_entry() will read the current p2m entry of >= 9 and
continue (since 8 <= 9)
6. set_typed_p2m_entry() will call p2m_set_entry() with an order of 8
7. ept_set_entry() will return -EINVAL, since order % EPT_TABLE_ORDER != 0.

Am I missing something?

>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -136,6 +136,7 @@ static void ept_p2m_type_to_flags(struct
>>>              entry->r = entry->x = 1;
>>>              entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
>>>                                                      entry->mfn);
>>> +            ASSERT(entry->w || !is_epte_superpage(entry));
>> What is this about?  Single-page epte entries are allowed to be either
>> read-only or read-write, but superpage entries have to have write
>> permission?
> Yes, due to the lack of an "order" input here, and (as mentioned
> above) the difficulties of this function's callers otherwise needing
> to be able to deal with further page splitting directives (coming
> out of this function).
> But now that you mention this I think you're right further up:
> That code as it is now indeed seems to put things up for the
> ASSERT() here to actually be triggerable (if there would be a
> large enough r/o MMIO region). So it looks like code further
> up should indeed use PAGE_ORDER_4K+1 here, as you suggest
> (which would then also address one of Andrew's concerns).

You could either get rid of the "!rangeset_contains_range()" in
set_mmio_p2m_entry() (not allowing >4k orders even if the entire range
fits), or you could change these assertions to check that either the
entire range is contained or that it doesn't overlap (essentially the
inverse of the check at set_mmio_p2m_entry()).

I don't immediately see any reason why you couldn't use superpages if
the entire mfn range fits within mmio_ro_ranges.  (Although if you have
reason to believe that these almost never contain superpages, it would
certainly make the code simpler to just always use 4k pages in case of
an overlap.)


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.