[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 02/17] mm: introduce xvmalloc() et al and use for grant table allocations



Hi Jan,

On 26/11/2020 11:34, Jan Beulich wrote:
On 25.11.2020 20:48, Stefano Stabellini wrote:
On Wed, 25 Nov 2020, Jan Beulich wrote:
On 25.11.2020 13:15, Julien Grall wrote:
On 23/11/2020 14:23, Jan Beulich wrote:
All of the array allocations in grant_table_init() can exceed a page's
worth of memory, which xmalloc()-based interfaces aren't really suitable
for after boot. We also don't need any of these allocations to be
physically contiguous.. Introduce interfaces dynamically switching
between xmalloc() et al and vmalloc() et al, based on requested size,
and use them instead.

All the wrappers in the new header get cloned mostly verbatim from
xmalloc.h, with the sole adjustment to switch unsigned long to size_t
for sizes and to unsigned int for alignments.

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
---
v2: Actually edit a copy-and-pasted comment in xvmalloc.h which was
      meant to be edited from the beginning.
---
I'm unconvinced of the mentioning of "physically contiguous" in the
comment at the top of the new header: I don't think xmalloc() provides
such a guarantee. Any use assuming so would look (latently) broken to
me.

I haven't had the chance to reply to the first version about this. So I
will reply here to avoid confusion.

I can at least spot one user in Arm that would use xmalloc() that way
(see the allocation of itt_addr in arch/arm/gic-v3-its.c).

And I surely wouldn't have spotted this, even if I had tried
to find "offenders", i.e. as said before not wanting to alter
the behavior of existing code (beyond the explicit changes
done here) was ...

AFAIK, the memory is for the sole purpose of the ITS and should not be
accessed by Xen. So I think we can replace by a new version of
alloc_domheap_pages().

However, I still question the usefulness of introducing yet another way
to allocate memory (we already have alloc_xenheap_pages(), xmalloc(),
alloc_domheap_pages(), vmap()) if you think users cannot rely on
xmalloc() to allocate memory physically contiguous.

... the reason to introduce a separate new interface. Plus of
course this parallels what Linux has.

It definitely makes more difficult to figure out when to use xmalloc()
vs xvalloc().

I don't see the difficulty:
- if you need physically contiguous memory, use alloc_xen*_pages(),
- if you know the allocation size is always no more than a page,
   use xmalloc(),

If that's then intention, then may I ask why xmalloc() is able to support multiple pages allocation?

Your assumption is Xen will always be built with the same page size across all the architecture. While Xen only works with 4KB pages today, Arm can support 16KB and 64KB. I have long term plan to add support for it.

So I don't think you can use the page size as a way to distinguish which one to use.


What if you need memory physically contiguous but not necessarily an
order of pages, such as for instance 5200 bytes?

This case is, I think, rare enough (in particular in Xen) that the
waste of space can be tolerated imo.

This is quite departure from:

commit b829a0ff5794ee5b0f96a0c872f6a4ed7b1007c7
Author: Jan Beulich <jbeulich@xxxxxxxx>
Date:   Thu Oct 13 10:03:43 2011 +0200

    xmalloc: return unused full pages on multi-page allocations

    Certain (boot time) allocations are relatively large (particularly when
    building with high NR_CPUS), but can also happen to be pretty far away
    from a power-of-two size. Utilize the page allocator's (other than
    Linux'es) capability of allowing to return space from higher-order
    allocations in smaller pieces to return the unused parts immediately.

    Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
    Acked-by: Keir Fraser <keir@xxxxxxx>

I am curious to know what changed...

Anyway, what you wrote is very server focused. On Arm, we have plan to run Xen on smaller hardware where wasting memory mean less usable RAM for guests.

The problem with using an order is the bigger the order is the more change you will waste space...

Allocating more than a page is fairly common on Arm, so we really want to reduce the amount of memory wasted.


If xmalloc can't do physically contiguous allocations, we need something
else that does physically contiguous allocations not only at page
granularity, right?

Well, we first need to settle on what guarantees xmalloc() is meant
to provide. It may be just me assuming it doesn't provide the same
ones which Linux'es kmalloc() makes. I'm first and foremost
judging by the comment near the top of xmalloc.h, which compares
with malloc() / free(), not kmalloc() / kfree().

The other issue is semantics. If xmalloc is unable to allocate more than
a page of contiguous memory, then it is identical to vmalloc from the
caller's point of view: both xmalloc and vmalloc return a virtual
address for an allocation that might not be physically contiguous.

Almost. vmalloc() puts guard pages around the allocation and
guarantees page alignment.

Maybe we should get rid of xmalloc entirely and improve the
implementation of vmalloc so that it falls back to xmalloc for
sub-page allocations. Which in fact is almost the same thing that you
did.

This would break callers assuming page alignment (and - shouldn't
be an issue in practice - granularity). If anything, as Julien
did suggest, we could modify xmalloc() accordingly, but then of
course making sure we also honor alignment requests beyond page
size.

Neither of these is the goal here, hence this "intermediate"
implementation, which is only almost "redundant".

- if you know the allocation size is always more than a page, use
   vmalloc(),
- otherwise use xvmalloc(). Exceptions may of course apply, i.e.
this is just a rule of thumb.

I would like to hear an opinion from the other maintainers.

Let's hope at least one will voice theirs.

If we take a step back, I think we only really need two memory
allocators:

1) one that allocates physically contiguous memory
2) one that allocates non-physically contiguous memory

That's it, right?

In addition to that, I understand it could be convient to have a little
wrapper that automatically chooses between 1) and 2) depending on
circumstances.

But if the circumstance is just size < PAGE_SIZE then I don't think we
need any convenience wrappers: we should just be able to call 2), which
is vmalloc, once we improve the vmalloc implementation.

Or do you see any reasons to keep the current vmalloc implementation as
is for sub-page allocations?

See my "Almost. ..." above.

As an aside, I also find it quite puzzling that in one of the rare
cases where I propose to clone an interface from Linux without much
deviation from their model, I get objections. It typically was the
other way around in the past ...

If we were really following Linux, then we would have two interfaces:
   - xmalloc() which is the same as kmalloc()
   - xvalloc() which is the same a kvalloc()

However, you seem to be the one objecting on the behavior of xmalloc().

I can't speak for Stefano, but I don't object on following Linux. Instead I am objecting on the growing number of way to allocate memory in Xen and that differ depending on the system_state.

Cheers,

--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.