
Re: [Xen-devel] [PATCH v3 09/15] argo: implement the sendv op; evtchn: expose send_guest_global_virq



>>> On 10.01.19 at 13:40, <royger@xxxxxxxxxxx> wrote:
> On Thu, Jan 10, 2019 at 1:13 PM Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>
>> >>> On 10.01.19 at 13:01, <royger@xxxxxxxxxxx> wrote:
>> > On Thu, Jan 10, 2019 at 4:10 AM Christopher Clark 
>> > <christopher.w.clark@xxxxxxxxx> wrote:
>> >>
>> >> The second reason is about avoiding exposing the Xen virtual memory
>> >> allocator directly to frequent guest-supplied size requests for
>> >> contiguous regions (of up to 16GB).
>> >
>> > As said in another reply, I'm not sure allowing 16GB rings is safe.
>> > The amount of internal memory required to track such rings is not
>> > trivial given the arrays to store the mfns, the pages, and the virtual
>> > mappings.
>> >
>> >> With single-page allocations to
>> >> build a ring, fragmentation is not a problem, and mischief by a guest
>> >> seems difficult.
>> >
>> > Hm, there's still a lot of big dynamic memory allocations in order to
>> > support a 16GB ring, which makes me think that virtual address space
>> > is not the only problem if you allow 16GB rings.
>> >
>> >> If it were changed to issue requests for contiguous regions,
>> >> with variable ring sizes up to the 16GB maximum, it seems that
>> >> significant fragmentation could be achieved. I don't know the
>> >> practical impact of that but it seems worth avoiding. Are the other
>> >> users of __vmap (or vmap) for multi-gigabyte regions only either
>> >> boot-time, infrequent operations (livepatch), or for actions by
>> >> privileged (ie. somewhat trusted) domains (ioremap), or is it already
>> >> a frequent operation somewhere else?
>> >
>> > I haven't checked, but I would be quite surprised to find any vmap
>> > usage with such size (16GB). Maybe someone more familiar with the mm
>> > subsystem can provide some insight here.
>>
>> And indeed the vmap range reserved in VA space is just 64GB (on
>> x86) at present.
>>
>> >> Given the context above, and Jason's simplification to the
>> >> memcpy_to_guest_ring function, plus the imminent merge freeze
>> >> deadline, and the understanding that this loop and the related data
>> >> structures supporting it have been tested and are working, would it be
>> >> acceptable to omit making this contiguous mapping change from this
>> >> current series?
>> >
>> > My opinion would be to just use vmap if it works, because that IMO
>> > greatly simplifies the code by being able to have the whole ring
>> > mapped at all the time. It would remove the iteration to copy
>> > requests, and remove the usage of ring_map_page everywhere. That would
>> > be my recommendation code-wise, but as said above someone more
>> > familiar with the mm subsystem might have other opinions about how to
>> > deal with accesses to 16GB of guest memory, and indeed your iterative
>> > solution might be the best approach.
>>
>> No-one can allocate 16GB of physically contiguous memory.
> 
> Right, my question/comment was whether it would make sense to limit
> the size of the Argo ring to something smaller and then use vmap to
> map the whole ring in contiguous virtual space in order to ease
> accesses.

Whether you vmap() the ring in page-sized pieces or in one blob is,
for the purpose of the order of magnitude of VA space consumption,
not overly relevant: you can't map more than three such gigantic
rings anyway with the current VA layout. (In practice, mapping
individual pages would halve the effectively usable VA space, due
to the guard pages inserted between regions.) IOW - the ring size
should be bounded at a lower value anyway imo.
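
For illustration, here is a rough sketch of the two mapping
strategies under discussion. The ring_info layout is hypothetical
(made up for this sketch); vmap() and map_domain_page() are the
existing interfaces, though exact signatures may differ between
trees:

#include <xen/errno.h>
#include <xen/mm.h>
#include <xen/vmap.h>
#include <xen/domain_page.h>

struct ring_info {
    unsigned int npages;   /* number of guest pages backing the ring */
    mfn_t *mfns;           /* MFNs of those pages, already validated */
    void *va;              /* non-NULL if the whole ring is mapped */
};

/*
 * Option 1: map the whole ring with a single vmap() call. Only one
 * VA region (plus one guard page) is used, but that region is as
 * large as the ring itself - up to 16GB at the current limit - out
 * of a 64GB vmap area on x86, so only a few such rings could ever
 * be mapped at once.
 */
static int ring_map_whole(struct ring_info *ri)
{
    ri->va = vmap(ri->mfns, ri->npages);

    return ri->va ? 0 : -ENOMEM;
}

/*
 * Option 2: transient per-page mappings around each copy, as the
 * series currently does. VA consumption per access is bounded, at
 * the cost of a map/unmap pair for every page touched.
 */
static void *ring_map_one(const struct ring_info *ri, unsigned int idx)
{
    return map_domain_page(ri->mfns[idx]);
}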

> TBH, I'm not sure virtual address space is the only issue if Argo
> allows 16GB rings to be used. 16GB rings will consume a non-trivial
> amount of memory for the internal Argo state tracking structures
> AFAICT.

Fully agree. It has taken us ages to eliminate all runtime
allocations of order > 0, and it looks like we'd be gaining some
back here. I consider this tolerable as long as the feature is
experimental, but it would need fixing for the feature to become
fully supported. Christopher - annotating such code with FIXME
comments right away helps with spotting (and addressing) it later.
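
As a purely illustrative example of the kind of annotation meant,
at the point where the MFN-tracking array for a ring is allocated
(reusing the hypothetical ring_info from the sketch above;
xmalloc_array() is the existing interface):

#include <xen/errno.h>
#include <xen/xmalloc.h>

static int ring_alloc_mfn_array(struct ring_info *ri, unsigned int npages)
{
    /*
     * FIXME: for large rings this is an order > 0 runtime allocation
     * whose size is driven by a guest-supplied ring size; bound the
     * ring size and/or switch to page-granular tracking before the
     * feature can move beyond experimental status.
     */
    ri->mfns = xmalloc_array(mfn_t, npages);
    if ( !ri->mfns )
        return -ENOMEM;

    ri->npages = npages;

    return 0;
}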

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

