
Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus [and 1 more messages]



On 02.06.20 13:12, Jan Beulich wrote:
On 02.06.2020 13:06, Jürgen Groß wrote:
On 06.04.20 14:09, Jan Beulich wrote:
On 06.04.2020 13:54, Jürgen Groß wrote:
On 06.04.20 13:11, Jan Beulich wrote:
On 06.04.2020 13:00, Ian Jackson wrote:
Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels 
dependant on vcpus"):
There are no correlation between event channels and vCPUs. The number of
event channels only depends on the number of frontend you have in your
guest. So...

Hi Ian,

On 06/04/2020 11:47, Ian Jackson wrote:
If ARM folks want to have a different formula for the default then
that is of course fine, but I wonder whether this might do ARM more
harm than good in this case.

... 1023 event channels is going to be plenty enough for most of the use
cases.

OK, thanks for the quick reply.

So, Jürgen, I think everyone will be happy with this:

I don't think I will be - my prior comment still holds on there not
being any grounds to use a specific OS kernel's (and to be precise
a specific OS kernel version's) requirements for determining
defaults. If there was to be such a dependency, then OS kernel
[variant] should be part of the inputs to such a (set of) formula(s).

IMO this kind of trying to be perfect will completely block a sane
heuristic for being able to boot large guests at all.

This isn't about being perfect - I'm suggesting to leave the
default alone, not to improve the calculation, not least
because I've been implying ...

The patch isn't about finding as stringent an upper boundary as
possible for huge guests, but a sane value allowing most of them
to boot.

And how should Xen know what the OS kernel needs, exactly?

... the answer of "It can't" to this question.

And it is not as if we were talking about megabytes of additional
memory. A guest with 256 vcpus will just be able to use 36
additional memory pages. The maximum non-PV domain (probably the
only relevant case of an OS other than Linux being used) with 128
vcpus would "waste" 32 kB - and only in case the guest misbehaves.
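One way to relate channel counts to pages (a sketch, not taken from the patch): under the FIFO event channel ABI each event word is 4 bytes, so a 4 KiB page holds 1024 channels.

```python
PAGE_SIZE = 4096
EVENT_WORD_SIZE = 4  # FIFO ABI: 32-bit event words
WORDS_PER_PAGE = PAGE_SIZE // EVENT_WORD_SIZE  # 1024 channels per page

def event_array_pages(channels):
    """Pages needed to back a FIFO event array of this many channels
    (ceiling division)."""
    return -(-channels // WORDS_PER_PAGE)
```

On this accounting, 36 extra pages correspond to up to 36 * 1024 = 36864 extra channels, and the 32 kB figure is 8 pages, i.e. up to 8192 channels.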

Any extra page counts, or else - where do you draw the line? Any
single page may decide between Xen (not) being out of memory,
and hence also not being able to fulfill certain other requests.

The alternative would be to do nothing and having to let the user
experience a somewhat cryptic guest crash. He could google for a
possible solution which would probably end in a rather high static
limit resulting in wasting even more memory.

I realize this. Otoh more people running into this will improve
the chances of later ones finding useful suggestions. Of course
there's also nothing wrong with trying to make the error less
cryptic.

Reviving this discussion.

I strongly disagree with your reasoning.

Refusing to modify the tools' defaults for large guests so that
they can boot at all is a bad move IMO. We are driving more people
away from Xen this way.

The fear of a misbehaving guest of that size using a few additional
pages on a machine with at least 100 cpus is fine from an academic
point of view, but it should not be weighed more heavily than the
usability aspect in this case IMO.

Very simple question then: Where do you draw the boundary if you don't
want this to be a pure "is permitted" or "is not permitted" underlying
rule? If we had a model where _all_ resources consumed by a guest were
accounted against its tool stack requested allocation, things would be
easier.

I'd say it should be allowed in case the additional resource use is much
smaller than the already used implicit resources for such a guest (e.g.
less than an additional 1% of implicitly used memory).

In cases like this, where a very small subset of guests is affected
and the additional resources will be needed only in very extreme
cases (I'm considering this case extreme, as only non-Linux guests
with huge numbers of vcpus _and_ which are misbehaving will need
additional resources), I'd even accept higher margins like 5%.
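The margin rule above can be expressed as a simple check. The 1% threshold and the notion of "implicitly used" memory come from the text; the function name and the sample numbers below are illustrative assumptions only:

```python
def within_margin(extra_bytes, implicit_bytes, margin=0.01):
    """Allow extra resource use if it stays below the given fraction
    of the memory a guest of that size already uses implicitly."""
    return extra_bytes < margin * implicit_bytes

# Example: 36 extra 4 KiB pages measured against a (hypothetical)
# 100 MiB of implicit per-guest overhead stays far below the 1% line.
```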


Juergen



 

