
Re: [PATCH v6 8/8] xen: allow up to 16383 cpus


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Mon, 6 May 2024 08:53:13 +0200
  • Authentication-results: smtp-out2.suse.de; none
  • Cc: Bertrand Marquis <bertrand.marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Julien Grall <julien@xxxxxxx>
  • Delivery-date: Mon, 06 May 2024 06:53:23 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06.05.24 08:42, Jan Beulich wrote:
On 03.05.2024 21:07, Stefano Stabellini wrote:
On Fri, 3 May 2024, Julien Grall wrote:
Hi Stefano,

On 02/05/2024 19:13, Stefano Stabellini wrote:
On Mon, 29 Apr 2024, Julien Grall wrote:
Hi Juergen,

On 29/04/2024 12:28, Jürgen Groß wrote:
On 29.04.24 13:04, Julien Grall wrote:
Hi Juergen,

Sorry for the late reply.

On 29/04/2024 11:33, Juergen Gross wrote:
On 08.04.24 09:10, Jan Beulich wrote:
On 27.03.2024 16:22, Juergen Gross wrote:
With lock handling now allowing up to 16384 cpus (spinlocks can handle
65535 cpus, rwlocks can handle 16384 cpus), raise the allowed limit for
the number of cpus to be configured to 16383.

The new limit is imposed by IOMMU_CMD_BUFFER_MAX_ENTRIES and
QINVAL_MAX_ENTRY_NR required to be larger than 2 * CONFIG_NR_CPUS.
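
[Editor's note] The constraint described above can be sketched as a compile-time check. This is a minimal illustration only: the queue sizes of 32768 are assumed values chosen so that the arithmetic yields 16383, they are not taken from the Xen tree, and the helper max_cpus_for_queue() is hypothetical. Xen itself would more likely use its own BUILD_BUG_ON() macro rather than C11 _Static_assert.

```c
#include <assert.h>

/* Illustrative values only; the queue sizes are assumptions. */
#define CONFIG_NR_CPUS               16383
#define IOMMU_CMD_BUFFER_MAX_ENTRIES 32768 /* assumed */
#define QINVAL_MAX_ENTRY_NR          32768 /* assumed */

/* Both queues must hold more than two entries per possible CPU. */
_Static_assert(IOMMU_CMD_BUFFER_MAX_ENTRIES > 2 * CONFIG_NR_CPUS,
               "AMD IOMMU command buffer too small for CONFIG_NR_CPUS");
_Static_assert(QINVAL_MAX_ENTRY_NR > 2 * CONFIG_NR_CPUS,
               "VT-d invalidation queue too small for CONFIG_NR_CPUS");

/*
 * Hypothetical helper: largest CPU count n that still satisfies
 * entries > 2 * n for a queue with the given number of entries.
 */
static inline unsigned int max_cpus_for_queue(unsigned int entries)
{
    return entries ? (entries - 1) / 2 : 0;
}
```

Under these assumed sizes, a queue of 32768 entries permits at most 16383 CPUs, which matches the limit named in the patch.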

Signed-off-by: Juergen Gross <jgross@xxxxxxxx>

Acked-by: Jan Beulich <jbeulich@xxxxxxxx>

I'd prefer this to also gain an Arm ack, though.

Any comment from Arm side?

Can you clarify what the new limits mean in terms of (security) support?
Are we now claiming that Xen will work perfectly fine on platforms with up
to 16383 CPUs?

If so, I can't comment for x86, but for Arm, I am doubtful that it would
work without any (at least performance) issues. AFAIK, this is also an
untested configuration. In fact I would be surprised if Xen on Arm was
tested with more than a couple of hundred cores (AFAICT the Ampere CPUs
have 192 CPUs).

I think we should add a security support limit for the number of physical
cpus similar to the memory support limit we already have in place.

For x86 I'd suggest 4096 cpus for security support (basically the limit we
have with this patch), but I'm open to other suggestions, too.

I have no idea about any sensible limits for Arm32/Arm64.

I am not entirely sure. Bertrand, Michal, Stefano, should we use 192 (the
number of CPUs from Ampere)?

I am OK with that. If we want to be a bit more future proof we could say
256 or 512.

Sorry, I don't follow your argument. A limit can be raised at any point in
the future. The question is more whether we are confident that Xen on Arm will
run well if a user has a platform with 256/512 pCPUs.

So are you saying that from Xen's point of view, you are expecting no difference
between 256 and 512? And therefore you would be happy to backport patches
if someone finds differences (or even security issues) when using > 256 pCPUs?

It is difficult to be sure about anything that is not regularly
tested. I am pretty sure someone in the community got Xen running on an
Ampere, so like you said 192 is a good number. However, that is not
regularly tested, so we don't have any regression checks in gitlab-ci or
OSSTest for it.

One approach would be to only support things regularly tested either by
OSSTest, Gitlab-ci, or also Xen community members. I am not sure what
would be the highest number with this way of thinking but likely no
more than 192, probably less. I don't know the CPU core count of the
biggest ARM machine in OSSTest.

Another approach is to support a "sensible" number: not something tested
but something we believe should work. No regular testing. (In safety,
they only believe in things that are actually tested, so this would not
be OK. But this is security, not safety, just FYI.) With this approach,
we could round up the number to a limit we think won't break. If 192
works, 256/512 should work? I don't know but couldn't think of something
that would break going from 192 to 256.

I would suggest to aim at sticking to power-of-2 values. There are still
some calculations in Xen which can be translated to more efficient code
that way (mainly: using shifts rather than multiplications or a
combination of shifts and adds). Of course those calculations depend on
what people choose as actual values, but giving an upper bound being a
power of 2 may at least serve as a hint to them.
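
[Editor's note] As a rough illustration of the point about power-of-2 values, the sketch below indexes into a hypothetical array of fixed-size slots. All names here (SLOT_SHIFT, SLOT_SIZE, slot_offset_mul(), slot_offset_shift(), is_pow2()) are invented for the example and do not refer to actual Xen code. When the slot size is a power of 2, the multiplication can be expressed as a single shift.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot size: 1 << 8 = 256 bytes. */
#define SLOT_SHIFT 8
#define SLOT_SIZE  ((size_t)1 << SLOT_SHIFT)

/* General case: works for any size, but needs a multiplication. */
static inline size_t slot_offset_mul(size_t idx, size_t size)
{
    return idx * size;
}

/* Power-of-2 case: the same offset computed with a single shift. */
static inline size_t slot_offset_shift(size_t idx)
{
    return idx << SLOT_SHIFT;
}

/* True if v is a non-zero power of 2. */
static inline int is_pow2(size_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}
```

A value such as 192 is not a power of 2, so a compiler would have to fall back to a multiplication or a combination of shifts and adds (192 = 128 + 64), whereas 256 reduces to one shift.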

It depends on how strict we want to be on testing requirements. I am not
sure what approach was taken by x86 so far. I am OK either way.

The bumping of the limit here clearly is forward-looking for x86, i.e. is
unlikely to be even possible to test right now (except maybe when running
Xen itself virtualized). I actually think there need to be two separate
considerations: One is for how many CPUs Xen can be built (and
such a build can be validated on a much smaller system), while another is
to limit what is supported (in ./SUPPORT.md).

My suggestion would be to add the following to my patch:

- introducing the number of security supported physical cpus to SUPPORT.md
  (4096 for x86, 256 for Arm64 and Arm32)

- adding the new upper bound to CHANGELOG.md

In case I don't hear any objections I'll send it out tomorrow.


Juergen
