
Re: [Xen-devel] [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Mon, 6 May 2019 15:29:24 +0200
  • Cc: Tim Deegan <tim@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 06 May 2019 13:29:36 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 06/05/2019 15:14, Jan Beulich wrote:
>>>> On 06.05.19 at 14:23, <jgross@xxxxxxxx> wrote:
>> On 06/05/2019 13:58, Jan Beulich wrote:
>>>>>> On 06.05.19 at 12:20, <jgross@xxxxxxxx> wrote:
>>>> On 06/05/2019 12:01, Jan Beulich wrote:
>>>>>>>> On 06.05.19 at 11:23, <jgross@xxxxxxxx> wrote:
>>>>>> On 06/05/2019 10:57, Jan Beulich wrote:
>>>>>>> . Yet then I'm a little puzzled by its use here in the first place.
>>>>>>> Generally I think for_each_cpu() uses in __init functions are
>>>>>>> problematic, as they then require further code elsewhere to
>>>>>>> deal with hot-onlining. A pre-SMP-initcall plus use of CPU
>>>>>>> notifiers is typically more appropriate.
>>>>>>
>>>>>> And that was mentioned in the cover letter: cpu hotplug is not yet
>>>>>> handled (hence the RFC status of the series).
>>>>>>
>>>>>> When cpu hotplug is being added it might be appropriate to switch the
>>>>>> scheme as you suggested. Right now the current solution is much more
>>>>>> simple.
>>>>>
>>>>> I see (I did notice the cover letter remark, but managed to not
>>>>> honor it when writing the reply), but I'm unconvinced if incurring
>>>>> more code churn by not dealing with things the "dynamic" way
>>>>> right away is indeed the "more simple" (overall) solution.
>>>>
>>>> Especially with hotplug things are becoming more complicated: I'd like
>>>> to have the final version fall back to smaller granularities in case
>>>> e.g. the user has selected socket scheduling and two sockets have
>>>> different numbers of cores. With hotplug such a situation might be
>>>> discovered only with some domUs already running, so how should we
>>>> react in that case? Doing panic() is no option, so either we reject
>>>> onlining the additional socket, or we adapt by dynamically modifying the
>>>> scheduling granularity. Without that being discussed I don't think it
>>>> makes sense to put a lot of effort into a solution which is going to be
>>>> rejected in the end.
>>>
>>> Hmm, where's the symmetry requirement coming from? Socket
>>> scheduling should mean as many vCPU-s on one socket as there
>>> are cores * threads; similarly core scheduling (number of threads).
>>> Statically partitioning domains would seem an intermediate step
>>> at best only anyway, as that requires (on average) leaving more
>>> resources (cores/threads) idle than with a dynamic partitioning
>>> model.
>>
>> And that is exactly the purpose of core/socket scheduling. How else
>> would it be possible (in future) to pass through the topology below
>> the scheduling granularity to the guest?
> 
> True. Albeit nevertheless an (at least) unfortunate limitation.
> 
>> And how should it be of any
>> use for fighting security issues due to side channel attacks?
> 
> From Xen's pov all is still fine afaict. It's the lack of (correct)
> topology exposure (as per above) which would make guest
> side mitigation impossible.
> 
>>> As to your specific question how to react: Since bringing online
>>> a full new socket implies bringing online all its cores / threads one
>>> by one anyway, a "too small" socket in your scheme would
>>> simply result in the socket remaining unused until "enough"
>>> cores/threads have appeared. Similarly the socket would go
>>> out of use as soon as one of its cores/threads gets offlined.
>>
>> Yes, this is a possible way to do it. It should be spelled out,
>> though.
>>
>>> Obviously this ends up problematic for the last usable socket.
>>
>> Yes, like today for the last cpu/thread.
> 
> Well, only kind of. It's quite expected that the last thread
> can't be offlined. I'd call it rather unexpected that a random
> thread on the last socket can't be offlined just because each
> other socket also has a single offline thread: There might
> still be hundreds of online threads in this case, after all.

You'd need to offline the related thread in all active guests. Otherwise
(from the guest's point of view) a cpu suddenly disappears.

> 
>>> But with the static partitioning you describe I also can't really
>>> see how "xen-hptool smt-disable" is going to work.
>>
>> It won't work. It just makes no sense to use it with core scheduling
>> active.
> 
> Why not? Disabling HT may be for purposes other than mitigating
> vulnerabilities like L1TF. And the system is in a symmetric state
> at the beginning and end of the entire operation; it's merely
> intermediate state which doesn't fit the expectations you set forth.

It is like bare metal: You can't physically unplug a single thread. This
is possible only for complete sockets.

It would theoretically be possible to have a test whether all guests
have the related cpus offlined in order to offline them in Xen. IMHO
this would be overkill: as an admin you have to decide whether you want
to use core scheduling or you want the ability to switch off SMT on the
fly.
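
To illustrate what such a test might look like, here is a minimal sketch in C.
It is not Xen code: struct guest, its fields and slot_unused_by_all_guests()
are hypothetical names made up for this example, and it assumes that with a
granularity of "gran" vCPUs per scheduling unit, vCPU n of a guest sits in
slot n % gran of its unit. A physical thread backing a given slot could then
only be offlined if no guest has an online vCPU in that slot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a guest, not Xen's real data structures. */
struct guest {
    uint64_t online_vcpus;  /* bit n set: vCPU n is online in the guest */
    unsigned int nr_vcpus;  /* number of vCPUs, at most 64 in this sketch */
};

/*
 * Assumption: vCPU n occupies slot (n % gran) of its scheduling unit.
 * Returns true only if no guest has an online vCPU in the given slot,
 * i.e. the matching physical threads could be offlined without a vCPU
 * suddenly disappearing from a guest's point of view.
 */
bool slot_unused_by_all_guests(const struct guest *guests, size_t nr_guests,
                               unsigned int gran, unsigned int slot)
{
    if (gran == 0 || slot >= gran)
        return false;  /* invalid request: refuse rather than guess */

    for (size_t g = 0; g < nr_guests; g++) {
        /* Visit every vCPU of this guest that maps to the slot. */
        for (unsigned int v = slot; v < guests[g].nr_vcpus && v < 64; v += gran) {
            if (guests[g].online_vcpus & (UINT64_C(1) << v))
                return false;
        }
    }
    return true;
}
```

Even in this simplified form the check has to walk all vCPUs of all guests,
and the result can become stale as soon as a guest onlines a vCPU again.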

You can still boot e.g. with sched-gran=socket and smt=off.

Another possibility would be to make sched-gran and SMT per cpupool.
In that case I'd like those attributes to be static at creation time of
the cpupool, though.


Juergen
