
Re: [Xen-devel] [PATCH RFC V2 42/45] xen/sched: add fall back to idle vcpu when scheduling item


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Fri, 17 May 2019 09:48:53 +0200
  • Cc: Tim Deegan <tim@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 17 May 2019 07:49:07 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 17/05/2019 08:57, Jan Beulich wrote:
>>>> On 17.05.19 at 07:13, <jgross@xxxxxxxx> wrote:
>> On 16/05/2019 16:41, Jan Beulich wrote:
>>>>>> On 16.05.19 at 15:51, <jgross@xxxxxxxx> wrote:
>>>> On 16/05/2019 15:05, Jan Beulich wrote:
>>>>>>>> On 06.05.19 at 08:56, <jgross@xxxxxxxx> wrote:
>>>>>> --- a/xen/arch/x86/domain.c
>>>>>> +++ b/xen/arch/x86/domain.c
>>>>>> @@ -154,6 +154,24 @@ static void idle_loop(void)
>>>>>>      }
>>>>>>  }
>>>>>>  
>>>>>> +/*
>>>>>> + * Idle loop for siblings of active schedule items.
>>>>>> + * We don't do any standard idle work like tasklets, page scrubbing or
>>>>>> + * livepatching.
>>>>>> + * Use default_idle() in order to simulate v->is_urgent.
>>>>>
>>>>> I guess I'm missing a part of the description which explains all this:
>>>>> What's wrong with doing scrubbing work, for example? Why is
>>>>> doing tasklet work not okay, but softirqs are? What is the deal with
>>>>> v->is_urgent, i.e. what justifies not entering a decent power
>>>>> saving mode here on Intel, but doing so on AMD?
>>>>
>>>> One of the reasons for using core scheduling is to avoid running vcpus
>>>> of different domains on the same core in order to minimize the chances
>>>> for side channel attacks on data of other domains. Not allowing
>>>> scrubbing or tasklets here is meant to avoid accessing data of other
>>>> domains.
>>>
>>> So how is running softirqs okay then? And how is scrubbing accessing
>>> other domains' data?
>>
>> Right now I'm not sure whether it is a good idea to block any softirqs.
>> We definitely need to process scheduling requests, and I believe RCU and
>> tasklets, too. The tlbflush one should be non-critical, so the timer
>> softirq is the remaining one which might be questionable. This can be
>> fine-tuned later IMO, e.g. by defining a softirq mask of critical
>> softirqs to block and eventually splitting up e.g. the timer and tasklet
>> softirqs into critical and non-critical ones.
> 
> Well, okay, but please add an abridged version of this to the patch
> description then.

Okay.
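
For the description, the softirq part could be illustrated with a small
sketch along these lines. Purely illustrative: the mask name and the
masked dispatch helper (do_softirq_mask()) don't exist today, and the
exact softirq set is exactly what we are still discussing above.

/* Softirqs which would still be serviced by a thread parked in the
 * guarding idle loop next to an active sibling (illustrative only). */
#define GUARD_IDLE_SOFTIRQS                            \
    ((1ul << SCHEDULE_SOFTIRQ) |                       \
     (1ul << RCU_SOFTIRQ) |                            \
     (1ul << TASKLET_SOFTIRQ) |                        \
     (1ul << NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ))

static void guard_idle_loop(void)
{
    unsigned int cpu = smp_processor_id();

    for ( ; ; )
    {
        if ( softirq_pending(cpu) & GUARD_IDLE_SOFTIRQS )
            do_softirq_mask(GUARD_IDLE_SOFTIRQS);  /* hypothetical variant */
        else
            default_idle();
    }
}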

> 
>> Scrubbing will probably pull the cache lines of the dirty pages into
>> the L1 cache of the cpu. This sounds problematic to me. In case we
>> agree that scrubbing carries no risk here, I'm fine with adding it
>> back in.
> 
> Well, of course there's going to be a brief period of time where
> a cache line will be present in CPU internal buffers (it's not just the
> cache after all, as we've learned with XSA-297). So I can certainly
> buy that when using core granularity you don't want to scrub on
> the other thread. But what about the socket granularity case?
> Scrubbing on fully idle cores should still be fine, I would think.

I think this would depend on the reason for selecting socket scheduling.
I'd at least want to have a way to select that behaviour, as I could
think of e.g. L3-cache side channel attacks, too.

So maybe I could add a patch on top adding a sub-option to the
sched-gran parameter which allows (or disallows?) scrubbing on idle
cores or threads.
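
Just to sketch what I have in mind (the option syntax and the boolean are
invented, the real patch might well look different), e.g.
"sched-gran=core:no-scrub":

/* Illustrative only: knob disallowing scrubbing in the guarding idle loop. */
static bool __read_mostly sched_idle_scrub = true;

static int __init parse_sched_gran(const char *str)
{
    const char *sub = strchr(str, ':');

    if ( sub && !strcmp(sub + 1, "no-scrub") )
        sched_idle_scrub = false;

    /* ... existing cpu/core/socket parsing on the part before ':' ... */

    return 0;
}
custom_param("sched-gran", parse_sched_gran);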

> 
>>>> With core scheduling we can be sure the other thread is active
>>>> (otherwise we would schedule the idle item), so hoping to save power
>>>> by using mwait is moot.
>>>
>>> Saving power may be indirect, by the CPU re-arranging
>>> resource assignment between threads when one goes idle.
>>> I have no idea whether they do this when entering C1, or
>>> only when entering deeper C states.
>>
>> SDM Vol. 3 chapter 8.10.1 "HLT instruction":
>>
>> "Here shared resources that were being used by the halted logical
>> processor become available to active logical processors, allowing them
>> to execute at greater efficiency."
> 
> To be honest, this is too broad/generic a statement to fully
> trust it, judging from other areas of the SDM. And then, as
> per above, what about the socket granularity case? Putting
> entirely idle cores to sleep is surely worthwhile?

Yes, I assume it is. OTOH this might affect context switches badly,
as the reaction time for the coordinated switch would rise. Maybe a
good reason for another sub-option?

>>> And anyway - I'm still none the wiser as to the v->is_urgent
>>> relationship.
>>
>> With v->is_urgent set, today's idle loop will drop into default_idle().
>> I can remove that sentence if it is just confusing.
> 
> I'd prefer if the connection would become more obvious. One
> needs to go from ->is_urgent via ->urgent_count to
> sched_has_urgent_vcpu() to find where the described
> behavior really lives.
> 
> What's worse though: This won't work as intended on AMD
> at all. I don't think it's correct to fall back to default_idle() in
> this case. Instead sched_has_urgent_vcpu() returning true
> should amount to the same effect as max_cstate being set
> to 1. There's
> (a) no reason not to use MWAIT on Intel CPUs in this case,
> if MWAIT can enter C1, and
> (b) a strong need to use MWAIT on (at least) AMD Fam17,
> or else it won't be C1 that gets entered.
> I'll see about making a patch in due course.

Thanks. Would you mind doing it in a way that the caller can specify
max_cstate? This would remove the need to call sched_has_urgent_vcpu()
deep down in the idle handling, and I could re-use it for my purposes.
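
Something like the below is what I mean (the function name is made up,
whatever interface your patch ends up providing would be fine):

/* Hypothetical cpuidle entry point taking the C-state cap from the caller
 * instead of querying sched_has_urgent_vcpu() internally. */
void pm_idle_capped(unsigned int cstate_cap);

static void guard_idle(void)
{
    /* A sibling of an active schedule item never goes deeper than C1. */
    pm_idle_capped(ACPI_STATE_C1);
}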


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

