
Re: [Xen-devel] [PATCH RFC V2 42/45] xen/sched: add fall back to idle vcpu when scheduling item


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Fri, 17 May 2019 09:48:53 +0200
  • Cc: Tim Deegan <tim@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 17 May 2019 07:49:07 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 17/05/2019 08:57, Jan Beulich wrote:
>>>> On 17.05.19 at 07:13, <jgross@xxxxxxxx> wrote:
>> On 16/05/2019 16:41, Jan Beulich wrote:
>>>>>> On 16.05.19 at 15:51, <jgross@xxxxxxxx> wrote:
>>>> On 16/05/2019 15:05, Jan Beulich wrote:
>>>>>>>> On 06.05.19 at 08:56, <jgross@xxxxxxxx> wrote:
>>>>>> --- a/xen/arch/x86/domain.c
>>>>>> +++ b/xen/arch/x86/domain.c
>>>>>> @@ -154,6 +154,24 @@ static void idle_loop(void)
>>>>>>      }
>>>>>>  }
>>>>>>  
>>>>>> +/*
>>>>>> + * Idle loop for siblings of active schedule items.
>>>>>> + * We don't do any standard idle work like tasklets, page scrubbing or
>>>>>> + * livepatching.
>>>>>> + * Use default_idle() in order to simulate v->is_urgent.
>>>>>
>>>>> I guess I'm missing a part of the description which explains all this:
>>>>> What's wrong with doing scrubbing work, for example? Why is
>>>>> doing tasklet work not okay, but softirqs are? What is the deal with
>>>>> v->is_urgent, i.e. what justifies not entering a decent power
>>>>> saving mode here on Intel, but doing so on AMD?
>>>>
>>>> One of the reasons for using core scheduling is to avoid running vcpus
>>>> of different domains on the same core in order to minimize the chances
>>>> for side channel attacks on data of other domains. Not allowing
>>>> scrubbing or tasklets here is meant to avoid accessing data of other
>>>> domains.
>>>
>>> So how is running softirqs okay then? And how is scrubbing accessing
>>> other domains' data?
>>
>> Right now I'm not sure whether it is a good idea to block any softirqs.
>> We definitely need to process scheduling requests, and I believe RCU and
>> tasklets, too. The tlbflush one should be non-critical, so the timer
>> softirq is the remaining one which might be questionable. This can be
>> fine-tuned later IMO, e.g. by defining a softirq mask of critical
>> softirqs to block and eventually splitting up e.g. the timer and tasklet
>> softirqs into critical and non-critical ones.
> 
> Well, okay, but please add an abridged version of this to the patch
> description then.

Okay.
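
For the description, the softirq part could be illustrated with a small
sketch along these lines. Purely illustrative: the mask name and the
masked dispatch helper (do_softirq_mask()) don't exist today, and the
exact softirq set is exactly what we are still discussing above.

/* Softirqs which would still be serviced by a thread parked in the
 * guarding idle loop next to an active sibling (illustrative only). */
#define GUARD_IDLE_SOFTIRQS                            \
    ((1ul << SCHEDULE_SOFTIRQ) |                       \
     (1ul << RCU_SOFTIRQ) |                            \
     (1ul << TASKLET_SOFTIRQ) |                        \
     (1ul << NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ))

static void guard_idle_loop(void)
{
    unsigned int cpu = smp_processor_id();

    for ( ; ; )
    {
        if ( softirq_pending(cpu) & GUARD_IDLE_SOFTIRQS )
            do_softirq_mask(GUARD_IDLE_SOFTIRQS);  /* hypothetical variant */
        else
            default_idle();
    }
}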

> 
>> Scrubbing will probably pull the cache lines of the dirty pages into
>> the L1 cache of the cpu. This sounds problematic to me. In case we
>> agree that scrubbing carries no risk here, I'm fine with adding it
>> back in.
> 
> Well, of course there's going to be a brief period of time where
> a cache line will be present in CPU internal buffers (it's not just the
> cache after all, as we've learned with XSA-297). So I can certainly
> buy that when using core granularity you don't want to scrub on
> the other thread. But what about the socket granularity case?
> Scrubbing on fully idle cores should still be fine, I would think.

I think this would depend on the reason for selecting socket scheduling.
I'd at least want to have a way to select that behaviour, as I could
think of e.g. L3-cache side channel attacks, too.

So maybe I could add a patch on top adding a sub-option to the
sched-gran parameter which allows (or disallows?) scrubbing on idle
cores or threads.
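
Just to sketch what I have in mind (the option syntax and the boolean are
invented, the real patch might well look different), e.g.
"sched-gran=core:no-scrub":

/* Illustrative only: knob disallowing scrubbing in the guarding idle loop. */
static bool __read_mostly sched_idle_scrub = true;

static int __init parse_sched_gran(const char *str)
{
    const char *sub = strchr(str, ':');

    if ( sub && !strcmp(sub + 1, "no-scrub") )
        sched_idle_scrub = false;

    /* ... existing cpu/core/socket parsing on the part before ':' ... */

    return 0;
}
custom_param("sched-gran", parse_sched_gran);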

> 
>>>> With core scheduling we can be sure the other thread is active
>>>> (otherwise we would schedule the idle item), so hoping to save power
>>>> by using mwait is moot.
>>>
>>> Saving power may be indirect, by the CPU re-arranging
>>> resource assignment between threads when one goes idle.
>>> I have no idea whether they do this when entering C1, or
>>> only when entering deeper C states.
>>
>> SDM Vol. 3 chapter 8.10.1 "HLT instruction":
>>
>> "Here shared resources that were being used by the halted logical
>> processor become available to active logical processors, allowing them
>> to execute at greater efficiency."
> 
> To be honest, this is too broad/generic a statement to fully
> trust it, judging from other areas of the SDM. And then, as
> per above, what about the socket granularity case? Putting
> entirely idle cores to sleep is surely worthwhile?

Yes, I assume it is. OTOH this might affect context switches badly,
as the reaction time for the coordinated switch would rise. Maybe a
good reason for another sub-option?

>>> And anyway - I'm still none the wiser as to the v->is_urgent
>>> relationship.
>>
>> With v->is_urgent set, today's idle loop will drop into default_idle().
>> I can remove that sentence if it is just confusing.
> 
> I'd prefer if the connection would become more obvious. One
> needs to go from ->is_urgent via ->urgent_count to
> sched_has_urgent_vcpu() to find where the described
> behavior really lives.
> 
> What's worse though: This won't work as intended on AMD
> at all. I don't think it's correct to fall back to default_idle() in
> this case. Instead sched_has_urgent_vcpu() returning true
> should amount to the same effect as max_cstate being set
> to 1. There's
> (a) no reason not to use MWAIT on Intel CPUs in this case,
> if MWAIT can enter C1, and
> (b) a strong need to use MWAIT on (at least) AMD Fam17,
> or else it won't be C1 that gets entered.
> I'll see about making a patch in due course.

Thanks. Would you mind doing it in a way that the caller can specify
max_cstate? This would remove the need to call sched_has_urgent_vcpu()
deep down in the idle handling, and I could re-use it for my purposes.
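
Something like the below is what I mean (the function name is made up,
whatever interface your patch ends up providing would be fine):

/* Hypothetical cpuidle entry point taking the C-state cap from the caller
 * instead of querying sched_has_urgent_vcpu() internally. */
void pm_idle_capped(unsigned int cstate_cap);

static void guard_idle(void)
{
    /* A sibling of an active schedule item never goes deeper than C1. */
    pm_idle_capped(ACPI_STATE_C1);
}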


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

