Re: [Xen-devel] [PATCH RFC V2 42/45] xen/sched: add fall back to idle vcpu when scheduling item
>>> On 17.05.19 at 07:13, <jgross@xxxxxxxx> wrote:
> On 16/05/2019 16:41, Jan Beulich wrote:
>>>>> On 16.05.19 at 15:51, <jgross@xxxxxxxx> wrote:
>>> On 16/05/2019 15:05, Jan Beulich wrote:
>>>>>>> On 06.05.19 at 08:56, <jgross@xxxxxxxx> wrote:
>>>>> --- a/xen/arch/x86/domain.c
>>>>> +++ b/xen/arch/x86/domain.c
>>>>> @@ -154,6 +154,24 @@ static void idle_loop(void)
>>>>>      }
>>>>>  }
>>>>>
>>>>> +/*
>>>>> + * Idle loop for siblings of active schedule items.
>>>>> + * We don't do any standard idle work like tasklets, page scrubbing or
>>>>> + * livepatching.
>>>>> + * Use default_idle() in order to simulate v->is_urgent.
>>>>
>>>> I guess I'm missing a part of the description which explains all this:
>>>> What's wrong with doing scrubbing work, for example? Why is
>>>> doing tasklet work not okay, but softirqs are? What is the deal with
>>>> v->is_urgent, i.e. what justifies not entering a decent power
>>>> saving mode here on Intel, but doing so on AMD?
>>>
>>> One of the reasons for using core scheduling is to avoid running vcpus
>>> of different domains on the same core, in order to minimize the chances
>>> for side channel attacks on data of other domains. Not allowing
>>> scrubbing or tasklets here is to avoid accessing data of other
>>> domains.
>>
>> So how is running softirqs okay then? And how is scrubbing accessing
>> other domains' data?
>
> Right now I'm not sure whether it is a good idea to block any softirqs.
> We definitely need to process scheduling requests and I believe RCU and
> tasklets, too. The tlbflush one should be uncritical, so timers is the
> remaining one which might be questionable. This can be fine-tuned later
> IMO, e.g. by defining a softirq mask of critical softirqs to block and
> eventually splitting up e.g. timer and tasklet softirqs into critical
> and uncritical ones.

Well, okay, but please add an abridged version of this to the patch
description then.
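To illustrate what such a mask could look like (purely a sketch -
neither the mask nor a masked do_softirq() variant exist today; only
the softirq numbers are the existing ones from
xen/include/xen/softirq.h, and TIMER_SOFTIRQ is deliberately left out
as the "questionable" one):

#define SIBLING_IDLE_SOFTIRQS                        \
    ((1UL << SCHEDULE_SOFTIRQ) |                     \
     (1UL << RCU_SOFTIRQ) |                          \
     (1UL << TASKLET_SOFTIRQ) |                      \
     (1UL << NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ))

/*
 * Idle loop for a sibling of an active schedule item: only "safe"
 * softirqs, no do_tasklet(), no scrubbing, no livepatch work (compare
 * the regular idle_loop()).
 */
static void sibling_idle_loop(void)
{
    unsigned int cpu = smp_processor_id();

    for ( ; ; )
    {
        if ( softirq_pending(cpu) & SIBLING_IDLE_SOFTIRQS )
            do_softirq();   /* a masked variant would need introducing */
        else
            default_idle();
    }
}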
> Scrubbing will probably pull the cache lines of the dirty pages into
> the L1 cache of the cpu. For me this sounds problematic. In case we
> are fine to do scrubbing as there is no risk associated, I'm fine to
> add it back in.

Well, of course there's going to be a brief period of time where a
cache line will be present in CPU internal buffers (it's not just the
cache after all, as we've learned with XSA-297). So I can certainly buy
that when using core granularity you don't want to scrub on the other
thread. But what about the socket granularity case? Scrubbing on fully
idle cores should still be fine, I would think.

>>> As with core scheduling we can be sure the other thread is active
>>> (otherwise we would schedule the idle item), and hoping to save power
>>> by using mwait is moot.
>>
>> Saving power may be indirect, by the CPU re-arranging
>> resource assignment between threads when one goes idle.
>> I have no idea whether they do this when entering C1, or
>> only when entering deeper C states.
>
> SDM Vol. 3 chapter 8.10.1 "HLT instruction":
>
> "Here shared resources that were being used by the halted logical
> processor become available to active logical processors, allowing them
> to execute at greater efficiency."

To be honest, this is too broad/generic a statement to fully trust,
judging from other areas of the SDM. And then, as per above, what about
the socket granularity case? Putting entirely idle cores to sleep is
surely worthwhile?

>> And anyway - I'm still none the wiser as to the v->is_urgent
>> relationship.
>
> With v->is_urgent set, today's idle loop will drop into default_idle().
> I can remove this sentence in case it is just confusing.

I'd prefer it if the connection were made more obvious. One needs to go
from ->is_urgent via ->urgent_count to sched_has_urgent_vcpu() to find
where the described behavior really lives.
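For reference, that chain looks roughly like this (paraphrased and
abridged from xen/common/schedule.c and xen/arch/x86/acpi/cpu_idle.c of
this time frame - see the sources for the exact code):

/* xen/common/schedule.c: a blocked, polling vcpu is marked urgent,
 * which bumps the per-cpu urgent_count ... */
static inline void vcpu_urgent_count_update(struct vcpu *v)
{
    if ( is_idle_vcpu(v) )
        return;

    if ( !v->is_urgent &&
         unlikely(v->pause_flags & VPF_blocked) &&
         unlikely(test_bit(v->vcpu_id, v->domain->poll_mask)) )
    {
        v->is_urgent = 1;
        atomic_inc(&per_cpu(schedule_data, v->processor).urgent_count);
    }
    /* ... and the inverse transition decrements it again. */
}

/* ... which is what sched_has_urgent_vcpu() reports ... */
bool sched_has_urgent_vcpu(void)
{
    return atomic_read(&this_cpu(schedule_data).urgent_count);
}

/* ... and based on which acpi_processor_idle() skips the cpuidle
 * governor altogether, ending up in (effectively) a plain HLT: */
    if ( max_cstate > 0 && power && !sched_has_urgent_vcpu() &&
         (next_state = cpuidle_current_governor->select(power)) > 0 )
        /* ... enter the selected C state ... */;
    else
        /* ... effectively default_idle(), i.e. HLT ... */;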
What's worse though: This won't work as intended on AMD at all. I don't
think it's correct to fall back to default_idle() in this case. Instead
sched_has_urgent_vcpu() returning true should amount to the same effect
as max_cstate being set to 1. There's (a) no reason not to use MWAIT on
Intel CPUs in this case, if MWAIT can enter C1, and (b) a strong need to
use MWAIT on (at least) AMD Fam17, or else it won't be C1 that gets
entered. I'll see about making a patch in due course.
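A possible shape for such a patch (hypothetical sketch only - clamp the
governor's choice instead of bypassing cpuidle; the real change may well
look different):

static void acpi_processor_idle(void)
{
    struct acpi_processor_power *power = processor_powers[smp_processor_id()];
    int next_state;

    if ( max_cstate > 0 && power &&
         (next_state = cpuidle_current_governor->select(power)) > 0 )
    {
        /* Treat an urgent vcpu like "max_cstate = 1": still go through
         * cpuidle (and hence MWAIT where available), but no deeper
         * than C1. */
        if ( sched_has_urgent_vcpu() && next_state > 1 )
            next_state = 1;
        /* ... enter power->states[next_state] as before ... */
    }
    else
        /* ... fall back as before ... */;
}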
Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel