Xen project Mailing List

Re: [Xen-devel] [PATCH 06/24] xen: credit2: implement yield()

To: Dario Faggioli <dario.faggioli@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>

From: George Dunlap <george.dunlap@xxxxxxxxxx>

Date: Tue, 13 Sep 2016 14:33:10 +0100

Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anshul Makkar <anshul.makkar@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>

Delivery-date: Tue, 13 Sep 2016 13:34:40 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 17/08/16 18:18, Dario Faggioli wrote:
> When a vcpu explicitly yields it is usually giving
> us advice of "let someone else run and come back
> to me in a bit."
>
> Credit2 isn't, so far, doing anything when a vcpu
> yields, which means a yield is basically a NOP (well,
> actually, it's pure overhead, as it causes the scheduler
> to kick in, but the result is --at least 99% of the time--
> that the very same vcpu that yielded continues to run).
>
> Implement a "preempt bias", to be applied to yielding
> vcpus. Basically when evaluating what vcpu to run next,
> if a vcpu that has just yielded is encountered, we give
> it a credit penalty, and check whether there is anyone
> else that would better take over the cpu (of course,
> if there isn't, the yielding vcpu will continue).
>
> The value of this bias can be configured with a boot
> time parameter, and the default is set to 1 ms.
>
> Also, add a yield performance counter, and fix the
> style of a couple of comments.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

Cool!  A few comments...

> ---
> Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> Cc: Anshul Makkar <anshul.makkar@xxxxxxxxxx>
> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> Note that this *only* considers the bias during the very scheduling decision
> that results from the vcpu calling yield. After that, the __CSFLAG_vcpu_yield
> flag is reset, and during all future scheduling decisions, the vcpu will
> compete with the other ones with its own amount of credits.
>
> Alternatively, we can actually _subtract_ some credits from a yielding vcpu.
> That will sort of make the effect of a call to yield last in time.

But normally we want the yield to be temporary, right?  It typically
gets called when a vcpu is waiting for a spinlock held by another
(probably pre-empted) vcpu.  Doing a permanent credit subtraction will
bias the credit algorithm against cpus that have a high amount of
spinlock contention (since probably all the vcpus will be calling yield
pretty regularly).

> I'm not sure which path is best. Personally, I like the subtract approach
> (perhaps, with a smaller bias than 1ms), but I think the "one shot" behavior
> implemented here is a good starting point. It is _something_, which is better
> than nothing, which is what we have without this patch! :-) It's lightweight
> (in its impact on the crediting algorithm, I mean), and benchmarks look nice,
> so I propose we go for this one, and explore the "permanent" --subtraction
> based-- solution a bit more.

Yes, this is simple and should be effective for now.  We can look at
improving it later.

> ---
>  docs/misc/xen-command-line.markdown |   10 ++++++
>  xen/common/sched_credit2.c          |   62 +++++++++++++++++++++++++++++++----
>  xen/common/schedule.c               |    2 +
>  xen/include/xen/perfc_defn.h        |    1 +
>  4 files changed, 68 insertions(+), 7 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 3a250cb..5f469b1 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1389,6 +1389,16 @@ Choose the default scheduler.
>  ### sched\_credit2\_migrate\_resist
>  > `= <integer>`
>
> +### sched\_credit2\_yield\_bias
> +> `= <integer>`
> +
> +> Default: `1000`
> +
> +Set how much a yielding vcpu will be penalized, in order to actually
> +give a chance to run to some other vcpu. This is basically a bias, in
> +favour of the non-yielding vcpus, expressed in microseconds (default
> +is 1ms).

Probably add _us to the end to indicate that the number is in microseconds.
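
The difference between the "one shot" bias and the permanent subtraction discussed above can be sketched in a few lines of C. This is only an illustrative model, not code from the patch: the function names are hypothetical, and it assumes credits are counted in nanoseconds while the bias is given in microseconds (as the documentation hunk states for the boot parameter).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: credits are counted in nanoseconds. */
#define NS_PER_US 1000

/*
 * "One shot" model: the bias only skews the comparison made during the
 * scheduling decision triggered by the yield, so the stored credit of
 * the yielding vcpu is left untouched however often it yields.
 */
int64_t credit_after_yields_one_shot(int64_t credit, int yield_bias_us,
                                     int n_yields)
{
    assert(n_yields >= 0);
    (void)yield_bias_us;   /* never applied to the stored credit */
    return credit;
}

/*
 * "Permanent" model: every yield subtracts the bias from the stored
 * credit, so a vcpu that yields often (e.g. under spinlock contention)
 * keeps losing ground in later scheduling decisions too.
 */
int64_t credit_after_yields_permanent(int64_t credit, int yield_bias_us,
                                      int n_yields)
{
    assert(n_yields >= 0);
    return credit - (int64_t)n_yields * yield_bias_us * NS_PER_US;
}
```

With a starting credit of 10 ms (10000000 ns) and the default bias of 1000 us, five yields leave the credit unchanged in the one-shot model but halve it in the permanent model, which is the drift against frequently yielding vcpus raised in the reply.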

> @@ -2247,10 +2267,22 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>      struct list_head *iter;
>      struct csched2_vcpu *snext = NULL;
>      struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
> +    int yield_bias = 0;
>
>      /* Default to current if runnable, idle otherwise */
>      if ( vcpu_runnable(scurr->vcpu) )
> +    {
> +        /*
> +         * The way we actually take yields into account is like this:
> +         * if scurr is yielding, when comparing its credits with other
> +         * vcpus in the runqueue, act like those other vcpus had yield_bias
> +         * more credits.
> +         */
> +        if ( unlikely(scurr->flags & CSFLAG_vcpu_yield) )
> +            yield_bias = CSCHED2_YIELD_BIAS;
> +
>          snext = scurr;
> +    }
>      else
>          snext = CSCHED2_VCPU(idle_vcpu[cpu]);
>
> @@ -2268,6 +2300,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>      list_for_each( iter, &rqd->runq )
>      {
>          struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
> +        int svc_credit = svc->credit + yield_bias;

Just curious, why did you decide to add yield_bias to everyone else,
rather than just subtracting it from snext->credit?

>
>          /* Only consider vcpus that are allowed to run on this processor. */
>          if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
> @@ -2288,19 +2321,23 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>              continue;
>          }
>
> -        /* If this is on a different processor, don't pull it unless
> -         * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
> +        /*
> +         * If this is on a different processor, don't pull it unless
> +         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> +         */
>          if ( svc->vcpu->processor != cpu
> -             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
> +             && snext->credit + CSCHED2_MIGRATE_RESIST > svc_credit )
>          {
>              (*pos)++;
>              SCHED_STAT_CRANK(migrate_resisted);
>              continue;
>          }
>
> -        /* If the next one on the list has more credit than current
> -         * (or idle, if current is not runnable), choose it. */
> -        if ( svc->credit > snext->credit )
> +        /*
> +         * If the next one on the list has more credit than current
> +         * (or idle, if current is not runnable), choose it.
> +         */
> +        if ( svc_credit > snext->credit )
>              snext = svc;
>
>          /* In any case, if we got this far, break. */
> @@ -2399,6 +2436,8 @@ csched2_schedule(
>           && vcpu_runnable(current) )
>          __set_bit(__CSFLAG_delayed_runq_add, &scurr->flags);
>
> +    __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
> +
>      ret.migrated = 0;
>
>      /* Accounting for non-idle tasks */
> @@ -2918,6 +2957,14 @@ csched2_init(struct scheduler *ops)
>      printk(XENLOG_INFO "load tracking window lenght %llu ns\n",
>             1ULL << opt_load_window_shift);
>
> +    if ( opt_yield_bias < CSCHED2_YIELD_BIAS_MIN )
> +    {
> +        printk("WARNING: %s: opt_yield_bias %d too small, resetting\n",
> +               __func__, opt_yield_bias);
> +        opt_yield_bias = 1000; /* 1 ms */
> +    }

Why do we need a minimum bias?  And why reset it to 1ms rather than
CSCHED2_YIELD_BIAS_MIN?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
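
On the question of adding yield_bias to the queued vcpus versus subtracting it from the current one: for the two comparisons quoted in the hunks above, the two formulations appear to be algebraically the same. The sketch below is a standalone model, not the Xen code. The function and parameter names are hypothetical, credits are plain ints, and it assumes no integer overflow and that the candidate is only ever compared against the current (yielding) vcpu.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * As in the quoted patch: pretend the queued vcpu has yield_bias more
 * credits, then apply the migrate-resist check and the credit comparison.
 * Returns true if the queued vcpu would replace the current one.
 */
bool picks_queued_add_bias(int cur_credit, int queued_credit, int yield_bias,
                           bool on_other_cpu, int migrate_resist)
{
    assert(yield_bias >= 0);
    int svc_credit = queued_credit + yield_bias;

    if ( on_other_cpu && cur_credit + migrate_resist > svc_credit )
        return false;              /* migration resisted */

    return svc_credit > cur_credit;
}

/*
 * Alternative raised in the review: subtract yield_bias from the current
 * vcpu's credit instead, and compare against the unmodified queued credit.
 */
bool picks_queued_sub_bias(int cur_credit, int queued_credit, int yield_bias,
                           bool on_other_cpu, int migrate_resist)
{
    assert(yield_bias >= 0);
    int biased_cur = cur_credit - yield_bias;

    if ( on_other_cpu && biased_cur + migrate_resist > queued_credit )
        return false;              /* migration resisted */

    return queued_credit > biased_cur;
}
```

Both functions move the same constant from one side of each inequality to the other, so in this model they return the same answer for any inputs. Whether one form is preferable in the real scheduler (for example for tracing or for readability) is a separate matter that this sketch does not settle.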
