Re: [Xen-devel] [PATCH v9]xen: sched: convert RTDS from time to event driven model
On Mon, Mar 14, 2016 at 8:04 PM, Tianyang Chen <tiche@xxxxxxxxxxxxxx> wrote:
> Budget replenishment and enforcement are separated by adding
> a replenishment timer, which fires at the next most imminent
> release time of all runnable vcpus.
>
> A replenishment queue has been added to keep track of all vcpus that
> are runnable.
>
> The following functions have major changes to manage the replenishment
> events and timer.
>
> repl_handler(): a timer handler which is re-programmed
> to fire at the nearest vcpu deadline and replenish vcpus.
>
> rt_schedule(): picks the highest-priority runnable vcpu based on cpu
> affinity; ret.time will be passed to schedule(). If an idle
> vcpu is picked, -1 is returned to avoid busy-waiting. repl_update()
> has been removed.
>
> rt_vcpu_wake(): when a vcpu wakes up, it tickles instead of
> picking one from the run queue.
>
> rt_context_saved(): when context switching is finished, the
> preempted vcpu will be put back into the runq.
>
> Simplified functional graph:
>
> schedule.c SCHEDULE_SOFTIRQ:
>     rt_schedule():
>         [spin_lock]
>         burn_budget(scurr)
>         snext = runq_pick()
>         [spin_unlock]
>
> sched_rt.c TIMER_SOFTIRQ
>     replenishment_timer_handler()
>         [spin_lock]
>         <for_each_vcpu_on_q(i)> {
>             replenish(i)
>             runq_tickle(i)
>         }>
>         program_timer()
>         [spin_unlock]
>
> Signed-off-by: Tianyang Chen <tiche@xxxxxxxxxxxxxx>
> Signed-off-by: Meng Xu <mengxu@xxxxxxxxxxxxx>
> Signed-off-by: Dagaen Golomb <dgolomb@xxxxxxxxxxxxxx>
> ---
> Changes since v8:
>     Replaced spin_lock_irqsave with spin_lock_irq.
>     Bug fix in burn_budget() where budget == 0.
>     Removed unnecessary tickling in the timer handler when
>     vcpus have the same deadline.
>
> Changes since v7:
>     Coding style.
>     Added a flag to indicate that a vcpu was depleted.
>     Added a local list including updated vcpus in the
>     timer handler. It is used to decide which vcpu should
>     tickle.
>
> Changes since v6:
>     Removed unnecessary vcpu_on_q() checks for runq_insert().
>     Renamed replenishment queue related functions without
>     underscores.
>     New replq_reinsert() function for rt_vcpu_wake().
>
> Changes since v5:
>     Removed unnecessary vcpu_on_replq() checks.
>     deadline_queue_insert() returns a flag to indicate if the
>     timer needs to be re-programmed.
>     Removed unnecessary timer checks.
>     Added helper functions to remove vcpus from queues.
>     Coding style.
>
> Changes since v4:
>     Removed unnecessary replenishment queue checks in vcpu_wake().
>     Extended replq_remove() to all cases in vcpu_sleep().
>     Used the _deadline_queue_insert() helper function for both queues.
>     _replq_insert() and _replq_remove() program the timer internally.
>
> Changes since v3:
>     Removed the running queue.
>     Added a repl queue to keep track of repl events.
>     The timer is now per scheduler.
>     The timer is initialized on a valid cpu in a cpupool.
> ---
>  xen/common/sched_rt.c |  437 ++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 341 insertions(+), 96 deletions(-)
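To make the commit message's control flow concrete before diving into the
diff: here is a minimal sketch of a replenishment handler following the
graph above (illustrative only, not code from this patch; it omits the
re-sorting of replenished vcpus on the event queue and the was_depleted
bookkeeping that the real handler needs):

    static void repl_timer_handler(void *data)
    {
        const struct scheduler *ops = data;
        struct rt_private *prv = rt_priv(ops);
        struct list_head *replq = rt_replq(ops);
        struct list_head *iter, *tmp;
        s_time_t now = NOW();

        spin_lock_irq(&prv->lock);

        /* The list is deadline-ordered: stop at the first future event. */
        list_for_each_safe ( iter, tmp, replq )
        {
            struct rt_vcpu *svc = replq_elem(iter);

            if ( now < svc->cur_deadline )
                break;

            rt_update_deadline(now, svc); /* refill budget, push deadline */
            runq_tickle(ops, svc);        /* may preempt a running vcpu */
        }

        /* Re-arm the timer for the now-earliest event, if any is left. */
        if ( !list_empty(replq) )
            set_timer(prv->repl_timer, replq_elem(replq->next)->cur_deadline);

        spin_unlock_irq(&prv->lock);
    }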
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index bfed2e2..b491915 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -3,7 +3,9 @@
>   * EDF scheduling is a real-time scheduling algorithm used in embedded field.
>   *
>   * by Sisu Xi, 2013, Washington University in Saint Louis
> - * and Meng Xu, 2014, University of Pennsylvania
> + * Meng Xu, 2014-2016, University of Pennsylvania
> + * Tianyang Chen, 2016, University of Pennsylvania
> + * Dagaen Golomb, 2016, University of Pennsylvania
>   *
>   * based on the code of credit Scheduler
>   */
> @@ -16,6 +18,7 @@
>  #include <xen/delay.h>
>  #include <xen/event.h>
>  #include <xen/time.h>
> +#include <xen/timer.h>
>  #include <xen/perfc.h>
>  #include <xen/sched-if.h>
>  #include <xen/softirq.h>
> @@ -87,7 +90,7 @@
>  #define RTDS_DEFAULT_BUDGET     (MICROSECS(4000))
>
>  #define UPDATE_LIMIT_SHIFT      10
> -#define MAX_SCHEDULE            (MILLISECS(1))
> +
>  /*
>   * Flags
>   */
> @@ -115,6 +118,18 @@
>  #define RTDS_delayed_runq_add (1<<__RTDS_delayed_runq_add)
>
>  /*
> + * The replenishment timer handler needs to check this bit
> + * to know where a replenished vcpu was, when deciding which
> + * vcpu should tickle.
> + * A replenished vcpu should tickle if it was moved from the
> + * depleted queue to the run queue.
> + * + Set in burn_budget() if a vcpu has zero budget left.
> + * + Cleared and checked in the repenishment handler.

It seems you have an extra '+' here; it needs to be removed.
My bad, I didn't spot it in the last patch... :-(

> + */
> +#define __RTDS_was_depleted     3
> +#define RTDS_was_depleted    (1<<__RTDS_was_depleted)
> +
> +/*
>   * rt tracing events ("only" 512 available!). Check
>   * include/public/trace.h for more details.
>   */
> @@ -142,6 +157,9 @@ static cpumask_var_t *_cpumask_scratch;
>   */
>  static unsigned int nr_rt_ops;
>
> +/* handler for the replenishment timer */
> +static void repl_handler(void *data);
> +
>  /*
>   * Systme-wide private data, include global RunQueue/DepletedQ
>   * Global lock is referenced by schedule_data.schedule_lock from all
> @@ -152,7 +170,9 @@ struct rt_private {
>      struct list_head sdom;      /* list of availalbe domains, used for dump */
>      struct list_head runq;      /* ordered list of runnable vcpus */
>      struct list_head depletedq; /* unordered list of depleted vcpus */
> +    struct list_head replq;     /* ordered list of vcpus that need replenishment */
>      cpumask_t tickled;          /* cpus been tickled */
> +    struct timer *repl_timer;   /* replenishment timer */
>  };
>
>  /*
> @@ -160,6 +180,7 @@ struct rt_private {
>   */
>  struct rt_vcpu {
>      struct list_head q_elem;     /* on the runq/depletedq list */
> +    struct list_head replq_elem; /* on the repl event list */
>
>      /* Up-pointers */
>      struct rt_dom *sdom;
> @@ -213,8 +234,14 @@ static inline struct list_head *rt_depletedq(const struct scheduler *ops)
>      return &rt_priv(ops)->depletedq;
>  }
>
> +static inline struct list_head *rt_replq(const struct scheduler *ops)
> +{
> +    return &rt_priv(ops)->replq;
> +}
> +
>  /*
> - * Queue helper functions for runq and depletedq
> + * Helper functions for manipulating the runqueue, the depleted queue,
> + * and the replenishment events queue.
>   */
>  static int
>  __vcpu_on_q(const struct rt_vcpu *svc)
> @@ -228,6 +255,18 @@ __q_elem(struct list_head *elem)
>      return list_entry(elem, struct rt_vcpu, q_elem);
>  }
>
> +static struct rt_vcpu *
> +replq_elem(struct list_head *elem)
> +{
> +    return list_entry(elem, struct rt_vcpu, replq_elem);
> +}
> +
> +static int
> +vcpu_on_replq(const struct rt_vcpu *svc)
> +{
> +    return !list_empty(&svc->replq_elem);
> +}
> +
>  /*
>   * Debug related code, dump vcpu/cpu information
>   */
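An aside for readers new to Xen's list API: vcpu_on_replq() can be this
cheap because an element is re-initialized to point at itself whenever it
is taken off a list. The idiom, spelled out with calls that all appear
elsewhere in this patch:

    INIT_LIST_HEAD(&svc->replq_elem);      /* at vcpu allocation */
    list_del_init(&svc->replq_elem);       /* on every removal */
    if ( !list_empty(&svc->replq_elem) )   /* still linked somewhere... */
        /* ...so the vcpu is on the replenishment queue */;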
> @@ -288,7 +327,7 @@ rt_dump_pcpu(const struct scheduler *ops, int cpu)
>  static void
>  rt_dump(const struct scheduler *ops)
>  {
> -    struct list_head *runq, *depletedq, *iter;
> +    struct list_head *runq, *depletedq, *replq, *iter;
>      struct rt_private *prv = rt_priv(ops);
>      struct rt_vcpu *svc;
>      struct rt_dom *sdom;
> @@ -301,6 +340,7 @@ rt_dump(const struct scheduler *ops)
>
>      runq = rt_runq(ops);
>      depletedq = rt_depletedq(ops);
> +    replq = rt_replq(ops);
>
>      printk("Global RunQueue info:\n");
>      list_for_each( iter, runq )
> @@ -316,6 +356,13 @@ rt_dump(const struct scheduler *ops)
>          rt_dump_vcpu(ops, svc);
>      }
>
> +    printk("Global Replenishment Event info:\n");
> +    list_for_each ( iter, replq )
> +    {
> +        svc = replq_elem(iter);
> +        rt_dump_vcpu(ops, svc);
> +    }
> +
>      printk("Domain info:\n");
>      list_for_each( iter, &prv->sdom )
>      {
> @@ -380,11 +427,78 @@ rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
>      return;
>  }
>
> +/*
> + * A helper function that only removes a vcpu from a queue
> + * and it returns 1 if the vcpu was in front of the list.
> + */
> +static inline int
> +deadline_queue_remove(struct list_head *queue, struct list_head *elem)
> +{
> +    int pos = 0;
> +
> +    if ( queue->next != elem )
> +        pos = 1;
> +
> +    list_del_init(elem);
> +    return !pos;
> +}
> +
>  static inline void
>  __q_remove(struct rt_vcpu *svc)
>  {
> -    if ( __vcpu_on_q(svc) )
> -        list_del_init(&svc->q_elem);
> +    ASSERT( __vcpu_on_q(svc) );
> +    list_del_init(&svc->q_elem);
> +}
> +
> +/*
> + * Removing a vcpu from the replenishment queue could
> + * re-program the timer for the next replenishment event
> + * if it was at the front of the list.
> + */
> +static inline void
> +__replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct rt_private *prv = rt_priv(ops);
> +    struct list_head *replq = rt_replq(ops);
> +    struct timer* repl_timer = prv->repl_timer;
> +
> +    ASSERT( vcpu_on_replq(svc) );
> +
> +    if ( deadline_queue_remove(replq, &svc->replq_elem) )
> +    {
> +        /* re-arm the timer for the next replenishment event */
> +        if ( !list_empty(replq) )
> +        {
> +            struct rt_vcpu *svc_next = replq_elem(replq->next);
> +            set_timer(repl_timer, svc_next->cur_deadline);
> +        }
> +        else
> +            stop_timer(repl_timer);
> +    }
> +}
> +
> +/*
> + * An utility function that inserts a vcpu to a
> + * queue based on certain order (EDF). The returned
> + * value is 1 if a vcpu has been inserted to the
> + * front of a list.
> + */
> +static int
> +deadline_queue_insert(struct rt_vcpu * (*_get_q_elem)(struct list_head *elem),
> +                      struct rt_vcpu *svc, struct list_head *elem, struct list_head *queue)
> +{
> +    struct list_head *iter;
> +    int pos = 0;
> +
> +    list_for_each ( iter, queue )
> +    {
> +        struct rt_vcpu * iter_svc = (*_get_q_elem)(iter);
> +        if ( svc->cur_deadline <= iter_svc->cur_deadline )
> +            break;
> +        pos++;
> +    }
> +    list_add_tail(elem, iter);
> +    return !pos;
>  }
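Passing the element accessor in as a function pointer is what lets this one
helper keep both deadline-ordered lists sorted. The two ways it gets used
further down in the patch reduce to (a recap of code below, not new logic):

    /* runq: keyed on q_elem, no timer involved */
    deadline_queue_insert(&__q_elem, svc, &svc->q_elem, runq);

    /* replenishment queue: a new head means the timer must be pulled in */
    if ( deadline_queue_insert(&replq_elem, svc, &svc->replq_elem, replq) )
        set_timer(repl_timer, svc->cur_deadline);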
>
>  /*
> @@ -397,27 +511,62 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
>  {
>      struct rt_private *prv = rt_priv(ops);
>      struct list_head *runq = rt_runq(ops);
> -    struct list_head *iter;
>
>      ASSERT( spin_is_locked(&prv->lock) );
> -
>      ASSERT( !__vcpu_on_q(svc) );
> +    ASSERT( vcpu_on_replq(svc) );
>
>      /* add svc to runq if svc still has budget */
>      if ( svc->cur_budget > 0 )
> -    {
> -        list_for_each(iter, runq)
> -        {
> -            struct rt_vcpu * iter_svc = __q_elem(iter);
> -            if ( svc->cur_deadline <= iter_svc->cur_deadline )
> -                break;
> -        }
> -        list_add_tail(&svc->q_elem, iter);
> -    }
> +        deadline_queue_insert(&__q_elem, svc, &svc->q_elem, runq);
>      else
> -    {
>          list_add(&svc->q_elem, &prv->depletedq);
> +}
> +
> +/*
> + * Insert svc into the replenishment event list
> + * in replenishment time order.
> + * vcpus that need to be replished earlier go first.
> + * The timer may be re-programmed if svc is inserted
> + * at the front of the event list.
> + */
> +static void
> +__replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct list_head *replq = rt_replq(ops);
> +    struct rt_private *prv = rt_priv(ops);
> +    struct timer *repl_timer = prv->repl_timer;
> +
> +    ASSERT( !vcpu_on_replq(svc) );
> +
> +    if ( deadline_queue_insert(&replq_elem, svc, &svc->replq_elem, replq) )
> +        set_timer(repl_timer, svc->cur_deadline);
> +}
> +
> +/*
> + * Removes and re-inserts an event to the replenishment queue.
> + */
> +static void
> +replq_reinsert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct list_head *replq = rt_replq(ops);
> +    struct rt_private *prv = rt_priv(ops);
> +    struct timer *repl_timer = prv->repl_timer;
> +
> +    ASSERT( vcpu_on_replq(svc) );
> +
> +    if ( deadline_queue_remove(replq, &svc->replq_elem) )
> +    {
> +        if ( deadline_queue_insert(&replq_elem, svc, &svc->replq_elem, replq) )
> +            set_timer(repl_timer, svc->cur_deadline);
> +        else
> +        {
> +            struct rt_vcpu *svc_next = replq_elem(replq->next);
> +            set_timer(repl_timer, svc_next->cur_deadline);
> +        }
>      }
> +    else if ( deadline_queue_insert(&replq_elem, svc, &svc->replq_elem, replq) )
> +        set_timer(repl_timer, svc->cur_deadline);
>  }
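The four branches in replq_reinsert() exist only to skip redundant timer
writes; functionally the whole thing collapses to this always-reprogram
version (illustrative, slightly less efficient):

    deadline_queue_remove(replq, &svc->replq_elem);
    deadline_queue_insert(&replq_elem, svc, &svc->replq_elem, replq);
    /* after a remove+insert pair the timer must track the queue head */
    set_timer(repl_timer, replq_elem(replq->next)->cur_deadline);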
>
>  /*
> @@ -449,11 +598,18 @@ rt_init(struct scheduler *ops)
>      INIT_LIST_HEAD(&prv->sdom);
>      INIT_LIST_HEAD(&prv->runq);
>      INIT_LIST_HEAD(&prv->depletedq);
> +    INIT_LIST_HEAD(&prv->replq);
>
>      cpumask_clear(&prv->tickled);
>
>      ops->sched_data = prv;
>
> +    /*
> +     * The timer initialization will happen later when
> +     * the first pcpu is added to this pool in alloc_pdata.
> +     */
> +    prv->repl_timer = NULL;
> +
>      return 0;
>
>  no_mem:
> @@ -473,6 +629,10 @@ rt_deinit(struct scheduler *ops)
>          xfree(_cpumask_scratch);
>          _cpumask_scratch = NULL;
>      }
> +
> +    kill_timer(prv->repl_timer);
> +    xfree(prv->repl_timer);
> +
>      ops->sched_data = NULL;
>      xfree(prv);
>  }
> @@ -494,6 +654,17 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>      if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
>          return NULL;
>
> +    if ( prv->repl_timer == NULL )
> +    {
> +        /* Allocate the timer on the first cpu of this pool. */
> +        prv->repl_timer = xzalloc(struct timer);
> +
> +        if ( prv->repl_timer == NULL )
> +            return NULL;
> +
> +        init_timer(prv->repl_timer, repl_handler, (void *)ops, cpu);
> +    }
> +
>      /* 1 indicates alloc. succeed in schedule.c */
>      return (void *)1;
>  }
> @@ -587,6 +758,7 @@ rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>          return NULL;
>
>      INIT_LIST_HEAD(&svc->q_elem);
> +    INIT_LIST_HEAD(&svc->replq_elem);
>      svc->flags = 0U;
>      svc->sdom = dd;
>      svc->vcpu = vc;
> @@ -610,7 +782,8 @@ rt_free_vdata(const struct scheduler *ops, void *priv)
>  }
>
>  /*
> - * This function is called in sched_move_domain() in schedule.c
> + * It is called in sched_move_domain() and sched_init_vcpu
> + * in schedule.c.
>   * When move a domain to a new cpupool.
>   * It inserts vcpus of moving domain to the scheduler's RunQ in
>   * dest. cpupool.
> @@ -628,8 +801,13 @@ rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
>      if ( now >= svc->cur_deadline )
>          rt_update_deadline(now, svc);
>
> -    if ( !__vcpu_on_q(svc) && vcpu_runnable(vc) && !vc->is_running )
> -        __runq_insert(ops, svc);
> +    if ( !__vcpu_on_q(svc) && vcpu_runnable(vc) )
> +    {
> +        __replq_insert(ops, svc);
> +
> +        if ( !vc->is_running )
> +            __runq_insert(ops, svc);
> +    }
>      vcpu_schedule_unlock_irq(lock, vc);
>
>      SCHED_STAT_CRANK(vcpu_insert);
> @@ -652,6 +830,10 @@ rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
>      lock = vcpu_schedule_lock_irq(vc);
>      if ( __vcpu_on_q(svc) )
>          __q_remove(svc);
> +
> +    if ( vcpu_on_replq(svc) )
> +        __replq_remove(ops,svc);
> +
>      vcpu_schedule_unlock_irq(lock, vc);
>  }
>
> @@ -706,8 +888,15 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
>
>      svc->cur_budget -= delta;
>
> -    if ( svc->cur_budget < 0 )
> +    if ( svc->cur_budget <= 0 )
> +    {
>          svc->cur_budget = 0;
> +        /*
> +         * Set __RTDS_was_depleted so the replenishment
> +         * handler can let this vcpu tickle if it was refilled.
> +         */
> +        set_bit(__RTDS_was_depleted, &svc->flags);
> +    }
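For clarity, the intended life cycle of the new flag across the two paths
is (a sketch, assuming the handler uses the usual test-and-clear bit op):

    /* burn_budget(), above: the vcpu has just run out of budget */
    set_bit(__RTDS_was_depleted, &svc->flags);

    /* repl_handler(): after refilling svc and moving it back to the runq */
    if ( test_and_clear_bit(__RTDS_was_depleted, &svc->flags) )
        runq_tickle(ops, svc);  /* it came off the depletedq, so tickle */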
>
>      /* TRACE */
>      {
> @@ -784,44 +973,6 @@ __runq_pick(const struct scheduler *ops, const cpumask_t *mask)
>  }
>
>  /*
> - * Update vcpu's budget and
> - * sort runq by insert the modifed vcpu back to runq
> - * lock is grabbed before calling this function
> - */
> -static void
> -__repl_update(const struct scheduler *ops, s_time_t now)
> -{
> -    struct list_head *runq = rt_runq(ops);
> -    struct list_head *depletedq = rt_depletedq(ops);
> -    struct list_head *iter;
> -    struct list_head *tmp;
> -    struct rt_vcpu *svc = NULL;
> -
> -    list_for_each_safe(iter, tmp, runq)
> -    {
> -        svc = __q_elem(iter);
> -        if ( now < svc->cur_deadline )
> -            break;
> -
> -        rt_update_deadline(now, svc);
> -        /* reinsert the vcpu if its deadline is updated */
> -        __q_remove(svc);
> -        __runq_insert(ops, svc);
> -    }
> -
> -    list_for_each_safe(iter, tmp, depletedq)
> -    {
> -        svc = __q_elem(iter);
> -        if ( now >= svc->cur_deadline )
> -        {
> -            rt_update_deadline(now, svc);
> -            __q_remove(svc);  /* remove from depleted queue */
> -            __runq_insert(ops, svc); /* add to runq */
> -        }
> -    }
> -}
> -
> -/*
>   * schedule function for rt scheduler.
>   * The lock is already grabbed in schedule.c, no need to lock here
>   */
> @@ -840,8 +991,6 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>      /* burn_budget would return for IDLE VCPU */
>      burn_budget(ops, scurr, now);
>
> -    __repl_update(ops, now);
> -
>      if ( tasklet_work_scheduled )
>      {
>          trace_var(TRC_RTDS_SCHED_TASKLET, 1, 0, NULL);
> @@ -868,6 +1017,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>          set_bit(__RTDS_delayed_runq_add, &scurr->flags);
>
>      snext->last_start = now;
> +    ret.time = -1; /* if an idle vcpu is picked */
>      if ( !is_idle_vcpu(snext->vcpu) )
>      {
>          if ( snext != scurr )
> @@ -880,9 +1030,8 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>              snext->vcpu->processor = cpu;
>              ret.migrated = 1;
>          }
> +        ret.time = snext->budget; /* invoke the scheduler next time */

Ah, this is incorrect, although it is easy to fix. ret.time is the relative
time at which the *budget enforcement* timer should fire. Since snext does
not always have a full budget (it may already have consumed some), it
should be ret.time = snext->cur_budget, shouldn't it? :-)
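In other words, the tail of rt_schedule() would end up looking like this
(an untested sketch of the suggested one-line fix, everything else as in
the patch):

    snext->last_start = now;
    ret.time = -1; /* if an idle vcpu is picked */
    if ( !is_idle_vcpu(snext->vcpu) )
    {
        ...
        /* cur_budget, not budget: part of it may already be consumed */
        ret.time = snext->cur_budget; /* invoke the scheduler next time */
    }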
The other parts of the scheduler look fine to me, as long as the above
comments are addressed.

Hi Tianyang, great and speedy work! :-D

Hi Dario, could you have another look at this patch and see if it is OK
once the above comments are addressed?

Thank you very much!

Meng

-- 
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel