
Re: [Xen-devel] [PATCH 1/3] xen: sched: introduce the 'null' semi-static scheduler



On Fri, 17 Mar 2017, Dario Faggioli wrote:
> In cases where one is absolutely sure that there will be
> fewer vCPUs than pCPUs, having to pay the cost, mostly in
> terms of overhead, of an advanced scheduler may not be
> desirable.
> 
> The simple scheduler implemented here could be a solution.
> Here is how it works:
>  - each vCPU is statically assigned to a pCPU;
>  - if there are pCPUs without any vCPU assigned, they
>    stay idle (as in, they run their idle vCPU);
>  - if there are vCPUs which are not assigned to any
>    pCPU (e.g., because there are more vCPUs than pCPUs)
>    they *don't* run, until they get assigned;
>  - if a vCPU assigned to a pCPU goes away, one of the
>    vCPUs waiting to be assigned, if any, gets assigned
>    to the pCPU and can run there.
> 
> This scheduler, therefore, if used in configurations
> where every vCPU can be assigned to a pCPU, guarantees
> low overhead, low latency, and consistent performance.
> 
> If used as default scheduler, at Xen boot, it is
> recommended to limit the number of Dom0 vCPUs (e.g., with
> 'dom0_max_vcpus=x'). Otherwise, all the pCPUs will have
> one of Dom0's vCPUs assigned, and there won't be room for
> running any guest efficiently (if at all).
> 
> Target use cases are embedded and HPC, but it may well
> be interesting also in other circumstances.
> 
> Kconfig and documentation are updated accordingly.
> 
> While there, also document the availability of sched=rtds
> as boot parameter, which apparently had been forgotten.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> ---
> Cc: George Dunlap <george.dunlap@xxxxxxxxxx>
> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> Cc: Julien Grall <julien.grall@xxxxxxx>
> Cc: Jonathan Davies <Jonathan.Davies@xxxxxxxxxx>
> Cc: Marcus Granado <marcus.granado@xxxxxxxxxx>
> ---
>  docs/misc/xen-command-line.markdown |    2 
>  xen/common/Kconfig                  |   11 
>  xen/common/Makefile                 |    1 
>  xen/common/sched_null.c             |  816 +++++++++++++++++++++++++++++++++++
>  xen/common/schedule.c               |    2 
>  xen/include/public/domctl.h         |    1 
>  6 files changed, 832 insertions(+), 1 deletion(-)
>  create mode 100644 xen/common/sched_null.c
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 4daf5b5..ad6a5ca 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1434,7 +1434,7 @@ Map the HPET page as read only in Dom0. If disabled the page will be mapped
>  with read and write permissions.
>  
>  ### sched
> -> `= credit | credit2 | arinc653`
> +> `= credit | credit2 | arinc653 | rtds | null`
>  
>  > Default: `sched=credit`
>  
> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> index f2ecbc4..518520e 100644
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -187,6 +187,14 @@ config SCHED_ARINC653
>         The ARINC653 scheduler is a hard real-time scheduler for single
>         cores, targeted for avionics, drones, and medical devices.
>  
> +config SCHED_NULL
> +     bool "Null scheduler support (EXPERIMENTAL)"
> +     default y
> +     ---help---
> +       The null scheduler is a static, zero overhead scheduler,
> +       for when there always are less vCPUs than pCPUs, typically
> +       in embedded or HPC scenarios.
> +
>  choice
>       prompt "Default Scheduler?"
>       default SCHED_CREDIT_DEFAULT
> @@ -199,6 +207,8 @@ choice
>               bool "RT Scheduler" if SCHED_RTDS
>       config SCHED_ARINC653_DEFAULT
>               bool "ARINC653 Scheduler" if SCHED_ARINC653
> +     config SCHED_NULL_DEFAULT
> +             bool "Null Scheduler" if SCHED_NULL
>  endchoice
>  
>  config SCHED_DEFAULT
> @@ -207,6 +217,7 @@ config SCHED_DEFAULT
>       default "credit2" if SCHED_CREDIT2_DEFAULT
>       default "rtds" if SCHED_RTDS_DEFAULT
>       default "arinc653" if SCHED_ARINC653_DEFAULT
> +     default "null" if SCHED_NULL_DEFAULT
>       default "credit"
>  
>  endmenu
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 0fed30b..26c5a64 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -40,6 +40,7 @@ obj-$(CONFIG_SCHED_ARINC653) += sched_arinc653.o
>  obj-$(CONFIG_SCHED_CREDIT) += sched_credit.o
>  obj-$(CONFIG_SCHED_CREDIT2) += sched_credit2.o
>  obj-$(CONFIG_SCHED_RTDS) += sched_rt.o
> +obj-$(CONFIG_SCHED_NULL) += sched_null.o
>  obj-y += schedule.o
>  obj-y += shutdown.o
>  obj-y += softirq.o
> diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
> new file mode 100644
> index 0000000..6a13308
> --- /dev/null
> +++ b/xen/common/sched_null.c
> @@ -0,0 +1,816 @@
> +/*
> + * xen/common/sched_null.c
> + *
> + *  Copyright (c) 2017, Dario Faggioli, Citrix Ltd
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +/*
> + * The 'null' scheduler always choose to run, on each pCPU, either nothing
> + * (i.e., the pCPU stays idle) or always the same vCPU.
> + *
> + * It is aimed at supporting static scenarios, where there always are
> + * less vCPUs than pCPUs (and the vCPUs don't need to move among pCPUs
> + * for any reason) with the least possible overhead.
> + *
> + * Typical usecase are embedded applications, but also HPC, especially
> + * if the scheduler is used inside a cpupool.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/sched-if.h>
> +#include <xen/softirq.h>
> +#include <xen/keyhandler.h>
> +
> +
> +/*
> + * Locking:
> + * - Scheduler-lock (a.k.a. runqueue lock):
> + *  + is per-pCPU;
> + *  + serializes assignment and deassignment of vCPUs to a pCPU.
> + * - Private data lock (a.k.a. private scheduler lock):
> + *  + is scheduler-wide;
> + *  + serializes accesses to the list of domains in this scheduler.
> + * - Waitqueue lock:
> + *  + is scheduler-wide;
> + *  + serialize accesses to the list of vCPUs waiting to be assigned
> + *    to pCPUs.
> + *
> + * Ordering is: private lock, runqueue lock, waitqueue lock. Or, OTOH,
> + * waitqueue lock nests inside runqueue lock which nests inside private
> + * lock. More specifically:
> + *  + if we need both runqueue and private locks, we must acquire the
> + *    private lock for first;
> + *  + if we need both runqueue and waitqueue locks, we must acquire
> + *    the runqueue lock for first;
> + *  + if we need both private and waitqueue locks, we must acquire
> + *    the private lock for first;
> + *  + if we already own a runqueue lock, we must never acquire
> + *    the private lock;
> + *  + if we already own the waitqueue lock, we must never acquire
> + *    the runqueue lock or the private lock.
> + */
> +
> +/*
> + * System-wide private data
> + */
> +struct null_private {
> +    spinlock_t lock;        /* scheduler lock; nests inside cpupool_lock */
> +    struct list_head ndom;  /* Domains of this scheduler                 */
> +    struct list_head waitq; /* vCPUs not assigned to any pCPU            */
> +    spinlock_t waitq_lock;  /* serializes waitq; nests inside runq locks */
> +    cpumask_t cpus_free;    /* CPUs without a vCPU associated to them    */
> +};
> +
> +/*
> + * Physical CPU
> + */
> +struct null_pcpu {
> +    struct vcpu *vcpu;
> +};
> +DEFINE_PER_CPU(struct null_pcpu, npc);
> +
> +/*
> + * Virtual CPU
> + */
> +struct null_vcpu {
> +    struct list_head waitq_elem;
> +    struct null_dom *ndom;

This field is redundant, given that from struct vcpu you can get struct
domain and from struct domain you can get struct null_dom. I would
remove it.
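
Something like this sketch, reusing the null_dom() accessor already
defined further down in this file (the helper name is only for
illustration):

    /* Look the null_dom up from the vcpu, instead of caching it */
    static inline struct null_dom *vcpu_ndom(const struct vcpu *v)
    {
        return null_dom(v->domain);
    }

and then e.g. null_vcpu_check() can use vcpu_ndom(v) instead of
nvc->ndom.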


> +    struct vcpu *vcpu;
> +    int pcpu;          /* To what pCPU the vCPU is assigned (-1 if none) */

Isn't this always the same as struct vcpu->processor?
If it's only different when the vcpu is waiting in the waitq, then you
can just remove this field and replace the pcpu == -1 test with
list_empty(waitq_elem).
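
Just a sketch of what I mean (assuming waitq_elem is always either
linked into prv->waitq or re-initialized via list_del_init(), as the
patch already does):

    /* A vCPU is waiting iff its waitq_elem is linked in the waitqueue */
    static inline bool vcpu_on_waitq(const struct null_vcpu *nvc)
    {
        return !list_empty(&nvc->waitq_elem);
    }

The various "nvc->pcpu == -1" tests would then become
vcpu_on_waitq(nvc), and v->processor can be used wherever the actual
pCPU number is needed.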


> +};
> +
> +/*
> + * Domain
> + */
> +struct null_dom {
> +    struct list_head ndom_elem;
> +    struct domain *dom;
> +};
> +
> +/*
> + * Accessor helpers functions
> + */
> +static inline struct null_private *null_priv(const struct scheduler *ops)
> +{
> +    return ops->sched_data;
> +}
> +
> +static inline struct null_vcpu *null_vcpu(const struct vcpu *v)
> +{
> +    return v->sched_priv;
> +}
> +
> +static inline struct null_dom *null_dom(const struct domain *d)
> +{
> +    return d->sched_priv;
> +}
> +
> +static int null_init(struct scheduler *ops)
> +{
> +    struct null_private *prv;
> +
> +    printk("Initializing null scheduler\n"
> +           "WARNING: This is experimental software in development.\n"
> +           "Use at your own risk.\n");
> +
> +    prv = xzalloc(struct null_private);
> +    if ( prv == NULL )
> +        return -ENOMEM;
> +
> +    spin_lock_init(&prv->lock);
> +    spin_lock_init(&prv->waitq_lock);
> +    INIT_LIST_HEAD(&prv->ndom);
> +    INIT_LIST_HEAD(&prv->waitq);
> +
> +    ops->sched_data = prv;
> +
> +    return 0;
> +}
> +
> +static void null_deinit(struct scheduler *ops)
> +{
> +    xfree(ops->sched_data);
> +    ops->sched_data = NULL;
> +}
> +
> +static void init_pdata(struct null_private *prv, unsigned int cpu)
> +{
> +    /* Mark the pCPU as free, and with no vCPU assigned */
> +    cpumask_set_cpu(cpu, &prv->cpus_free);
> +    per_cpu(npc, cpu).vcpu = NULL;
> +}
> +
> +static void null_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
> +
> +    /* alloc_pdata is not implemented, so we want this to be NULL. */
> +    ASSERT(!pdata);
> +
> +    /*
> +     * The scheduler lock points already to the default per-cpu spinlock,
> +     * so there is no remapping to be done.
> +     */
> +    ASSERT(sd->schedule_lock == &sd->_lock && !spin_is_locked(&sd->_lock));
> +
> +    init_pdata(prv, cpu);
> +}
> +
> +static void null_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
> +{
> +    struct null_private *prv = null_priv(ops);
> +
> +    /* alloc_pdata not implemented, so this must have stayed NULL */
> +    ASSERT(!pcpu);
> +
> +    cpumask_clear_cpu(cpu, &prv->cpus_free);
> +    per_cpu(npc, cpu).vcpu = NULL;
> +}
> +
> +static void *null_alloc_vdata(const struct scheduler *ops,
> +                              struct vcpu *v, void *dd)
> +{
> +    struct null_vcpu *nvc;
> +
> +    nvc = xzalloc(struct null_vcpu);
> +    if ( nvc == NULL )
> +        return NULL;
> +
> +    INIT_LIST_HEAD(&nvc->waitq_elem);
> +
> +    /* Not assigned to any pCPU */
> +    nvc->pcpu = -1;
> +    /* Up pointers */
> +    nvc->ndom = dd;
> +    nvc->vcpu = v;
> +
> +    SCHED_STAT_CRANK(vcpu_alloc);
> +
> +    return nvc;
> +}
> +
> +static void null_free_vdata(const struct scheduler *ops, void *priv)
> +{
> +    struct null_vcpu *nvc = priv;
> +
> +    xfree(nvc);
> +}
> +
> +static void * null_alloc_domdata(const struct scheduler *ops,
> +                                 struct domain *d)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct null_dom *ndom;
> +    unsigned long flags;
> +
> +    ndom = xzalloc(struct null_dom);
> +    if ( ndom == NULL )
> +        return NULL;
> +
> +    INIT_LIST_HEAD(&ndom->ndom_elem);

This is not needed, given the following list_add_tail.


> +    ndom->dom = d;
> +
> +    spin_lock_irqsave(&prv->lock, flags);
> +    list_add_tail(&ndom->ndom_elem, &null_priv(ops)->ndom);
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +
> +    return (void*)ndom;
> +}
> +
> +static void null_free_domdata(const struct scheduler *ops, void *data)
> +{
> +    unsigned long flags;
> +    struct null_dom *ndom = data;
> +    struct null_private *prv = null_priv(ops);
> +
> +    spin_lock_irqsave(&prv->lock, flags);
> +    list_del_init(&ndom->ndom_elem);
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +
> +    xfree(data);
> +}
> +
> +static int null_dom_init(const struct scheduler *ops, struct domain *d)
> +{
> +    struct null_dom *ndom;
> +
> +    if ( is_idle_domain(d) )
> +        return 0;
> +
> +    ndom = null_alloc_domdata(ops, d);
> +    if ( ndom == NULL )
> +        return -ENOMEM;
> +
> +    d->sched_priv = ndom;
> +
> +    return 0;
> +}
> +static void null_dom_destroy(const struct scheduler *ops, struct domain *d)
> +{
> +    null_free_domdata(ops, null_dom(d));
> +}
> +
> +/*
> + * vCPU to pCPU assignment and placement. This _only_ happens:
> + *  - on insert,
> + *  - on migrate.
> + *
> + * Insert occurs when a vCPU joins this scheduler for the first time
> + * (e.g., when the domain it's part of is moved to the scheduler's
> + * cpupool).
> + *
> + * Migration may be necessary if a pCPU (with a vCPU assigned to it)
> + * is removed from the scheduler's cpupool.
> + *
> + * So this is not part of any hot path.
> + */
> +static unsigned int pick_cpu(struct null_private *prv, struct vcpu *v)
> +{
> +    unsigned int cpu = v->processor;
> +    cpumask_t *cpus = cpupool_domain_cpumask(v->domain);
> +
> +    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
> +
> +    /*
> +     * If our processor is free, or we are assigned to it, and it is
> +     * also still valid, just go for it.
> +     */
> +    if ( likely((per_cpu(npc, cpu).vcpu == NULL || per_cpu(npc, cpu).vcpu == v)
> +                && cpumask_test_cpu(cpu, cpus)) )
> +        return cpu;
> +
> +    /* If not, just go for a valid free pCPU, if any */
> +    cpumask_and(cpumask_scratch_cpu(cpu), &prv->cpus_free, cpus);
> +    cpu = cpumask_first(cpumask_scratch_cpu(cpu));
> +
> +    /*
> +     * If we didn't find any free pCPU, just pick any valid pcpu, even if
> +     * it has another vCPU assigned. This will happen during shutdown and
> +     * suspend/resume, but it may also happen during "normal operation", if
> +     * all the pCPUs are busy.
> +     *
> +     * In fact, there must always be something sane in v->processor, or
> +     * vcpu_schedule_lock() and friends won't work. This is not a problem,
> +     * as we will actually assign the vCPU to the pCPU we return from here,
> +     * only if the pCPU is free.
> +     */
> +    if ( unlikely(cpu == nr_cpu_ids) )
> +        cpu = cpumask_any(cpus);
> +
> +    return cpu;
> +}
> +
> +static void vcpu_assign(struct null_private *prv, struct vcpu *v,
> +                        unsigned int cpu)
> +{
> +    ASSERT(null_vcpu(v)->pcpu == -1);
> +
> +    per_cpu(npc, cpu).vcpu = v;
> +    v->processor = null_vcpu(v)->pcpu = cpu;
> +    cpumask_clear_cpu(cpu, &prv->cpus_free);
> +
> +    gdprintk(XENLOG_INFO, "%d <-- d%dv%d\n", cpu, v->domain->domain_id, v->vcpu_id);
> +}
> +
> +static void vcpu_deassign(struct null_private *prv, struct vcpu *v,
> +                          unsigned int cpu)
> +{
> +    ASSERT(null_vcpu(v)->pcpu == cpu);
> +
> +    null_vcpu(v)->pcpu = -1;
> +    per_cpu(npc, cpu).vcpu = NULL;
> +    cpumask_set_cpu(cpu, &prv->cpus_free);
> +
> +    gdprintk(XENLOG_INFO, "%d <-- NULL (d%dv%d)\n", cpu, v->domain->domain_id, v->vcpu_id);
> +}
> +
> +/* Change the scheduler of cpu to us (null). */
> +static void null_switch_sched(struct scheduler *new_ops, unsigned int cpu,
> +                              void *pdata, void *vdata)
> +{
> +    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
> +    struct null_private *prv = null_priv(new_ops);
> +    struct null_vcpu *nvc = vdata;
> +
> +    ASSERT(nvc && is_idle_vcpu(nvc->vcpu));
> +
> +    idle_vcpu[cpu]->sched_priv = vdata;
> +
> +    /*
> +     * We are holding the runqueue lock already (it's been taken in
> +     * schedule_cpu_switch()). It actually may or may not be the 'right'
> +     * one for this cpu, but that is ok for preventing races.
> +     */
> +    ASSERT(!local_irq_is_enabled());
> +
> +    init_pdata(prv, cpu);
> +
> +    per_cpu(scheduler, cpu) = new_ops;
> +    per_cpu(schedule_data, cpu).sched_priv = pdata;
> +
> +    /*
> +     * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
> +     * if it is free (and it can be) we want that anyone that manages
> +     * taking it, finds all the initializations we've done above in place.
> +     */
> +    smp_mb();
> +    sd->schedule_lock = &sd->_lock;
> +}
> +
> +static void null_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct null_vcpu *nvc = null_vcpu(v);
> +    unsigned int cpu;
> +    spinlock_t *lock;
> +
> +    ASSERT(!is_idle_vcpu(v));
> +
> + retry:
> +    lock = vcpu_schedule_lock_irq(v);
> +
> +    cpu = pick_cpu(prv, v);
> +
> +    /* We hold v->processor's runq lock, but we need cpu's one */
> +    if ( cpu != v->processor )
> +    {
> +        spin_unlock(lock);
> +        lock = pcpu_schedule_lock(cpu);
> +    }
> +
> +    /*
> +     * If the pCPU is free, we assign v to it.
> +     *
> +     * If it is not free (e.g., because we raced with another insert
> +     * or migrate), but there are free pCPUs, we try to pick again.
> +     *
> +     * If the pCPU is not free, and there aren't any (valid) others,
> +     * we have no alternatives than to go into the waitqueue.
> +     */
> +    if ( likely(per_cpu(npc, cpu).vcpu == NULL) )
> +    {
> +        /*
> +         * Insert is followed by vcpu_wake(), so there's no need to poke
> +         * the pcpu with the SCHEDULE_SOFTIRQ, as wake will do that.
> +         */
> +        vcpu_assign(prv, v, cpu);
> +    }
> +    else if ( cpumask_intersects(&prv->cpus_free,
> +                                 cpupool_domain_cpumask(v->domain)) )
> +    {
> +        spin_unlock(lock);
> +        goto retry;
> +    }
> +    else
> +    {
> +        spin_lock(&prv->waitq_lock);
> +        list_add_tail(&nvc->waitq_elem, &prv->waitq);
> +        spin_unlock(&prv->waitq_lock);
> +    }
> +    spin_unlock_irq(lock);
> +
> +    SCHED_STAT_CRANK(vcpu_insert);
> +}
> +
> +static void null_vcpu_remove(const struct scheduler *ops, struct vcpu *v)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct null_vcpu *wvc, *nvc = null_vcpu(v);
> +    unsigned int cpu;
> +    spinlock_t *lock;
> +
> +    ASSERT(!is_idle_vcpu(v));
> +
> +    lock = vcpu_schedule_lock_irq(v);
> +
> +    cpu = v->processor;
> +
> +    /* If v is in waitqueue, just get it out of there and bail */
> +    if ( unlikely(nvc->pcpu == -1) )
> +    {
> +        spin_lock(&prv->waitq_lock);
> +
> +        ASSERT(!list_empty(&null_vcpu(v)->waitq_elem));
> +        list_del_init(&nvc->waitq_elem);
> +
> +        spin_unlock(&prv->waitq_lock);
> +
> +        goto out;
> +    }
> +
> +    /*
> +     * If v is assigned to a pCPU, let's see if there is someone waiting.
> +     * If yes, we assign it to cpu, in spite of v. If no, we just set
> +     * cpu free.
> +     */
> +
> +    ASSERT(per_cpu(npc, cpu).vcpu == v);
> +    ASSERT(!cpumask_test_cpu(cpu, &prv->cpus_free));
> +
> +    spin_lock(&prv->waitq_lock);
> +    wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
> +    if ( wvc )
> +    {
> +        vcpu_assign(prv, wvc->vcpu, cpu);

The vcpu_assign in null_vcpu_insert is protected by the pcpu runq lock,
while this call is protected only by waitq_lock. Is that safe?


> +        list_del_init(&wvc->waitq_elem);
> +        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> +    }
> +    else
> +    {
> +        vcpu_deassign(prv, v, cpu);
> +    }
> +    spin_unlock(&prv->waitq_lock);
> +
> + out:
> +    vcpu_schedule_unlock_irq(lock, v);
> +
> +    SCHED_STAT_CRANK(vcpu_remove);
> +}
> +
> +static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
> +{
> +    ASSERT(!is_idle_vcpu(v));
> +
> +    if ( unlikely(curr_on_cpu(v->processor) == v) )
> +    {
> +        SCHED_STAT_CRANK(vcpu_wake_running);
> +        return;
> +    }
> +
> +    if ( null_vcpu(v)->pcpu == -1 )
> +    {
> +     /* Not exactly "on runq", but close enough for reusing the counter */
> +        SCHED_STAT_CRANK(vcpu_wake_onrunq);
> +     return;

Coding style: the indentation of the comment and the return statement
above looks off (hard tabs?).


> +    }
> +
> +    if ( likely(vcpu_runnable(v)) )
> +        SCHED_STAT_CRANK(vcpu_wake_runnable);
> +    else
> +        SCHED_STAT_CRANK(vcpu_wake_not_runnable);
> +
> +    /* Note that we get here only for vCPUs assigned to a pCPU */
> +    cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
> +}
> +
> +static void null_vcpu_sleep(const struct scheduler *ops, struct vcpu *v)
> +{
> +    ASSERT(!is_idle_vcpu(v));
> +
> +    /* If v is not assigned to a pCPU, or is not running, no need to bother */
> +    if ( curr_on_cpu(v->processor) == v )
> +        cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
> +
> +    SCHED_STAT_CRANK(vcpu_sleep);
> +}
> +
> +static int null_cpu_pick(const struct scheduler *ops, struct vcpu *v)
> +{
> +    ASSERT(!is_idle_vcpu(v));
> +    return pick_cpu(null_priv(ops), v);
> +}
> +
> +static void null_vcpu_migrate(const struct scheduler *ops, struct vcpu *v,
> +                              unsigned int new_cpu)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct null_vcpu *nvc = null_vcpu(v);
> +    unsigned int cpu = v->processor;
> +
> +    ASSERT(!is_idle_vcpu(v));
> +
> +    if ( v->processor == new_cpu )
> +        return;
> +
> +    /*
> +     * v is either in the waitqueue, or assigned to a pCPU.
> +     *
> +     * In the former case, there is nothing to do.
> +     *
> +     * In the latter, the pCPU to which it was assigned would become free,
> +     * and we, therefore, should check whether there is anyone in the
> +     * waitqueue that can be assigned to it.
> +     */
> +    if ( likely(nvc->pcpu != -1) )
> +    {
> +        struct null_vcpu *wvc;
> +
> +        spin_lock(&prv->waitq_lock);
> +        wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
> +        if ( wvc && cpumask_test_cpu(cpu, cpupool_domain_cpumask(v->domain)) )
> +        {
> +            vcpu_assign(prv, wvc->vcpu, cpu);
> +            list_del_init(&wvc->waitq_elem);
> +            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> +        }
> +        else
> +        {
> +            vcpu_deassign(prv, v, cpu);
> +        }
> +        spin_unlock(&prv->waitq_lock);

This looks very similar to null_vcpu_remove; maybe you want to refactor
the code into a single shared service function.
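
Something along these lines, maybe (just a sketch, and the name is of
course up to you; modulo the extra cpupool_domain_cpumask() check that
only the migrate path does):

    /*
     * cpu is losing the vCPU currently assigned to it: hand it over to
     * the first waiter, if any, or else mark it as free.
     */
    static void pcpu_reassign_or_free(struct null_private *prv,
                                      struct vcpu *v, unsigned int cpu)
    {
        struct null_vcpu *wvc;

        spin_lock(&prv->waitq_lock);
        wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu,
                                       waitq_elem);
        if ( wvc )
        {
            vcpu_assign(prv, wvc->vcpu, cpu);
            list_del_init(&wvc->waitq_elem);
            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
        }
        else
            vcpu_deassign(prv, v, cpu);
        spin_unlock(&prv->waitq_lock);
    }

which both null_vcpu_remove() and null_vcpu_migrate() could then call.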


> +     SCHED_STAT_CRANK(migrate_running);

Coding style: indentation (hard tab?) on the line above.


> +    }
> +    else
> +        SCHED_STAT_CRANK(migrate_on_runq);
> +
> +    SCHED_STAT_CRANK(migrated);
> +
> +    /*
> +     * Let's now consider new_cpu, which is where v is being sent. It can be
> +     * either free, or have a vCPU already assigned to it.
> +     *
> +     * In the former case, we should assign v to it, and try to get it to run.
> +     *
> +     * In latter, all we can do is to park v in the waitqueue.
> +     */
> +    if ( per_cpu(npc, new_cpu).vcpu == NULL )
> +    {
> +        /* We don't know whether v was in the waitqueue. If yes, remove it */
> +        spin_lock(&prv->waitq_lock);
> +        list_del_init(&nvc->waitq_elem);
> +        spin_unlock(&prv->waitq_lock);
> +
> +        vcpu_assign(prv, v, new_cpu);

This vcpu_assign call seems to be unprotected. Should it be within a
spin_lock'ed area?
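
Just to illustrate what I'm asking about (a sketch, not necessarily
the right fix, and assuming the caller does not already hold new_cpu's
runq lock here): for symmetry with the other vcpu_assign() call sites
in this file, I'd have expected the assignment inside the waitq_lock'ed
section just above, i.e.:

        spin_lock(&prv->waitq_lock);
        list_del_init(&nvc->waitq_elem);
        vcpu_assign(prv, v, new_cpu);
        spin_unlock(&prv->waitq_lock);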


> +    }
> +    else
> +    {
> +        /* We don't know whether v was in the waitqueue. If no, put it there */
> +        spin_lock(&prv->waitq_lock);
> +        if ( list_empty(&nvc->waitq_elem) )
> +        {
> +            list_add_tail(&nvc->waitq_elem, &prv->waitq);
> +            nvc->pcpu = -1;
> +        }
> +        else
> +            ASSERT(nvc->pcpu == -1);
> +        spin_unlock(&prv->waitq_lock);
> +    }
> +
> +    /*
> +     * Whatever all the above, we always at least override v->processor.
> +     * This is especially important for shutdown or suspend/resume paths,
> +     * when it is important to let our caller (cpu_disable_scheduler())
> +     * know that the migration did happen, to the best of our possibilities,
> +     * at least. In case of suspend, any temporary inconsistency caused
> +     * by this, will be fixed-up during resume.
> +     */
> +    v->processor = new_cpu;
> +}
> +
> +#ifndef NDEBUG
> +static inline void null_vcpu_check(struct vcpu *v)
> +{
> +    struct null_vcpu * const nvc = null_vcpu(v);
> +    struct null_dom * const ndom = nvc->ndom;
> +
> +    BUG_ON(nvc->vcpu != v);
> +    BUG_ON(ndom != null_dom(v->domain));
> +    if ( ndom )
> +    {
> +        BUG_ON(is_idle_vcpu(v));
> +        BUG_ON(ndom->dom != v->domain);
> +    }
> +    else
> +    {
> +        BUG_ON(!is_idle_vcpu(v));
> +    }
> +    SCHED_STAT_CRANK(vcpu_check);
> +}
> +#define NULL_VCPU_CHECK(v)  (null_vcpu_check(v))
> +#else
> +#define NULL_VCPU_CHECK(v)
> +#endif
> +
> +
> +/*
> + * The most simple scheduling function of all times! We either return:
> + *  - the vCPU assigned to the pCPU, if there's one and it can run;
> + *  - the idle vCPU, otherwise.
> + */
> +static struct task_slice null_schedule(const struct scheduler *ops,
> +                                       s_time_t now,
> +                                       bool_t tasklet_work_scheduled)
> +{
> +    const unsigned int cpu = smp_processor_id();
> +    struct null_private *prv = null_priv(ops);
> +    struct null_vcpu *wvc;
> +    struct task_slice ret;
> +
> +    SCHED_STAT_CRANK(schedule);
> +    NULL_VCPU_CHECK(current);
> +
> +    ret.task = per_cpu(npc, cpu).vcpu;
> +    ret.migrated = 0;
> +    ret.time = -1;
> +
> +    /*
> +     * We may be new in the cpupool, or just coming back online. In which
> +     * case, there may be vCPUs in the waitqueue that we can assign to us
> +     * and run.
> +     */
> +    if ( unlikely(ret.task == NULL) )
> +    {
> +        spin_lock(&prv->waitq_lock);
> +        wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
> +        if ( wvc )
> +        {
> +            vcpu_assign(prv, wvc->vcpu, cpu);
> +            list_del_init(&wvc->waitq_elem);
> +            ret.task = wvc->vcpu;
> +        }
> +        spin_unlock(&prv->waitq_lock);
> +    }
> +
> +    if ( unlikely(tasklet_work_scheduled ||
> +                  ret.task == NULL ||
> +                  !vcpu_runnable(ret.task)) )
> +        ret.task = idle_vcpu[cpu];
> +
> +    NULL_VCPU_CHECK(ret.task);
> +    return ret;
> +}
> +
> +static inline void dump_vcpu(struct null_private *prv, struct null_vcpu *nvc)
> +{
> +    printk("[%i.%i] pcpu=%d", nvc->vcpu->domain->domain_id,
> +            nvc->vcpu->vcpu_id, nvc->pcpu);
> +}
> +
> +static void null_dump_pcpu(const struct scheduler *ops, int cpu)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct null_vcpu *nvc;
> +    spinlock_t *lock;
> +    unsigned long flags;
> +#define cpustr keyhandler_scratch
> +
> +    lock = pcpu_schedule_lock_irqsave(cpu, &flags);
> +
> +    cpumask_scnprintf(cpustr, sizeof(cpustr), per_cpu(cpu_sibling_mask, cpu));
> +    printk("CPU[%02d] sibling=%s, ", cpu, cpustr);
> +    cpumask_scnprintf(cpustr, sizeof(cpustr), per_cpu(cpu_core_mask, cpu));
> +    printk("core=%s", cpustr);
> +    if ( per_cpu(npc, cpu).vcpu != NULL )
> +        printk(", vcpu=d%dv%d", per_cpu(npc, cpu).vcpu->domain->domain_id,
> +               per_cpu(npc, cpu).vcpu->vcpu_id);
> +    printk("\n");
> +
> +    /* current VCPU (nothing to say if that's the idle vcpu) */
> +    nvc = null_vcpu(curr_on_cpu(cpu));
> +    if ( nvc && !is_idle_vcpu(nvc->vcpu) )
> +    {
> +        printk("\trun: ");
> +        dump_vcpu(prv, nvc);
> +        printk("\n");
> +    }
> +
> +    pcpu_schedule_unlock_irqrestore(lock, flags, cpu);
> +#undef cpustr
> +}
> +
> +static void null_dump(const struct scheduler *ops)
> +{
> +    struct null_private *prv = null_priv(ops);
> +    struct list_head *iter;
> +    unsigned long flags;
> +    unsigned int loop;
> +#define cpustr keyhandler_scratch
> +
> +    spin_lock_irqsave(&prv->lock, flags);
> +
> +    cpulist_scnprintf(cpustr, sizeof(cpustr), &prv->cpus_free);
> +    printk("\tcpus_free = %s\n", cpustr);
> +
> +    printk("Domain info:\n");
> +    loop = 0;
> +    list_for_each( iter, &prv->ndom )
> +    {
> +        struct null_dom *ndom;
> +        struct vcpu *v;
> +
> +        ndom = list_entry(iter, struct null_dom, ndom_elem);
> +
> +        printk("\tDomain: %d\n", ndom->dom->domain_id);
> +        for_each_vcpu( ndom->dom, v )
> +        {
> +            struct null_vcpu * const nvc = null_vcpu(v);
> +            spinlock_t *lock;
> +
> +            lock = vcpu_schedule_lock(nvc->vcpu);
> +
> +            printk("\t%3d: ", ++loop);
> +            dump_vcpu(prv, nvc);
> +            printk("\n");
> +
> +            vcpu_schedule_unlock(lock, nvc->vcpu);
> +        }
> +    }
> +
> +    printk("Waitqueue: ");
> +    loop = 0;
> +    spin_lock(&prv->waitq_lock);
> +    list_for_each( iter, &prv->waitq )
> +    {
> +        struct null_vcpu *nvc = list_entry(iter, struct null_vcpu, waitq_elem);
> +
> +        if ( loop++ != 0 )
> +            printk(", ");
> +        if ( loop % 24 == 0 )
> +            printk("\n\t");
> +        printk("d%dv%d", nvc->vcpu->domain->domain_id, nvc->vcpu->vcpu_id);
> +    }
> +    printk("\n");
> +    spin_unlock(&prv->waitq_lock);
> +
> +    spin_unlock_irqrestore(&prv->lock, flags);
> +#undef cpustr
> +}
> +
> +const struct scheduler sched_null_def = {
> +    .name           = "null Scheduler",
> +    .opt_name       = "null",
> +    .sched_id       = XEN_SCHEDULER_NULL,
> +    .sched_data     = NULL,
> +
> +    .init           = null_init,
> +    .deinit         = null_deinit,
> +    .init_pdata     = null_init_pdata,
> +    .switch_sched   = null_switch_sched,
> +    .deinit_pdata   = null_deinit_pdata,
> +
> +    .alloc_vdata    = null_alloc_vdata,
> +    .free_vdata     = null_free_vdata,
> +    .alloc_domdata  = null_alloc_domdata,
> +    .free_domdata   = null_free_domdata,
> +
> +    .init_domain    = null_dom_init,
> +    .destroy_domain = null_dom_destroy,
> +
> +    .insert_vcpu    = null_vcpu_insert,
> +    .remove_vcpu    = null_vcpu_remove,
> +
> +    .wake           = null_vcpu_wake,
> +    .sleep          = null_vcpu_sleep,
> +    .pick_cpu       = null_cpu_pick,
> +    .migrate        = null_vcpu_migrate,
> +    .do_schedule    = null_schedule,
> +
> +    .dump_cpu_state = null_dump_pcpu,
> +    .dump_settings  = null_dump,
> +};
> +
> +REGISTER_SCHEDULER(sched_null_def);
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 223a120..b482037 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1785,6 +1785,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
>  
>   out:
>      per_cpu(cpupool, cpu) = c;
> +    /* Trigger a reschedule so the CPU can pick up some work ASAP. */
> +    cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
>  
>      return 0;
>  }

Why? This change is not explained in the commit message.


> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 85cbb7c..32b578d 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -352,6 +352,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
>  #define XEN_SCHEDULER_CREDIT2  6
>  #define XEN_SCHEDULER_ARINC653 7
>  #define XEN_SCHEDULER_RTDS     8
> +#define XEN_SCHEDULER_NULL     9
>  
>  typedef struct xen_domctl_sched_credit {
>      uint16_t weight;
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

