
[Xen-devel] [RFC PATCH v1 04/16] xen: sched: make the logic for tracking idle cores generic.



The logic for tracking fully idle cores was introduced in 9bb9c73884d
"xen: credit2: implement true SMT support", but it has so far been
available only to Credit2.

Move the functions to a common header, so that other schedulers can
use them too, for tracking the idleness of full cores.

No functional change intended.

Signed-off-by: Dario Faggioli <dfaggioli@xxxxxxxx>
---
Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
---
 xen/common/sched_credit2.c |  128 ++++++++++++++++++--------------------------
 xen/include/xen/sched.h    |   35 ++++++++++++
 2 files changed, 87 insertions(+), 76 deletions(-)
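
As a side note (not part of the patch itself), here is a minimal sketch of
how another scheduler could keep a per-runqueue smt_idle mask in sync by
means of these helpers, once they live in xen/include/xen/sched.h. The
example_runq_data structure and the two update points are made up for
illustration; only smt_idle_mask_set(), smt_idle_mask_clear() and the
cpumask_* primitives are taken from the tree:

    /* Hypothetical per-runqueue bookkeeping (names made up for illustration). */
    struct example_runq_data {
        cpumask_t idle;      /* pcpus with no vcpu running                   */
        cpumask_t tickled;   /* pcpus already poked to come pick up work     */
        cpumask_t smt_idle;  /* pcpus whose whole core is idle and untickled */
    };

    /* Call when @cpu stops running vcpus (and has not been tickled). */
    static void example_cpu_goes_idle(struct example_runq_data *rqd,
                                      unsigned int cpu)
    {
        cpumask_t idle_untickled;

        cpumask_set_cpu(cpu, &rqd->idle);
        /* Only threads that are idle *and* untickled count as fully idle. */
        cpumask_andnot(&idle_untickled, &rqd->idle, &rqd->tickled);
        smt_idle_mask_set(cpu, &idle_untickled, &rqd->smt_idle);
    }

    /* Call when @cpu is tickled, or starts running a vcpu. */
    static void example_cpu_goes_busy(struct example_runq_data *rqd,
                                      unsigned int cpu)
    {
        cpumask_clear_cpu(cpu, &rqd->idle);
        /* One busy thread makes the whole core non-idle for SMT purposes. */
        smt_idle_mask_clear(cpu, &rqd->smt_idle);
    }

(An on-stack cpumask_t is used only for brevity; real code would rather use
a per-cpu scratch mask.)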

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 743848121f..5d2040ff90 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -217,6 +217,44 @@
  *    must always be taken for first.
  */
 
+/*
+ * Hyperthreading (SMT) support.
+ *
+ * We use a special per-runq mask (smt_idle) and update it according to the
+ * following logic:
+ *  - when _all_ the SMT siblings in a core are idle, all their corresponding
+ *    bits are set in the smt_idle mask;
+ *  - when even _just_one_ of the SMT siblings in a core is not idle, all the
+ *    bits corresponding to it and to all its siblings are cleared in the
+ *    smt_idle mask.
+ *
+ * Once we have such a mask, it is easy to implement a policy that, either:
+ *  - uses fully idle cores first: it is enough to try to schedule the vcpus
+ *    on pcpus from smt_idle mask first. This is what happens if
+ *    sched_smt_power_savings was not set at boot (default), and it maximizes
+ *    true parallelism, and hence performance;
+ *  - uses already busy cores first: it is enough to try to schedule the vcpus
+ *    on pcpus that are idle, but are not in smt_idle. This is what happens if
+ *    sched_smt_power_savings is set at boot, and it allows as many cores as
+ *    possible to stay in low power states, minimizing power consumption.
+ *
+ * This logic is entirely implemented in runq_tickle(), and that is enough.
+ * In fact, in this scheduler, placement of a vcpu on one of the pcpus of a
+ * runq, _always_ happens by means of tickling:
+ *  - when a vcpu wakes up, it calls csched2_vcpu_wake(), which calls
+ *    runq_tickle();
+ *  - when a migration is initiated in schedule.c, we call csched2_cpu_pick(),
+ *    csched2_vcpu_migrate() (which calls migrate()) and csched2_vcpu_wake().
+ *    csched2_cpu_pick() looks for the least loaded runq and returns just any
+ *    of its processors. Then, csched2_vcpu_migrate() just moves the vcpu to
+ *    the chosen runq, and it is again runq_tickle(), called by
+ *    csched2_vcpu_wake() that actually decides what pcpu to use within the
+ *    chosen runq;
+ *  - when a migration is initiated in sched_credit2.c, by calling migrate()
+ *    directly, that again temporarily uses a random pcpu from the new runq,
+ *    and then calls runq_tickle() by itself.
+ */
+
 /*
  * Basic constants
  */
@@ -489,6 +527,20 @@ struct csched2_runqueue_data {
     unsigned int pick_bias;    /* Last picked pcpu. Start from it next time  */
 };
 
+/*
+ * Note that rqd->smt_idle is different from rqd->idle. rqd->idle
+ * records pcpus that are merely idle (i.e., at the moment do not
+ * have a vcpu running on them).  But you have to manually filter out
+ * which pcpus have been tickled in order to find cores that are not
+ * going to be busy soon.  Filtering out tickled cpus pairwise is a
+ * lot of extra pain; so for rqd->smt_idle, we explicitly make it so that
+ * the bits of a pcpu are set only if all the threads on its core are
+ * both idle *and* untickled.
+ *
+ * This means changing the mask when either rqd->idle or rqd->tickled
+ * changes.
+ */
+
 /*
  * System-wide private data
  */
@@ -600,82 +652,6 @@ static inline bool has_cap(const struct csched2_vcpu *svc)
     return svc->budget != STIME_MAX;
 }
 
-/*
- * Hyperthreading (SMT) support.
- *
- * We use a special per-runq mask (smt_idle) and update it according to the
- * following logic:
- *  - when _all_ the SMT sibling in a core are idle, all their corresponding
- *    bits are set in the smt_idle mask;
- *  - when even _just_one_ of the SMT siblings in a core is not idle, all the
- *    bits correspondings to it and to all its siblings are clear in the
- *    smt_idle mask.
- *
- * Once we have such a mask, it is easy to implement a policy that, either:
- *  - uses fully idle cores first: it is enough to try to schedule the vcpus
- *    on pcpus from smt_idle mask first. This is what happens if
- *    sched_smt_power_savings was not set at boot (default), and it maximizes
- *    true parallelism, and hence performance;
- *  - uses already busy cores first: it is enough to try to schedule the vcpus
- *    on pcpus that are idle, but are not in smt_idle. This is what happens if
- *    sched_smt_power_savings is set at boot, and it allows as more cores as
- *    possible to stay in low power states, minimizing power consumption.
- *
- * This logic is entirely implemented in runq_tickle(), and that is enough.
- * In fact, in this scheduler, placement of a vcpu on one of the pcpus of a
- * runq, _always_ happens by means of tickling:
- *  - when a vcpu wakes up, it calls csched2_vcpu_wake(), which calls
- *    runq_tickle();
- *  - when a migration is initiated in schedule.c, we call csched2_cpu_pick(),
- *    csched2_vcpu_migrate() (which calls migrate()) and csched2_vcpu_wake().
- *    csched2_cpu_pick() looks for the least loaded runq and return just any
- *    of its processors. Then, csched2_vcpu_migrate() just moves the vcpu to
- *    the chosen runq, and it is again runq_tickle(), called by
- *    csched2_vcpu_wake() that actually decides what pcpu to use within the
- *    chosen runq;
- *  - when a migration is initiated in sched_credit2.c, by calling  migrate()
- *    directly, that again temporarily use a random pcpu from the new runq,
- *    and then calls runq_tickle(), by itself.
- */
-
-/*
- * If all the siblings of cpu (including cpu itself) are both idle and
- * untickled, set all their bits in mask.
- *
- * NB that rqd->smt_idle is different than rqd->idle.  rqd->idle
- * records pcpus that at are merely idle (i.e., at the moment do not
- * have a vcpu running on them).  But you have to manually filter out
- * which pcpus have been tickled in order to find cores that are not
- * going to be busy soon.  Filtering out tickled cpus pairwise is a
- * lot of extra pain; so for rqd->smt_idle, we explicitly make so that
- * the bits of a pcpu are set only if all the threads on its core are
- * both idle *and* untickled.
- *
- * This means changing the mask when either rqd->idle or rqd->tickled
- * changes.
- */
-static inline
-void smt_idle_mask_set(unsigned int cpu, const cpumask_t *idlers,
-                       cpumask_t *mask)
-{
-    const cpumask_t *cpu_siblings = per_cpu(cpu_sibling_mask, cpu);
-
-    if ( cpumask_subset(cpu_siblings, idlers) )
-        cpumask_or(mask, mask, cpu_siblings);
-}
-
-/*
- * Clear the bits of all the siblings of cpu from mask (if necessary).
- */
-static inline
-void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
-{
-    const cpumask_t *cpu_siblings = per_cpu(cpu_sibling_mask, cpu);
-
-    if ( cpumask_subset(cpu_siblings, mask) )
-        cpumask_andnot(mask, mask, per_cpu(cpu_sibling_mask, cpu));
-}
-
 /*
  * In csched2_cpu_pick(), it may not be possible to actually look at remote
  * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 51ceebe6cc..09c25bfdd2 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -894,8 +894,43 @@ static inline bool is_vcpu_online(const struct vcpu *v)
     return !test_bit(_VPF_down, &v->pause_flags);
 }
 
+/*
+ *  - sched_smt_power_savings=0 means maximum true parallelism. The schedulers
+ *    should try to schedule vcpus on pcpus belonging to cores on which all
+ *    the threads are currently idle;
+ *  - sched_smt_power_savings=1 means minimum power consumption. The schedulers
+ *    should try to schedule vcpus on pcpus belonging to cores on which one
+ *    or more threads are currently busy, as this allows as many cores as
+ *    possible to stay in low power states.
+ */
 extern bool sched_smt_power_savings;
 
+/*
+ * If all the siblings of @cpu (including @cpu itself) are in @idlers,
+ * set all their bits in @mask.
+ */
+static inline
+void smt_idle_mask_set(unsigned int cpu, const cpumask_t *idlers,
+                       cpumask_t *mask)
+{
+    const cpumask_t *cpu_siblings = per_cpu(cpu_sibling_mask, cpu);
+
+    if ( cpumask_subset(cpu_siblings, idlers) )
+        cpumask_or(mask, mask, cpu_siblings);
+}
+
+/*
+ * Clear the bits of all the siblings of cpu from mask (if necessary).
+ */
+static inline
+void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
+{
+    const cpumask_t *cpu_siblings = per_cpu(cpu_sibling_mask, cpu);
+
+    if ( cpumask_subset(cpu_siblings, mask) )
+        cpumask_andnot(mask, mask, cpu_siblings);
+}
+
 extern enum cpufreq_controller {
     FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen
 } cpufreq_controller;
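
For completeness, a similarly hedged sketch of the placement policy that the
comments above describe, i.e., how a scheduler could combine the hypothetical
per-runqueue masks from the earlier sketch with sched_smt_power_savings when
picking a pcpu. Only sched_smt_power_savings and the cpumask_* helpers are
taken from the tree; everything else is made up for illustration:

    /*
     * Pick a pcpu from @candidates, preferring fully idle cores when
     * sched_smt_power_savings is off, and already busy cores when it is on.
     * Returns nr_cpu_ids if no idle pcpu is available.
     */
    static unsigned int example_pick_cpu(const struct example_runq_data *rqd,
                                         const cpumask_t *candidates)
    {
        cpumask_t mask;

        if ( !sched_smt_power_savings )
        {
            /* Performance: use pcpus whose whole core is idle first. */
            cpumask_and(&mask, candidates, &rqd->smt_idle);
            if ( !cpumask_empty(&mask) )
                return cpumask_first(&mask);
        }
        else
        {
            /* Power savings: use idle threads of already busy cores first. */
            cpumask_and(&mask, candidates, &rqd->idle);
            cpumask_andnot(&mask, &mask, &rqd->smt_idle);
            if ( !cpumask_empty(&mask) )
                return cpumask_first(&mask);
        }

        /* Fall back to any idle pcpu. */
        cpumask_and(&mask, candidates, &rqd->idle);
        return cpumask_empty(&mask) ? nr_cpu_ids : cpumask_first(&mask);
    }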

