
Re: [Xen-devel] [RFC PATCH 1/1] xen: credit2: rb-tree for runqueues

Hi Dario,

On Tuesday 17 April 2018 04:16 PM, Dario Faggioli wrote:
On Tue, 2018-04-03 at 22:25 +0530, Praveen Kumar wrote:
The patch optimizes the sorted credit2 runqueue, replacing the simple
linked list with an rb-tree implementation. This way we gain performance
and scalability when the number of vCPUs is large.

Signed-off-by: Praveen Kumar <kpraveen.lkml@xxxxxxxxx>
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -600,6 +601,29 @@ static inline bool has_cap(const struct csched2_vcpu *svc)
      return svc->budget != STIME_MAX;
+static void runq_insert_rb(struct rb_root *root,
+                           struct csched2_vcpu *svc,
+                           int *pos)

I'd call this rb_insert() or rb_runq_insert(), but that's taste (and
it's certainly a minor thing).
Sure, would prefer rb_runq_insert()
+    struct csched2_vcpu *entry = NULL;
+    struct rb_node **node = &root->rb_node;
+    struct rb_node *parent = NULL;
+    while (*node) {
+        parent = *node;
+        entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
+        // Check if we are maintaining the sorted

Pointless comment. I'd leave the line blank (but that's taste again).
Will remove this.
+        if ( svc->credit < entry->credit )
+            node = &parent->rb_left;
+        else
+            node = &parent->rb_right;
+        (*pos)++;
+    }
+    rb_link_node(&svc->runq_elem, parent, node);
+    rb_insert_color(&svc->runq_elem, root);

Wait, where's the part where we cache which element is the one with the
highest credits? (And the same applies to the tree-removal function, of
course.)
In fact, we need a field for storing such a cache in the runqueue data
structure as well, and we need to keep it updated.

Linux (recently) added an rb-tree variant that does this internally, see
cd9e61ed1eebb "rbtree: cache leftmost node internally", and how such a
variant is used, e.g., in 2161573ecd693 "sched/deadline: replace earliest
dl and rq leftmost caching".
I considered caching the leftmost node as Linux's rb-tree does, but took the easier route of getting things working first, deferring the port of the Linux rb-tree caching variant to a follow-up patch or so.
So, I'd say that we either import that from there, or do the caching of
the leftmost element explicitly where needed. Note, however, that
Linux's variant caches the leftmost because they always use rb-trees for
structures where the head of the queue is the element with the
_smallest_ key.

In our case here, we want the queue ordered in descending credit order,
so we must think carefully about whether or not we could use the new
variant (maybe "just" inverting the binary search tree relationship),
or whether we'd need another one that caches the rightmost.

I would suggest we do not try to use the rb_*_cached() functions, and
cache rightmost explicitly in runqueue_data.

OK, that sounds better. I will introduce a field in runqueue_data to cache the rightmost element. Let's also port the Linux rb-tree caching variant (we may use it in the future).
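To make this concrete, here is a rough sketch of what explicit rightmost caching could look like. The field name runq_rightmost and the removal helper are my assumptions for illustration, not code from the patch:

```c
/*
 * Sketch only: cache the rightmost (highest-credit) node explicitly.
 * Field and helper names below are illustrative assumptions.
 */
struct csched2_runqueue_data {
    /* ... existing fields ... */
    struct rb_root runq;            /* runqueue, ordered by credit */
    struct rb_node *runq_rightmost; /* cached highest-credit element */
};

static void rb_runq_insert(struct csched2_runqueue_data *rqd,
                           struct csched2_vcpu *svc, int *pos)
{
    struct rb_node **node = &rqd->runq.rb_node;
    struct rb_node *parent = NULL;
    bool rightmost = true;

    while ( *node )
    {
        struct csched2_vcpu *entry;

        parent = *node;
        entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
        if ( svc->credit < entry->credit )
        {
            node = &parent->rb_left;
            rightmost = false;
        }
        else
            node = &parent->rb_right;
        (*pos)++;
    }

    rb_link_node(&svc->runq_elem, parent, node);
    rb_insert_color(&svc->runq_elem, &rqd->runq);

    /* The new node went right at every step: it is the new rightmost. */
    if ( rightmost )
        rqd->runq_rightmost = &svc->runq_elem;
}

static void rb_runq_remove(struct csched2_runqueue_data *rqd,
                           struct csched2_vcpu *svc)
{
    /* If the cached node leaves, its in-order predecessor takes over. */
    if ( rqd->runq_rightmost == &svc->runq_elem )
        rqd->runq_rightmost = rb_prev(&svc->runq_elem);

    rb_erase(&svc->runq_elem, &rqd->runq);
}
```

With something like this, the scheduler can read the highest-credit vcpu from rqd->runq_rightmost in O(1), and all the bookkeeping stays confined to the insert/remove helpers.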
@@ -3201,17 +3225,21 @@ csched2_runtime(const struct scheduler *ops, int cpu,
       * 2) If there's someone waiting whose credit is positive,
       *    run until your credit ~= his.
-    if ( ! list_empty(runq) )
+    if ( ! RB_EMPTY_ROOT(runq) )
-        struct csched2_vcpu *swait = runq_elem(runq->next);
+        // Find the left most element, which is the most probable
+        struct rb_node *node = rb_first(runq);

Err... is it? Isn't the leftmost element the one with the _least_
credits? It looks to me that we want rb_last().

And IAC, we don't want to have to traverse the tree to get the runnable
vcpu with the highest credit, we want it available in O(1) time.

That's why we want to cache it.
Yes, it looks like an error. Will update the patch in v2.
+        // TODO Can we take rb_next ?
+        //struct rb_node *node = &rb_next(root->rb_node);
What do you mean here?

This won't matter if we do caching.
+        struct csched2_vcpu *swait = runq_elem(node);
          if ( ! is_idle_vcpu(swait->vcpu)
               && swait->credit > 0 )
              rt_credit = snext->credit - swait->credit;
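For the record, the corrected hunk for v2 (still without caching) would presumably look something like this, using rb_last() as suggested:

```c
if ( !RB_EMPTY_ROOT(runq) )
{
    /*
     * The tree is ordered by increasing credit, so the rightmost
     * node, not the leftmost, is the highest-credit waiter.
     */
    struct rb_node *node = rb_last(runq);
    struct csched2_vcpu *swait = runq_elem(node);

    if ( !is_idle_vcpu(swait->vcpu) && swait->credit > 0 )
        rt_credit = snext->credit - swait->credit;
}
```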
@@ -3345,9 +3373,8 @@ runq_candidate(struct csched2_runqueue_data
          snext = csched2_vcpu(idle_vcpu[cpu]);
-    list_for_each_safe( iter, temp, &rqd->runq )
-    {
-        struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
+    for (iter = rb_first(&rqd->runq); iter != NULL; iter = rb_next(iter)) {
+        struct csched2_vcpu * svc = rb_entry(iter, struct csched2_vcpu, runq_elem);

Same as above. I don't think this is from where we want to start. And
no matter whether we want to start from rb_first() or rb_last(), we
certainly don't want to have to traverse the tree to get to there.

@@ -3761,8 +3789,8 @@ csched2_dump(const struct scheduler *ops)
              dump_pcpu(ops, j);
-        list_for_each( iter, runq )
-        {
+        for (iter = rb_first(runq); iter != NULL; iter = rb_next(iter)) {

And the same applies here as well. I think that, if we want the
runqueue printed in the proper order, we need to start from the
rightmost, and use rb_prev().
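That is, something along these lines (a sketch; the same rb_last()/rb_prev() pattern would apply to runq_candidate() as well, unless caching makes it moot):

```c
/* Sketch: walk the runqueue in descending credit order for dumping. */
for ( iter = rb_last(runq); iter != NULL; iter = rb_prev(iter) )
{
    struct csched2_vcpu *svc = rb_entry(iter, struct csched2_vcpu, runq_elem);
    /* ... dump svc as before ... */
}
```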

Oh, and about caching, I'd say it is okay if you want to turn this into
a series, the first patch of which does not have it, and you introduce
it in the second patch.

Sure, let me make this a 3-patch series: first, port the Linux rb-tree caching variant; second, the rb-tree changes without caching; third, cache the rightmost node.



Xen-devel mailing list