
Re: [Xen-devel] [PATCH 1/4] xen: credit2: implement utilization cap



On 12/06/2017 14:19, Dario Faggioli wrote:
Hey,

Budget is burned by the domain's vCPUs, in a similar way to how
credits are.

When a domain runs out of budget, its vCPUs can't run any longer.

If the vcpus of a domain have credit but the budget has run out, will the
vcpus not be scheduled?

Is this a question? Assuming it is, what do you mean by "domain have
credit"? Domains always have credits, and they never run out of them.
There's no such thing as a domain not being runnable because it does
not have credits.

"domain have credit" ? the statement is if the "vcpus of domain have credit".

About budget, a domain with <= 0 budget means all its vcpus are not
runnable, and hence won't be scheduled, no matter their credits.
You answered the question here.
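Just to record that rule in code form, as I understand it (the helper and the fields here are made up for illustration, they are not taken from the patch):

#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_t;          /* stand-in for Xen's signed time type */

/* Simplified stand-in for csched2_dom; only the fields relevant here. */
struct csched2_dom_stub {
    s_time_t budget;               /* budget currently left in the pool */
    unsigned int cap;              /* 0 means "no cap set" */
};

/*
 * Hypothetical helper: with a cap set, an empty (or negative) budget
 * pool makes every vCPU of the domain non-runnable, regardless of the
 * credits each vCPU still holds.
 */
static inline bool dom_out_of_budget(const struct csched2_dom_stub *sdom)
{
    return sdom->cap != 0 && sdom->budget <= 0;
}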

@@ -92,6 +92,82 @@
  */

 /*
+ * Utilization cap:
+ *
+ * Setting an pCPU utilization cap for a domain means the following:
+ *
+ * - a domain can have a cap, expressed in terms of % of physical
+ * For implementing this, we use the following approach:
+ *
+ * - each domain is given a 'budget', an each domain has a timer, which
+ *   replenishes the domain's budget periodically. The budget is the amount
+ *   of time the vCPUs of the domain can use every 'period';
+ *
+ * - the period is CSCHED2_BDGT_REPL_PERIOD, and is the same for all domains
+ *   (but each domain has its own timer; so the all are periodic by the same
+ *   period, but replenishment of the budgets of the various domains, at
+ *   periods boundaries, are not synchronous);
+ *
+ * - when vCPUs run, they consume budget. When they don't run, they don't
+ *   consume budget. If there is no budget left for the domain, no vCPU of
+ *   that domain can run. If a vCPU tries to run and finds that there is no
+ *   budget, it blocks.
+ *   Budget never expires, so at whatever time a vCPU wants to run, it can
+ *   check the domain's budget, and if there is some, it can use it.
+ *
+ * - budget is replenished to the top of the capacity for the domain once
+ *   per period. Even if there was some leftover budget from previous period,
+ *   though, the budget after a replenishment will always be at most equal
+ *   to the total capacify of the domain ('tot_budget');
+ *
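Side note, to make sure I read the replenishment rule right: this is roughly how I picture the per-period top-up (field names follow the comment above, but the function itself is only my illustration, not code from the patch):

#include <stdint.h>

typedef int64_t s_time_t;          /* stand-in for Xen's signed time type */

struct csched2_dom_stub {
    s_time_t budget;               /* what is currently left in the pool */
    s_time_t tot_budget;           /* capacity granted per CSCHED2_BDGT_REPL_PERIOD */
};

/*
 * Illustration of the replenishment step: the pool is topped up by one
 * period's worth of budget (which also absorbs a negative carry-over from
 * an overrun), but it is never filled above tot_budget, so a domain can
 * never accumulate more than one period worth of budget.
 */
static void replenish_budget(struct csched2_dom_stub *sdom)
{
    sdom->budget += sdom->tot_budget;
    if (sdom->budget > sdom->tot_budget)
        sdom->budget = sdom->tot_budget;
}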

Budget is replenished but credits are not available?

Still not getting this.
What I want to ask is: if the budget of the domain is replenished, but credit for the vcpus of that domain is not available, then what happens? I believe the vcpus won't be scheduled (even if they have budget_quota) until they get their credits replenished.

What if the budget is finished but the vcpu has not reached the ratelimit
boundary?

Budget takes precedence over ratelimiting. This is important to keep
cap working "regularly", rather than in some kind of permanent "trying-
to-keep-up-with-overruns-in-previous-periods" state.

And, ideally, a vcpu cap and ratelimiting should be set in such a way
that they don't step on each other's toes (or do that only rarely). I can
see about trying to print a warning when I detect potentially tricky
values (but it's not easy, considering budget is per-domain, so I can't
be sure about how much each vcpu will actually get, and whether or not
Why can't you be sure? The scheduler knows the domain budget and the number of vcpus per domain, so we can calculate the budget_quota and translate it into a cpu slot duration. Similarly, the value of the ratelimit is also known. We can compare the two and give the user a warning if the budget_quota is less than the ratelimit.

This is very important for the user to know: if these values are chosen badly, they can adversely affect the system's performance with frequent context switches (a problem we are already aware of). A rough sketch of the check I have in mind is below, after the end of your sentence.

that will turn out to be significantly less than the ratelimit most of
the time).
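Here is that sketch (just an illustration: it assumes the per-vCPU quota is tot_budget / nr_vcpus, as in the csched2_dom_cntl() snippet further down; the function name and the message are made up, and times are in nanoseconds as with Xen's s_time_t):

#include <stdint.h>
#include <stdio.h>

typedef int64_t s_time_t;

#define MICROSECS(us) ((s_time_t)(us) * 1000LL)   /* us -> ns, as in Xen */

/*
 * Sketch of the check I was suggesting: if the per-vCPU share of the
 * domain budget per period is smaller than the ratelimit, warn the user,
 * since the two settings will keep fighting each other.
 */
static void warn_if_quota_below_ratelimit(s_time_t tot_budget,
                                          unsigned int nr_vcpus,
                                          unsigned int ratelimit_us)
{
    s_time_t budget_quota = tot_budget / nr_vcpus;

    if (budget_quota < MICROSECS(ratelimit_us))
        fprintf(stderr,
                "warning: per-vCPU budget (%lld ns) below ratelimit (%u us); "
                "expect frequent parking and context switches\n",
                (long long)budget_quota, ratelimit_us);
}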

+ * - when a budget replenishment occurs, if there are vCPUs that had been
+ *   blocked because of lack of budget, they'll be unblocked, and they will
+ *   (potentially) be able to run again.
+ *
+ * Finally, some even more implementation related detail:
+ *
+ * - budget is stored in a domain-wide pool. vCPUs of the domain that want
+ *   to run go to such pool, and grub some. When they do so, the amount
+ *   they grabbed is _immediately_ removed from the pool. This happens in
+ *   vcpu_try_to_get_budget();
+ *
+ * - when vCPUs stop running, if they've not consumed all the budget they
+ *   took, the leftover is put back in the pool. This happens in
+ *   vcpu_give_budget_back();
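Again just to check my understanding of the pool mechanics, a simplified sketch of the grab / give-back pair (the real functions are vcpu_try_to_get_budget() and vcpu_give_budget_back(); the fields and the quota handling here are simplified, and locking is ignored):

#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_t;

struct dom_stub {
    s_time_t budget;        /* domain-wide pool */
};

struct vcpu_stub {
    struct dom_stub *sdom;
    s_time_t budget;        /* what this vCPU grabbed and hasn't burned yet */
    s_time_t budget_quota;  /* max it is allowed to grab in one go */
};

/* Grab up to budget_quota from the pool; the amount leaves the pool at once. */
static bool try_to_get_budget(struct vcpu_stub *svc)
{
    struct dom_stub *sdom = svc->sdom;
    s_time_t grab;

    if (sdom->budget <= 0)
        return false;                 /* nothing left: the vCPU must block */

    grab = sdom->budget < svc->budget_quota ? sdom->budget : svc->budget_quota;
    sdom->budget -= grab;
    svc->budget += grab;
    return true;
}

/* When the vCPU stops running, unused budget goes back into the pool. */
static void give_budget_back(struct vcpu_stub *svc)
{
    svc->sdom->budget += svc->budget;
    svc->budget = 0;
}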

200% budget, 4 vcpus to run on 4 pcpus, each allowed only 50% of the budget.
This is a static allocation.

Err... again, are you telling or asking?
I am giving an example to show that it is a static allocation.

For e.g., 2 vcpus running on 2 pcpus at 20% of the budgeted time each:
if vcpu3 wants to execute some cpu intensive task, then it won't be
allowed to use more than 50% of a pcpu.

With what parameters? You mean with the ones you cite here (i.e., a
200% budget)? If the VM has 200%, and vcpu1 and vcpu2 consume
20%+20%=40%, there's 160% free for vcpu3 and vcpu4.

I checked the implementation below and I believe we can allow for this
type of dynamic budget_quota allocation per vcpu. Not for the initial
version, but certainly we can consider it for future versions.

But... it's already totally dynamic.
csched2_dom_cntl()
{
    svc->budget_quota = max(sdom->tot_budget / sdom->nr_vcpus,
                            CSCHED2_MIN_TIMER);
}
If sdom->tot_budget = 200% and nr_vcpus is 4, then each vcpu gets 50%.
How is this dynamic allocation? We are not considering the utilization of the other vcpus of the domain before allocating the budget_quota for a given vcpu.

Let me know if my understanding is wrong.
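To make clearer what I mean by dynamic allocation, something along these lines is what I was thinking of (purely a sketch, not a proposal for this series; the redistribution policy is invented for illustration):

#include <stdint.h>

typedef int64_t s_time_t;

/*
 * Sketch: instead of giving every vCPU a fixed tot_budget / nr_vcpus
 * share, shrink the quota of the vCPUs that stayed well below their
 * share in the previous period, and hand the difference to the vCPUs
 * that actually exhausted theirs.
 */
static void recompute_quotas(s_time_t tot_budget,
                             const s_time_t used[], s_time_t quota[],
                             unsigned int nr_vcpus)
{
    s_time_t base = tot_budget / nr_vcpus;
    s_time_t spare = 0;
    unsigned int i, hungry = 0;

    /* Count the vCPUs that used their whole share last period. */
    for (i = 0; i < nr_vcpus; i++)
        if (used[i] >= base)
            hungry++;

    /* Nobody (or everybody) is constrained: keep the static share. */
    if (hungry == 0 || hungry == nr_vcpus) {
        for (i = 0; i < nr_vcpus; i++)
            quota[i] = base;
        return;
    }

    /* Shrink the quota of lightly loaded vCPUs to what they used... */
    for (i = 0; i < nr_vcpus; i++) {
        if (used[i] < base) {
            quota[i] = used[i];
            spare += base - used[i];
        }
    }

    /* ...and redistribute the difference to the busy ones. */
    for (i = 0; i < nr_vcpus; i++)
        if (used[i] >= base)
            quota[i] = base + spare / hungry;
}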

@@ -408,6 +505,10 @@ struct csched2_vcpu {
     unsigned int residual;

     int credit;
+
+    s_time_t budget;

It's confusing; can we please have different member names for budget in
the domain and vcpu structures?

Mmm.. I don't think it is. It's "how much budget this _thing_ has",
where "thing" can be the domain or a vcpu, and you can tell that by
looking at the containing structure. Most of the time, that's rather
explicit, the former being sdom->budget, the latter being svc->budget.

What different names did you have in mind?

The only alternative that I can come up with would be something like:

struct csched2_dom {
 ...
 dom_budget;
 ...
};
struct csched2_vcpu {
 ...
 vcpu_budget;
 ...
};

Which I don't like (because of the repetition).

I just shared a thought, as I experienced this confusion while reading the code for the first time. If you don't agree, it's fine.
@@ -1354,7 +1469,16 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
          * that the credit it has spent so far get accounted.
          */
         if ( svc->vcpu == curr_on_cpu(svc_cpu) )
+        {
             burn_credits(rqd, svc, now);
+            /*
+             * And, similarly, in case it has run out of budget, as a
+             * consequence of this round of accounting, we also must inform
+             * its pCPU that it's time to park it, and pick up someone else.
+             */
+            if ( unlikely(svc->budget <= 0) )

The use of unlikely here is not saving many cpu cycles.

Well, considering that not all domains will have a cap, and that we
don't expect, even for the domains with caps, all their vcpus to
exhaust their budget at every reset event, I think annotating this as
an unlikely event makes sense.
From what I understand, I considered it to be a very likely event.


It's not a super big deal, though, and I can get rid of it, if people
don't like/are not convinced about it. :-)
Yes, it's fine, we can leave it for now.


@@ -1410,27 +1534,35 @@ void burn_credits(struct
+    sdom->budget += svc->budget;
+
+    if ( sdom->budget > 0 )
+    {
+        svc->budget = sdom->budget;

Why are you assigning the remaining sdom->budget to only this svc? svc
should be assigned a proportionate budget. Each vcpu is assigned a %age
of the domain budget based on the cap and the number of vcpus.
There is a difference between the code here and the code in the
rel/sched/credti2-caps branch of git://xenbits.xen.org/people/dariof/xen.git.
The logic in the branch code looks fine, where you are taking
svc->budget_quota into consideration.

Yeah... maybe look at patch 3/4. :-P
Yeah, got it in third patch. :)
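For anyone else following the thread, the difference we are discussing, in sketch form (this is my illustration of the quota capping, not the actual 3/4 hunk; the names are simplified):

typedef long long s_time_t;   /* stand-in for Xen's s_time_t */

/*
 * With the per-vCPU quota in place (patch 3/4), what a vCPU walks away
 * with is bounded by its own budget_quota, rather than being whatever
 * happens to be left in the domain's pool.
 */
static s_time_t capped_grab(s_time_t pool_left, s_time_t budget_quota)
{
    return pool_left < budget_quota ? pool_left : budget_quota;
}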

In runq_candidate() we have this code:

  /*
   * Return the current vcpu if it has executed for less than ratelimit.
   * Adjustment for the selected vcpu's credit and decision
   * for how long it will run will be taken in csched2_runtime.
   *
   * Note that, if scurr is yielding, we don't let rate limiting kick in.
   * In fact, it may be the case that scurr is about to spin, and there's
   * no point forcing it to do so until rate limiting expires.
   */
  if ( !yield && prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
       vcpu_runnable(scurr->vcpu) &&
       (now - scurr->vcpu->runstate.state_entry_time) <
         MICROSECS(prv->ratelimit_us) )
In this code block we return scurr. There is no check here for the
vcpu's budget. Even if the scurr vcpu has executed for less than the
ratelimit and scurr is not yielding, we need to check its budget before
returning scurr.

But we check vcpu_runnable(scurr). And we've already called, in
csched2_schedule(), vcpu_try_to_get_budget(scurr). And if scurr could
not get any budget, we called park_vcpu(scurr), which sets scurr up in
such a way that vcpu_runnable(scurr) is false.
Yes, got your point. But then the call to vcpu_try_to_get_budget() should move to the code block in runq_candidate() that returns scurr; otherwise the condition looks incomplete and makes the logic ambiguous.

We call runq_candidate() to find the next runnable candidate. If we want to return scurr as the current runnable candidate, then it should have gone through all the checks, including the budget_quota one, and all these checks should be in one place.
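To spell out the ordering in code form, here is a schematic sketch of how I read Dario's explanation (the helpers are stand-ins for the real scheduler code, not the actual implementation):

#include <stdbool.h>
#include <stddef.h>

/* Schematic only: stand-ins for the scheduler's real types and helpers. */
struct vcpu_stub { bool parked; bool has_budget; };

static bool try_to_get_budget(struct vcpu_stub *v) { return v->has_budget; }
static void park_vcpu(struct vcpu_stub *v) { v->parked = true; }
static bool vcpu_runnable(const struct vcpu_stub *v) { return !v->parked; }

/*
 * Ordering as I understand the explanation: budget is taken (or the vCPU
 * parked) in csched2_schedule() *before* runq_candidate() runs, so the
 * ratelimit shortcut in runq_candidate() can rely on vcpu_runnable(scurr)
 * alone to filter out an out-of-budget scurr.
 */
static struct vcpu_stub *schedule_sketch(struct vcpu_stub *scurr,
                                         struct vcpu_stub *idle)
{
    if (!try_to_get_budget(scurr))
        park_vcpu(scurr);                 /* makes vcpu_runnable() false */

    /* ... later, the runq_candidate() ratelimit shortcut ... */
    if (vcpu_runnable(scurr) /* && ran less than ratelimit, not yielding */)
        return scurr;

    return idle;                          /* pick someone else */
}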

Thanks and Regards,
Dario


Anshul

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

