[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] xen:rtds:Fix bug in budget accounting



On Wed, Oct 26, 2016 at 10:43 AM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> On Fri, 2016-10-21 at 22:11 -0400, Meng Xu wrote:
>> Bug scenario:
>> While a VCPU is running on a core, it may misses its deadline.
>>
> May be useful to mention why (at least the most common reasons, like
> overhead and/or system being overloaded... are there others?).

How about adding this:
For example, when the sum of the utilizations  of all VCPUs that are
pinned onto a core is larger than 1.

As you know,  the VCPU utilization is not the only factor that
determines the schedulability of gEDF scheduling. It will be hard to
give the exact situation when deadline may happen.

>
>> Then repl_timer_handler() will update the VCPU period and deadline.
>> The VCPU may have high priority with the new deadline and
>> repl_timer_handler()
>> decides to keep the VCPU running on the core, but with new period and
>> deadline.
>>
> repl_timer_handler() does not decide what to run. It's probably better
> to say something like "If the VCPU is still the highest priority one,
> even with the new deadline, it will continue to run"

OK.

>
>> However, the budget enforcement timer for the previous period is
>> still armed.
>> When rt_schedule() is called in the new period to enforce the budget
>> for the previous period, current burn_budget() will deduct the time
>> spent
>> in previous period from the budget in current period, which is
>> incorrect.
>>
> Ok, so, at time t=10 the replenishment timer triggers. It finds the
> vcpu in the following status:
>
>  v->cur_deadline = 10
>  v->cur_budget = 3
>  v->last_start = 2

The idea you got is correct.
However, the values here won't happen. If the last_start is 2 and
cur_budget is 3, the budget enforcement should be no later than t = 5.

How about using the following parameters:
v->cur_deadline = 10
v->cur_budget = 4
v->last_start = 8

rt_schedule() will be invoked at t = 12 for budget enforcement.

>
> I.e., the vcpu still has some budget left, but, e.g., because the
> system is overloaded, could not use it in its previous period.
>
> Assuming v->period=10 and v->budget=5, rt_update_deadline(), called
> from within repl_timer_handler(), will put the vcpu in the following
> state:
>
>  v->curr_deadline = 20
>  v->curr_budget = 5
>  v->last_start = 2
>

With the parameters I suggested above, it will be
 v->curr_deadline = 20
 v->curr_budget = 5
 v->last_start = 8

> Then, at t=15 rt_schedule() is invoked,

Wait, why rt_schedule() is invoked at t=15? It should be invoked at t
= 12 as mentioned above.

> it in turns calls burn_budget()
> which does:
>
>  delta = NOW() - v->last_start = 15 - 2 = 13
>  v->cur_budget -= 13
with the new parameters I suggested,
burn_budget() does:
delta = NOW() - v->last_start = 15 - 8 = 7,
which is too much.

>
> Which is too much. Is this what we are talking about?

The high-level idea you got is correct. It is what we are talking about. :-)
But the parameters used in your example will not happen. :-)

>
>> Fix:
>> We keeps last_start always within the current period for a VCPU, so
>> that
>> We only deduct the time spent in the current period from the VCPU
>> budget.
>>
> Well, this is sort of ok, I guess.
>
> In general, I think that cur_deadline, cur_budget and last_start are
> tightly related between each others. Especially cur_budget and
> last_start, considering that the former is updated in a way that
> depends from the value of the latter.
>
> The problem here seems to me to be that, indeed, we don't update
> last_start all the time that we update cur_budget. If, for instance,
> you look at Credit2, start_time is updated inside burn_credits(), while
> in RTDS, last_start is not updated in burn_budget().
>

The burn_budget() is only invoked by rt_schedule(). So we won't update
last_start until rt_schedule() is called.
However, rt_schedule() can be called after repl_timer_handle() is
called for a VCPU, which makes the last_start no longer inside the
current period.


> So (1), one thing that I would do is to set svc->last_start=now; in
> burn_budget().
>
> However, just doing that won't solve the problem, because
> repl_timer_handler() does not call burn_budget(). I've wondered a bit
> whether that is correct or not... I mean, especially in the situation
> we are discussing --when repl_timer_handler() runs "before"
> rt_schedule()-- how do we account for the budget spent from the last
> invocation of burn_budget() and where in time we are now? Well, I'm not
> really sure whether or not this is a problem.

I see. You want to account for the budget spent in the previous period
when deadline happens.

How about in the repl_timer_handler(), we check if the VCPU is current
running. If yes, we will call burn_budget() for the VCPU and update
the svc->last_start = now.

I don't really like updating the last_start in burn_budget().

last_start is the recent starting time of the VCPU. It is to record
the starting time of a currently running VCPU. So it should be updated
only when the VCPU is scheduled.
When burn_budget() is called, the VCPU can be scheduled or
de-scheduled. When the VCPU is de-scheduled, we don't have to update
the last_start, since it will be updated in the next time when the
VCPU is scheduled.

If we update last_start in burn_budget(), it won't cause incorrect
functionality. It just wastes time on one assignment operation.

>
> Since we don't allow the budget to go negative, and since we are giving
> both new deadline and new budget to the vcpu anyway, it may not be a
> big deal, actually.

OK

> Or maybe it could be an issue, but only if either
> the overhead and/or the overload are so big (e.g., if the vcpu is
> overrunning and doing that for more than one full period), that the
> system is out of the window already.

I'm sorry that I didn't quite get this part "doing that for more than
one full period".

Another thing:
rt_schedule() can be called after repl_timer_handler() for a VCPU
whose budget = period.
When this happens, the VCPU will not get any budget for a period.
Since people usually want low-latency for such VCPU and configure the
period to be large value to avoid scheduling overhead, it will cause a
long delay (equal to period) to the VCPU.

>
> So, let's not consider this, for the moment... And let's focus on the
> fact that, in rt_update_deadline(), we are changing cur_deadline and
> cur_budget, so we also need to update last_start (as we said above
> they're related!). I guess we either call burn_budget from
> rt_update_deadline(), or we open-code svc->last_start=now;

Yep. If we call burn_budget in rt_update_deadline, we will need to
update the last_start in burn_budget.

>
> burn_budget() does things that we probably don't need and don't want to
> do in rt_update_deadline(). In fact, if we don't allow the budget to go
> negative, and ignore the (potential, and maybe non-existent) accounting
> issue described above, there's no point in marking the vcpu as depleted
> (in burn_budget()) and then (since we're calling it from
> rt_update_deadline()) replenishing it right away! :-O
>
> So (2), I guess the best thing to do is "just" to update last_start in
> rt_update_deadline() to, which is sort of what the patch is doing.

Yes. This is the simplest and most neat way I can think of so far.
As you suggested, we can also call burn_budget in rt_update_deadline()
and update last_start in burn_budget.
Both solutions are doing the same thing. However, calling burn_budget
in rt_update_deadline() is more expensive than just updating the
last_start in rt_update_deadline(). :-)

>
> Yet, there's something I don't understand, i.e.:
>
>> Signed-off-by: Meng Xu <mengxu@xxxxxxxxxxxxx>
>> Reported-by: Dagaen Golomb <dgolomb@xxxxxxxxxxxxx>
>>
>> --- a/xen/common/sched_rt.c
>> +++ b/xen/common/sched_rt.c
>> @@ -407,6 +407,14 @@ rt_update_deadline(s_time_t now, struct rt_vcpu
>> *svc)
>>          svc->cur_deadline += count * svc->period;
>>      }
>>
>> +    /*
>> +     * svc may be scheduled to run immediately after it misses
>> deadline
>> +     * Then rt_update_deadline is called before rt_schedule, which
>> +     * should only deduct the time spent in current period from the
>> budget
>> +     */
>> +    if ( svc->last_start < (svc->cur_deadline - svc->period) )
>> +        svc->last_start = svc->cur_deadline - svc->period;
>> +
>  - Do we need the if()? Isn't it safe to just always update last_start?
>    If it is, as it looks to me to be, I'd prefer it done like that.

Yes. I will ditch the if().

>  - Why cur_deadline-period, and not NOW() (well, actually, now)?
>    Well, in the scenario above, and in all the others I can think of,
>    it would be almost the same. And yet, I'd prefer using now, for
>    clarity and consistency.

Hmm, it depends on how we account for the budget.
Suppose the VCPU keeps running.
the start of next period is t = 10 and the next period ends at t = 20.
rt_update_deadline is invoked at t = 10.1.

If last_start = 10, which is cur_deadline - period, we will account
for the 0.1 time unit as the budget consumed by the VCPU.
If last_start = 10.1, which is now, we will not account for the 0.1 time unit.

Since we always account for the time spent in hypervisor into the
budget of currently running VCPU, I chose last_start = cur_deadline -
period.
I'm ok for last_start = now, considering the consistency. So your call. :-)

>
> In summary, what I think we need is a patch that does both (1) and (2),
> i.e., it makes sure that we always update last_start to NOW() all the
> times that we assign to a vcpu its new budget. That would make the code
> more consistent and should also fix the buggy behavior you're seeing.
>
> What do you think?

(1) I agree.
(2) I'm ok with either way.
So I will change the code and test it.
If it solves the problem, which it should, I will send out a path that
has both (1) and (2) today.

Thank you very much!

Meng
-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.