[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen:rtds:fix bug in accounting budget



On Fri, Oct 21, 2016 at 1:36 PM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
>
> On Wed, 2016-10-19 at 11:13 -0400, Meng Xu wrote:
> > The bug is introduced in Xen 4.7 when we converted RTDS scheduler
> > from quantum-driven model to event-driven model.
> > We assumed rt_schedule() is always called for a VCPU
> > before the VCPUs budget replenished handler.
> >
> No, we didn't.
>
> Or at least, I've never done so, and tried as hard as I could to tell
> you guys to not make any assumptions on who run first.


OK... I never made such assumption either. To be precise, I guess I
should rephrase the words as "The current event-driven RTDS scheduler
does not correctly handle the corner cases when
 r
t_schedule()
is called for a VCPU after the VCPU's budget replenishment handler.
 "
>
>
> So, yes, I agree that, if there is code that only works under such
> assumption, it's a bug.
>
> > This assumption does not hold, when system is overloaded, or
> > when the VCPU budget is almost equal its period.
> >
> > Buggy behavior:
> > 1) A VCPU may get less budget that assigned in a period.
> > 2) A full capacity VCPU, i.e., a VCPU whose period is equal to
> > budget,
> >    may not get any budget in some period.
> >
> So, there are two bugs. And things are very subtle, as far as I can
> judge from both the bugs description and the code.
>
> It would be, therefore, a lot more clear if you could send _one_ patch
> per bug.


OK. Will do that soon.


>
>
> > Bug analysis:
> > 1) A VCPU deadline can be fast-forwarded by more than one period.
> >    However, the VCPU last_start time was not updated immediately.
> >    If rt_schedule() is called after rt_update_deadline(), which
> > happens
> >    when VCPU budget is equal to period or when VCPU has deadline
> > miss,
> >    burn_budget() will burn the budget that was just replenished,
> >    although the replenished budget should be used in the most recent
> > period only.
> >
> -EPARSE.
>
> I've looked at the code and try to match current behavior, your
> proposed change, and this description, but failed.
>
> Can you be a little more precise and specific about what happens when?


Sure!

Let me give an example in the new patch.


>
> I'll keep looking and thinking, but any help in making all this a bit
> more clear would be very welcome.


I'm sending a new patch in one or two hours.


>
>
> > 2) When a full capacity VCPU depletes its budget and is context
> > switching out,
> >    but has not updated the cores current running VCPU,
> >
> "has not updated the cores current running VCPU,"
>
> I've not idea what this sentence means. What is it that has not yet
> been updated?


The current running VCPU is per_cpu(schedule_data, c).curr .
The key point here is that rt_schedule() decides to deschedule a VCPU
whose budget is just depleted, and the repl_timer_handler is called
before the VCPU finishes the context switch. Until when
rt_contexted_saved() is called, we won't update the
per_cpu(schedule_data, c).curr. So the repl_timer_handler() still
thought the VCPU is running, while the VCPU is in the progress of
being scheduled out. Then repl_timer_handler() fails to reschedule the
VCPU since the VCPU's budget is replenished.

>
>
> >    the budget replenish timer may be triggerred.
> >    The replenish handler failed to re-schedule the full capacity VCPU
> >    because it thought the VCPU is running.
> >
> >    When a VCPU budget is replenished, we try to tickle a CPU.
> >    When we find a core for a VCPU to tickle and the VCPU is context
> > switching out,
> >    we will always tickle the core where the VCPU was running,
> >    if the VCPU cannot find another core to tickle
> >
> Can't understand much again... I guess this is the description of the
> solution to the bug?


Yes. This is the solution.

Hmm, I will describe the steps of how the bugs happens in the next
version and clearly mark which part is the solution idea.
>
>
> > This bug was reported by Dagaen Golomb
> >
> You can give credit by using the following tag:
>
> Reported by: Dagaen Golomb <xxx@xxxxxx>


Sure!  Will do.

Thanks and best regards,

Meng

>
>
> Thanks and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

-- 
------------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.