Xen project Mailing List

Re: [Xen-devel] [PATCH] xen:rtds:fix bug in accounting budget

On Fri, Oct 21, 2016 at 1:36 PM, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote: > > On Wed, 2016-10-19 at 11:13 -0400, Meng Xu wrote: > > The bug is introduced in Xen 4.7 when we converted RTDS scheduler > > from quantum-driven model to event-driven model. > > We assumed rt_schedule() is always called for a VCPU > > before the VCPUs budget replenished handler. > > > No, we didn't. > > Or at least, I've never done so, and tried as hard as I could to tell > you guys to not make any assumptions on who run first. OK... I never made such assumption either. To be precise, I guess I should rephrase the words as "The current event-driven RTDS scheduler does not correctly handle the corner cases when r t_schedule() is called for a VCPU after the VCPU's budget replenishment handler. " > > > So, yes, I agree that, if there is code that only works under such > assumption, it's a bug. > > > This assumption does not hold, when system is overloaded, or > > when the VCPU budget is almost equal its period. > > > > Buggy behavior: > > 1) A VCPU may get less budget that assigned in a period. > > 2) A full capacity VCPU, i.e., a VCPU whose period is equal to > > budget, > > may not get any budget in some period. > > > So, there are two bugs. And things are very subtle, as far as I can > judge from both the bugs description and the code. > > It would be, therefore, a lot more clear if you could send _one_ patch > per bug. OK. Will do that soon. > > > > Bug analysis: > > 1) A VCPU deadline can be fast-forwarded by more than one period. > > However, the VCPU last_start time was not updated immediately. > > If rt_schedule() is called after rt_update_deadline(), which > > happens > > when VCPU budget is equal to period or when VCPU has deadline > > miss, > > burn_budget() will burn the budget that was just replenished, > > although the replenished budget should be used in the most recent > > period only. > > > -EPARSE. > > I've looked at the code and try to match current behavior, your > proposed change, and this description, but failed. > > Can you be a little more precise and specific about what happens when? Sure! Let me give an example in the new patch. > > I'll keep looking and thinking, but any help in making all this a bit > more clear would be very welcome. I'm sending a new patch in one or two hours. > > > > 2) When a full capacity VCPU depletes its budget and is context > > switching out, > > but has not updated the cores current running VCPU, > > > "has not updated the cores current running VCPU," > > I've not idea what this sentence means. What is it that has not yet > been updated? The current running VCPU is per_cpu(schedule_data, c).curr . The key point here is that rt_schedule() decides to deschedule a VCPU whose budget is just depleted, and the repl_timer_handler is called before the VCPU finishes the context switch. Until when rt_contexted_saved() is called, we won't update the per_cpu(schedule_data, c).curr. So the repl_timer_handler() still thought the VCPU is running, while the VCPU is in the progress of being scheduled out. Then repl_timer_handler() fails to reschedule the VCPU since the VCPU's budget is replenished. > > > > the budget replenish timer may be triggerred. > > The replenish handler failed to re-schedule the full capacity VCPU > > because it thought the VCPU is running. > > > > When a VCPU budget is replenished, we try to tickle a CPU. > > When we find a core for a VCPU to tickle and the VCPU is context > > switching out, > > we will always tickle the core where the VCPU was running, > > if the VCPU cannot find another core to tickle > > > Can't understand much again... I guess this is the description of the > solution to the bug? Yes. This is the solution. Hmm, I will describe the steps of how the bugs happens in the next version and clearly mark which part is the solution idea. > > > > This bug was reported by Dagaen Golomb > > > You can give credit by using the following tag: > > Reported by: Dagaen Golomb <xxx@xxxxxx> Sure! Will do. Thanks and best regards, Meng > > > Thanks and Regards, > Dario > -- > <<This happens because I choose it to happen!>> (Raistlin Majere) > ----------------------------------------------------------------- > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) -- ------------ Meng Xu PhD Student in Computer and Information Science University of Pennsylvania http://www.cis.upenn.edu/~mengxu/ _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.