[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v6][RFC]xen: sched: convert RTDS from time to event driven model
On Fri, 2016-02-26 at 13:33 -0500, Chen, Tianyang wrote: > Attached. > Hey, here I am... sorry it took a bit. I've had a look, and I've been able to come up with some code that I at least do not dislike too much! ;-P Have a look at the attached patch (you should apply it on top of the sched_rt.c file that you sent me in attachment). About what was happening... > On 2016-02-26 13:09, Dario Faggioli wrote: > > On Fri, 2016-02-26 at 12:28 -0500, Tianyang Chen wrote: > > > > > > I added debug prints in all these functions and noticed that it > > > could > > > be > > > caused by racing between spurious wakeups and context switching. > > > It's not really spurious wakeup, it's regular wakeup happening immediately after the vcpu blocked, when we still haven't been able to execute rt_context_saved(). It's not really a race, and in fact we have the _RTDS_scheduled and _RTDS_delayed_runq_add flags that are meant to deal with exactly these situations. In fact (I added some more debug printk): > > > (XEN) vcpu1 wakes up nowhere > > > (XEN) cpu1 picked vcpu1 *** vcpu1 is on a cpu > > > (XEN) cpu1 picked idle *** vcpu1 is waiting to be context > > > switched > > > (XEN) vcpu2 wakes up nowhere > > > (XEN) cpu0 picked vcpu0 > > > (XEN) cpu2 picked vcpu2 > > > (XEN) cpu0 picked vcpu0 > > > (XEN) cpu0 picked vcpu0 > > > (XEN) d0 attempted to change d0v2's CR4 flags 00000620 -> > > > 00040660 > > > (XEN) cpu0 picked vcpu0 > > > (XEN) vcpu2 sleeps on cpu > > > (XEN) vcpu1 wakes up nowhere *** vcpu1 wakes up without > > > sleep? > > > > > > (XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:526 > > > (XEN) ----[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]--- > > > - > (XEN) [ 56.113897] vcpu13 wakes up nowhere (XEN) [ 56.113906] cpu4 picked vcpu13 (XEN) [ 56.113962] vcpu13 blocks (XEN) [ 56.113965] cpu4 picked idle (XEN) [ 56.113969] vcpu13 unblocks (XEN) [ 56.113972] vcpu13 wakes up nowhere (XEN) [ 56.113980] vcpu13 woken while still in scheduling tail (XEN) [ 56.113985] vcpu13 context saved and added back to queue (XEN) [ 56.113992] cpu4 picked vcpu13 So, as you can see, at 56.113962 vcpu13 blocks (note, _blocks_ != _sleeps_), and cpu4 goes idle. Ideally, rt_context_saved() would run and remove the replenishment event of vcpu13 from the replenishment queue. But, at 56.113969, vcpu13 wakes up already, before rt_context_saved() had a chance to run (which happens at 56.113985). It is not a spurious wakeup, as the vcpu was actually blocked and is being unblocked. Since rt_context_saved() hasn't run yet, the replenishment event is still in the queue, and hence any ASSERT asking for !__vcpu_on_replq(vcpu13) is doomed to making Xen explode! :-/ That does not happen in the execution trace above, because that was collected with my patch applied, where I either leave the replenishment event alone, if it still valid (i.e., no deadline miss happened during the blocked period) or, if a new one needs to be enqueued, I dequeue the old one first. The end result is not particularly pretty, but I am up for doing even worse, for the sake of keeping things like this: ASSERT( !__vcpu_on_replq(svc) ); at the beginning of replq_insert() (and its appropriate counterpart at the beginning of replq_remove()). In fact, I consider them really really helpful, when reasoning and trying to figure out how the code works... There is nothing that I hate more than an 'enqueue' function for which you don't know if it is ok to call it when the entity being enqueued is in the queue already (and what actually happens if it is). These ASSERTs, greatly help, from that point of view, clearly stating that, _no_, in this case it's absolutely not right to call the enqueue function if the event is in the queue already. :-) Hope I made myself clear. I gave some testing to the attached patch, and it seems to work to me, but I'd appreciate if you could do more of that. Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) Attachment:
sched_rt_wakeup_schedtail.patch Attachment:
signature.asc _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |