[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v5][RFC]xen: sched: convert RTDS from time to event driven model

On Thu, 2016-02-25 at 12:29 -0500, Tianyang Chen wrote:
> On 2/25/2016 5:34 AM, Dario Faggioli wrote:
> > > 
> > Which one ASSERT() fails?
> > 
> The replq_insert() fails because it's already on the replenishment
> queue 
> when rt_vcpu_wake() is trying to insert a vcpu again.
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08012a003>] sched_rt.c#rt_vcpu_wake+0xf0/0x17f
> (XEN)    [<ffff82d08012be0c>] vcpu_wake+0x213/0x3d4
> (XEN)    [<ffff82d08012c327>] vcpu_unblock+0x4b/0x4d
> (XEN)    [<ffff82d080169cea>] vcpu_kick+0x20/0x6f
> (XEN)    [<ffff82d080169d65>] vcpu_mark_events_pending+0x2c/0x2f
> (XEN)    [<ffff82d08010762a>]
> event_2l.c#evtchn_2l_set_pending+0xa9/0xb9
> (XEN)    [<ffff82d08010822a>] evtchn_send+0x158/0x183
> (XEN)    [<ffff82d0801096fc>] do_event_channel_op+0xe21/0x147d
> (XEN)    [<ffff82d0802439e2>] lstar_enter+0xe2/0x13c
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:527
> (XEN) ****************************************
Ok, makes sense.

> > > And like you
> > > said, mostly spurious sleep
> > > happens when a vcpu is running and it could happen in other
> > > cases,
> > > although rare.
> > > 
> > I think I said already there's no such thing as "spurious sleep".
> > Or at
> > least, I can't think of anything that I would define a spurious
> > sleep,
> > if you do, please, explain what situation you're referring to.
> > 
> I meant to say spurious wakeup... 
If, for spurious wakeup, we refer to wakeups happening when the vcpu is
either running or runnable (and hence in the runqueue already), which I
do, we don't even get to call __replq_insert() in those cases. I mean,
we leave rt_vcpu_wake() before that, don't we?

> If rt_vcpu_sleep() removes vcpus from 
> replenishment queue, it's perfectly safe for rt_vcpu_wake() to
> insert 
> them back. 
It's safe against sleeps, not against blockings. That's the point I'm
trying to make.

> So, I'm suspecting it's the spurious wakeup that's causing 
> troubles because vcpus are not removed prior to rt_vcpu_wake().
> However, 
> the two RETURNs at the beginning of rt_vcpu_wake() should catch that 
> shouldn't it?

> > In any case, one way of dealing with vcpus blocking/offlining/etc
> > could
> > be to, in context_saved(), in case we are not adding the vcpu back
> > to
> > the runq, cancel its replenishment event with __replq_remove().
> > 
> > (This may make it possible to avoid doing it in rt_vcpu_sleep()
> > too,
> > but you'll need to check and test.)
> > 
> > Can you give this a try.
> That makes sense. Doing it in context_saved() kinda implies that if
> vcpu is sleeping and taken off, its replenishment event should be 
> removed. On the other hand, the logic is the same as removing it in 
> rt_vcpu_sleep() but just at different times.
It is, but, if done properly, it catches more cases, or at least that's
what I'm after.

> Well, I have tried it and 
> the check still needs to be there in rt_vcpu_wake(). I will send the 
> next version so it's easier to look at.
If you're still seeing assertion failures, perhaps context_saved() is
not the right place where to do that... I'll think more about this...

Anyway, yes, let's see the code. Please, also send again, as a follow
up email, the assertion failure log you get if you remove the check.

Thanks and Regards,
<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.