
Re: [Xen-devel] [PATCH v5][RFC]xen: sched: convert RTDS from time to event driven model

On 2/25/2016 5:34 AM, Dario Faggioli wrote:
>>>> +     * it should be re-inserted back to the replenishment queue.
>>>> +     */
>>>> +    if ( now >= svc->cur_deadline )
>>>> +    {
>>>> +        rt_update_deadline(now, svc);
>>>> +        __replq_remove(ops, svc);
>>>> +    }
>>>> +
>>>> +    if ( !__vcpu_on_replq(svc) )
>>>> +        __replq_insert(ops, svc);
+
>>> And here I am again: is it really necessary to check whether svc is
>>> not in the replenishment queue? It looks to me that it really should
>>> not be there... but maybe it can, because we remove the event from
>>> the queue when the vcpu sleeps, but *not* when the vcpu blocks?
>> Yeah. That is the case where I keep getting an assertion failure if
>> it's removed.

> Which one ASSERT() fails?

The replq_insert() fails, because the vcpu is already on the replenishment queue when rt_vcpu_wake() tries to insert it again:

(XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:527
(XEN) ----[ Xen-4.7-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d08012a003>] sched_rt.c#rt_vcpu_wake+0xf0/0x17f
(XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor (d0v0)
(XEN) rax: 0000000000000001   rbx: ffff83023b522940   rcx: 0000000000000001
(XEN) rdx: 00000031bb1b9980   rsi: ffff82d080342318   rdi: ffff83023b486ca0
(XEN) rbp: ffff8300bfcffd88   rsp: ffff8300bfcffd58   r8:  0000000000000004
(XEN) r9:  00000000deadbeef   r10: ffff82d08025f5c0   r11: 0000000000000206
(XEN) r12: ffff83023b486ca0   r13: ffff8300bfd46000   r14: ffff82d080299b80
(XEN) r15: ffff83023b522d80   cr0: 0000000080050033   cr4: 00000000000406a0
(XEN) cr3: 0000000231c0d000   cr2: ffff880001e80ba8
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff8300bfcffd58:
(XEN)    ffff8300bfcffd70 ffff8300bfd46000 0000000216110572 ffff83023b522940
(XEN)    ffff82d08032bc00 0000000000000282 ffff8300bfcffdd8 ffff82d08012be0c
(XEN)    ffff83023b4b5000 ffff83023b4f1000 ffff8300bfd47000 ffff8300bfd46000
(XEN)    0000000000000000 ffff83023b4b4280 0000000000014440 0000000000000001
(XEN)    ffff8300bfcffde8 ffff82d08012c327 ffff8300bfcffe08 ffff82d080169cea
(XEN)    ffff83023b4b5000 000000000000000a ffff8300bfcffe18 ffff82d080169d65
(XEN)    ffff8300bfcffe38 ffff82d08010762a ffff83023b4b4280 ffff83023b4b5000
(XEN)    ffff8300bfcffe68 ffff82d08010822a ffff8300bfcffe68 fffffffffffffff2
(XEN)    ffff88022056dcb4 ffff880230c34440 ffff8300bfcffef8 ffff82d0801096fc
(XEN)    ffff8300bfcffef8 ffff8300bfcfff18 ffff8300bfcffef8 ffff82d080240e85
(XEN)    ffff880200000001 0000000000000000 0000000000000246 ffffffff810013aa
(XEN)    000000000000000a ffffffff810013aa 000000000000e030 ffff8300bfd47000
(XEN)    ffff8802206597f0 ffff880230c34440 0000000000014440 0000000000000001
(XEN)    00007cff403000c7 ffff82d0802439e2 ffffffff8100140a 0000000000000020
(XEN)    ffff88022063c7d0 ffff88022063c7d0 0000000000000001 000000000000dca0
(XEN)    ffff88022056dcb8 ffff880230c34440 0000000000000206 0000000000000004
(XEN)    ffff8802230001a0 ffff880220619000 0000000000000020 ffffffff8100140a
(XEN)    0000000000000000 ffff88022056dcb4 0000000000000004 0001010000000000
(XEN)    ffffffff8100140a 000000000000e033 0000000000000206 ffff88022056dc90
(XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d08012a003>] sched_rt.c#rt_vcpu_wake+0xf0/0x17f
(XEN)    [<ffff82d08012be0c>] vcpu_wake+0x213/0x3d4
(XEN)    [<ffff82d08012c327>] vcpu_unblock+0x4b/0x4d
(XEN)    [<ffff82d080169cea>] vcpu_kick+0x20/0x6f
(XEN)    [<ffff82d080169d65>] vcpu_mark_events_pending+0x2c/0x2f
(XEN)    [<ffff82d08010762a>] event_2l.c#evtchn_2l_set_pending+0xa9/0xb9
(XEN)    [<ffff82d08010822a>] evtchn_send+0x158/0x183
(XEN)    [<ffff82d0801096fc>] do_event_channel_op+0xe21/0x147d
(XEN)    [<ffff82d0802439e2>] lstar_enter+0xe2/0x13c
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:527
(XEN) ****************************************
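
For context, the assertion that fires is the one at the top of
__replq_insert(). As a rough sketch of the shape of that helper (the
exact body in the patch may differ):

    /* Sketch only: queue a replenishment event for svc. */
    static void
    __replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
    {
        ASSERT( !__vcpu_on_replq(svc) );  /* sched_rt.c:527, per the log */

        /* ... insert svc in the replenishment queue, kept sorted by
         * svc->cur_deadline, and reprogram the replenishment timer
         * if svc becomes the new head ... */
    }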
>> I'm thinking when a vcpu unblocks, it could potentially fall through
>> here.

> Well, when unblocking, wake() is certainly called, yes.

>> And like you said, mostly spurious sleep happens when a vcpu is
>> running, and it could happen in other cases, although rare.

> I think I said already there's no such thing as "spurious sleep". Or
> at least, I can't think of anything that I would define as a spurious
> sleep; if you do, please explain what situation you're referring to.

I meant to say spurious wakeup... If rt_vcpu_sleep() removes vcpus from the replenishment queue, it is perfectly safe for rt_vcpu_wake() to insert them back. So I suspect it is spurious wakeups that are causing trouble, because vcpus are not removed prior to rt_vcpu_wake(). However, shouldn't the two early returns at the beginning of rt_vcpu_wake() catch that?
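
(To be concrete, the two returns I mean are the ones at the top of
rt_vcpu_wake(); trimmed of the stats counters, roughly:)

    static void
    rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
    {
        struct rt_vcpu * const svc = rt_vcpu(vc);

        BUG_ON( is_idle_vcpu(vc) );

        /* The vcpu is currently running on a pcpu: nothing to do. */
        if ( unlikely(curr_on_cpu(vc->processor) == vc) )
            return;

        /* The vcpu is already on the runq/depletedq: nothing to do. */
        if ( unlikely(__vcpu_on_q(svc)) )
            return;

        /* ... otherwise fall through to the deadline update and the
         * __replq_insert() that trips the ASSERT above ... */
    }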
> In any case, one way of dealing with vcpus blocking/offlining/etc.
> could be to, in context_saved(), in case we are not adding the vcpu
> back to the runq, cancel its replenishment event with
> __replq_remove().
>
> (This may make it possible to avoid doing it in rt_vcpu_sleep() too,
> but you'll need to check and test.)
>
> Can you give this a try?
That makes sense. Doing it in context_saved() kind of implies that if a vcpu is sleeping and taken off a pcpu, its replenishment event should be removed. On the other hand, the logic is the same as removing it in rt_vcpu_sleep(), just at a different time (a sketch of what the change would look like is below). Well, I have tried it, and the check still needs to be there in rt_vcpu_wake(). I will send the next version so it's easier to look at.
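
Roughly, the change being discussed would look like this (a simplified
sketch around rt_context_saved(); stats and corner cases omitted):

    static void
    rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
    {
        struct rt_vcpu *svc = rt_vcpu(vc);
        spinlock_t *lock = vcpu_schedule_lock_irq(vc);

        clear_bit(__RTDS_scheduled, &svc->flags);

        /* Idle vcpus never go on the runq (or the replenishment queue). */
        if ( is_idle_vcpu(vc) )
            goto out;

        if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
             likely(vcpu_runnable(vc)) )
            __runq_insert(ops, svc);
        else
            /* Not going back on the runq (blocked/offlined/etc.):
             * cancel the pending replenishment event, as suggested. */
            __replq_remove(ops, svc);

    out:
        vcpu_schedule_unlock_irq(lock, vc);
    }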

Thanks,
Tianyang

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

