[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v6][RFC]xen: sched: convert RTDS from time to event driven model
On 2/26/2016 4:11 AM, Dario Faggioli wrote: It looks like the current code doesn't add a vcpu to the replenishment queue when vcpu_insert() is called. When the scheduler is initialized, all the vcpus are added to the replenishment queue after waking up from sleep. This needs to be changed (add vcpu to replq in vcpu_insert()) to make it consistent in a sense that when rt_vcpu_insert() is called, it needs to have a corresponding replenishment event queued. This way the ASSERT() is for both cases in __runq_insert() to enforce the fact that "when a vcpu is inserted to runq/depletedq, a replenishment event is waiting for it".I am ok with this (calling replq_insert() in rt_vcpu_insert()). Not just doing that unconditionally though, as a vcpu can, e.g., be paused when the insert_vcpu hook is called, and in that case, I don't think we want to enqueue the replenishment event, do we? Yes. static void rt_context_saved(const struct scheduler *ops, struct vcpu *vc) { struct rt_vcpu *svc = rt_vcpu(vc); spinlock_t *lock = vcpu_schedule_lock_irq(vc); clear_bit(__RTDS_scheduled, &svc->flags); /* not insert idle vcpu to runq */ if ( is_idle_vcpu(vc) ) goto out; if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) && likely(vcpu_runnable(vc)) ) { __runq_insert(ops, svc); runq_tickle(ops, snext); } else __replq_remove(ops, svc); out: vcpu_schedule_unlock_irq(lock, vc); } And, as said above, if you do this, try also removing the __replq_remove() call from rt_vcpu_sleep(), this one should cover for that (and, actually, more than just that!). After all this, check whether you still have the assert in __replq_insert() triggering and let me knowSo after moving the __replq_remove() to rt_context_saved(), the assert in __replq_insert() still fails when dom0 boots up.Well, maybe removing __replq_remove() from rt_vcpu_sleep() is not entirely ok, as if we do that we fail to deal with the case of when (still in rt_vcpu_sleep()), __vcpu_on_q(svc) is true. So, I'd say, do as I said above wrt rt_context_saved(). For rt_vcpu_sleep(), you can try something like this: static void rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc) { struct rt_vcpu * const svc = rt_vcpu(vc); BUG_ON( is_idle_vcpu(vc) ); SCHED_STAT_CRANK(vcpu_sleep); if ( curr_on_cpu(vc->processor) == vc ) cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ); else if ( __vcpu_on_q(svc) ) { __q_remove(svc); __replq_remove(svc); <=== *** LOOK HERE *** } else if ( svc->flags & RTDS_delayed_runq_add ) clear_bit(__RTDS_delayed_runq_add, &svc->flags); } In fact, in both the first and the third case, we go will at some point pass through rt_context_switch(), and hit the __replq_remove() that I made you put there. In the case in the middle, as the vcpu was just queued, and for making it go to sleep, it is enough to remove it from the runq (or depletedq, in the case of this scheduler), we won't go through rt_schedule()-->rt_context_saved(), and hence the __replq_remove() won't be called. Sorry for the overlook, can you try this. That being said...(XEN) Assertion '!vcpu_on_replq(svc)' failed at sched_rt.c:524 (XEN) ----[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]---- (XEN) Xen call trace: (XEN) [<ffff82d080128c07>] sched_rt.c#__replq_insert+0x2b/0x64 (XEN) [<ffff82d08012a072>] sched_rt.c#rt_vcpu_wake+0xf2/0x12c (XEN) [<ffff82d08012be2c>] vcpu_wake+0x213/0x3d4 (XEN) [<ffff82d08012c347>] vcpu_unblock+0x4b/0x4d (XEN) [<ffff82d080169cea>] vcpu_kick+0x20/0x6f (XEN) [<ffff82d080169d65>] vcpu_mark_events_pending+0x2c/0x2f (XEN) [<ffff82d08010762a>] event_2l.c#evtchn_2l_set_pending+0xa9/0xb9 (XEN) [<ffff82d08010822a>] evtchn_send+0x158/0x183 (XEN) [<ffff82d0801096fc>] do_event_channel_op+0xe21/0x147d (XEN) [<ffff82d0802439e2>] lstar_enter+0xe2/0x13c (XEN)... This says that the we call __replq_insert() from rt_vcpu_wake() and find out in there that vcpu_on_replq() is true. However, in v6 code for rt_vcpu_wake(), I can see this: + if( !__vcpu_on_replq(svc) ) + { + __replq_insert(ops, svc); + ASSERT( vcpu_runnable(vc) ); + } which would make me think that, if vcpu_on_replq() is true, we shouldn't be calling __replq_insert() in the first place. So, have you made other changes wrt v6 when trying this? The v6 doesn't have the if statement commented out when I submitted it. But I tried commenting it out, the assertion failed. It fails in V6 with: rt_vcpu_sleep(): removing replenishment event for all cases rt_context_saved(): not removing replenishment events rt_vcpu_wake(): not checking if the event is already queued. or with: rt_vcpu_sleep(): not removing replenishment event at all rt_context_saved(): removing replenishment events if not runnable rt_vcpu_wake(): not checking if the event is already queued. Also with:rt_vcpu_sleep(): removing replenishment event if the vcpu is on runq/depletedq rt_context_saved(): removing replenishment events if not runnable rt_vcpu_wake(): not checking if the event is already queued.I added debug prints in all these functions and noticed that it could be caused by racing between spurious wakeups and context switching. See the following events for the last modification above: (XEN) cpu1 picked idle (XEN) d0 attempted to change d0v1's CR4 flags 00000620 -> 00040660 (XEN) cpu2 picked idle (XEN) vcpu1 sleeps on cpu (XEN) cpu0 picked idle (XEN) vcpu1 context saved not runnable (XEN) vcpu1 wakes up nowhere (XEN) cpu0 picked vcpu1 (XEN) vcpu1 sleeps on cpu (XEN) cpu0 picked idle (XEN) vcpu1 context saved not runnable (XEN) vcpu1 wakes up nowhere (XEN) cpu0 picked vcpu1 (XEN) cpu0 picked idle (XEN) vcpu1 context saved not runnable (XEN) cpu0 picked vcpu0 (XEN) vcpu1 wakes up nowhere (XEN) cpu1 picked vcpu1 *** vcpu1 is on a cpu (XEN) cpu1 picked idle *** vcpu1 is waiting to be context switched (XEN) vcpu2 wakes up nowhere (XEN) cpu0 picked vcpu0 (XEN) cpu2 picked vcpu2 (XEN) cpu0 picked vcpu0 (XEN) cpu0 picked vcpu0 (XEN) d0 attempted to change d0v2's CR4 flags 00000620 -> 00040660 (XEN) cpu0 picked vcpu0 (XEN) vcpu2 sleeps on cpu (XEN) vcpu1 wakes up nowhere *** vcpu1 wakes up without sleep? (XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:526 (XEN) ----[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: e008:[<ffff82d08012a151>] sched_rt.c#rt_vcpu_wake+0x11f/0x17b ... (XEN) Xen call trace: (XEN) [<ffff82d08012a151>] sched_rt.c#rt_vcpu_wake+0x11f/0x17b (XEN) [<ffff82d08012bf2c>] vcpu_wake+0x213/0x3d4 (XEN) [<ffff82d08012c447>] vcpu_unblock+0x4b/0x4d (XEN) [<ffff82d080169cea>] vcpu_kick+0x20/0x6f (XEN) [<ffff82d080169d65>] vcpu_mark_events_pending+0x2c/0x2f (XEN) [<ffff82d08010762a>] event_2l.c#evtchn_2l_set_pending+0xa9/0xb9 (XEN) [<ffff82d080108312>] send_guest_vcpu_virq+0x9d/0xba (XEN) [<ffff82d080196cbe>] send_timer_event+0xe/0x10 (XEN) [<ffff82d08012a7b5>] schedule.c#vcpu_singleshot_timer_fn+0x9/0xb (XEN) [<ffff82d080131978>] timer.c#execute_timer+0x4e/0x6c (XEN) [<ffff82d080131ab9>] timer.c#timer_softirq_action+0xdd/0x213 (XEN) [<ffff82d08012df32>] softirq.c#__do_softirq+0x82/0x8d (XEN) [<ffff82d08012df8a>] do_softirq+0x13/0x15 (XEN) [<ffff82d080243ad1>] cpufreq.c#process_softirqs+0x21/0x30So, it looks like spurious wakeup for vcpu1 happens before it was completely context switched off a cpu. But rt_vcpu_wake() didn't see it on cpu with curr_on_cpu() so it fell through the first two RETURNs. I guess the replenishment queue check is necessary for this situation? Thanks, Tianyang _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |