Re: Recent upgrade of 4.13 -> 4.14 issue
On 26.10.20 17:31, Dario Faggioli wrote:
> On Mon, 2020-10-26 at 15:30 +0100, Jürgen Groß wrote:
>> On 26.10.20 14:54, Andrew Cooper wrote:
>>> On 26/10/2020 13:37, Frédéric Pierret wrote:
>>>> If anyone would have any idea of what's going on, that would be
>>>> very appreciated. Thank you.
>>>
>>> Does booting Xen with `sched=credit` make a difference?
>>
>> Hmm, I think I have spotted a problem in credit2 which could explain
>> the hang:
>>
>> csched2_unit_wake() will NOT put the sched unit on a runqueue in case
>> it has CSFLAG_scheduled set. This bit will be reset only in
>> csched2_context_saved().
>
> Exactly, it does not put it back there. However, if it finds a vCPU
> with the CSFLAG_scheduled flag set, it should set the
> CSFLAG_delayed_runq_add flag.
>
> Unless curr_on_cpu(cpu)==unit or unit_on_runq(svc)==true... which
> should not be the case. Or were you saying that we actually are in one
> of these situations?
>
> In fact...
>
>> So in case a vcpu (and its unit, of course) is blocked and there has
>> been no other vcpu active on its physical cpu but the idle vcpu, there
>> will be no call of csched2_context_saved(). This will block the vcpu
>> from becoming active, in theory for eternity, in case there is no need
>> to run another vcpu on the physical cpu.
>
> ...I may not be seeing exactly which situation and sequence of events
> you're thinking of. What I see is this: [*]
>
> - vCPU V is running, i.e., CSFLAG_scheduled is set
> - vCPU V blocks
> - we enter schedule()
>   - schedule() calls do_schedule() --> csched2_schedule()
>   - we pick idle, so CSFLAG_delayed_runq_add is set for V
>   - schedule() calls sched_context_switch()
>     - sched_context_switch() calls context_switch()
>       - context_switch() calls sched_context_switched()
>         - sched_context_switched() calls:
>           - vcpu_context_saved()
>           - unit_context_saved()
>             - unit_context_saved() calls sched_context_saved() -->
>               csched2_context_saved()
>               - csched2_context_saved():
>                 - clears CSFLAG_scheduled
>                 - checks (and clears) CSFLAG_delayed_runq_add
>
> [*] this assumes granularity 1, i.e., no core-scheduling and no
>     rendezvous. Or was core-scheduling actually enabled?
>
> And if CSFLAG_delayed_runq_add is set **and** the vCPU is runnable, the
> task is added back to the runqueue.
>
> So, even if we don't do the actual context switch (i.e., we don't call
> __context_switch()) because the next vCPU that we pick when vCPU V
> blocks is the idle one, it looks to me that we do get to call
> csched2_context_saved().
>
> And it also looks to me that, when we get to that, if the vCPU is
> runnable, even if it still has CSFLAG_scheduled set, we do put it back
> on the runqueue.
>
> And if the vCPU blocked, but csched2_unit_wake() ran while
> CSFLAG_scheduled was still set, it indeed should mean that the vCPU
> itself will be runnable again when we get to csched2_context_saved().
>
> Or did you have something completely different in mind, and I'm
> missing it?

No, I think you are right. I mixed that up with __context_switch() not
being called.

Sorry for the noise,

Juergen
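
For readers following the argument, the flag handshake discussed above can be sketched as a small stand-alone C model. This is only an illustration under stated assumptions and not Xen's credit2 code: the two flag names come from the thread, while the bit values, `struct toy_unit`, and every `toy_*` function are hypothetical, and locking, runqueue ordering and core-scheduling are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names follow the thread; the bit values are arbitrary for this model. */
#define CSFLAG_scheduled        (1u << 0)
#define CSFLAG_delayed_runq_add (1u << 1)

/* Hypothetical, heavily reduced stand-in for a scheduling unit. */
struct toy_unit {
    unsigned int flags;
    bool runnable;
    bool on_runq;
};

/* The unit blocks: it stops being runnable but is still marked scheduled. */
static void toy_unit_block(struct toy_unit *u)
{
    u->runnable = false;
}

/* Wake-up path: never enqueue a unit that is still marked scheduled;
 * record the request in CSFLAG_delayed_runq_add instead. */
static void toy_unit_wake(struct toy_unit *u)
{
    u->runnable = true;
    if (u->on_runq)
        return;
    if (u->flags & CSFLAG_scheduled) {
        u->flags |= CSFLAG_delayed_runq_add;
        return;
    }
    u->on_runq = true;
}

/* Context-saved path: clear CSFLAG_scheduled, then honour a deferred
 * enqueue, but only if the unit is runnable at this point. */
static void toy_context_saved(struct toy_unit *u)
{
    u->flags &= ~CSFLAG_scheduled;
    if (u->flags & CSFLAG_delayed_runq_add) {
        u->flags &= ~CSFLAG_delayed_runq_add;
        if (u->runnable) {
            assert(!u->on_runq);
            u->on_runq = true;
        }
    }
}

/* Scenario from the thread: V blocks, idle is picked, and V is woken
 * before the context-saved step runs. Returns true if V ends up on the
 * runqueue with both flags cleared. */
static bool toy_wake_before_context_saved(void)
{
    struct toy_unit v = { .flags = CSFLAG_scheduled, .runnable = true,
                          .on_runq = false };

    toy_unit_block(&v);
    toy_unit_wake(&v);           /* enqueue is deferred, not lost */
    if (v.on_runq)
        return false;            /* would mean the wake-up did not defer */
    toy_context_saved(&v);       /* runs even when switching to idle */
    return v.on_runq &&
           !(v.flags & (CSFLAG_scheduled | CSFLAG_delayed_runq_add));
}
```

In this model, calling `toy_wake_before_context_saved()` exercises the sequence Dario lists: the wake-up only sets the deferred flag, and the later context-saved step is what puts the unit back on the runqueue. The hang Juergen first suspected would only appear in the model if `toy_context_saved()` were never reached.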