Re: [Xen-devel] [xen-unstable test] 145796: tolerable FAIL - PUSHED
On Sun, 2020-02-02 at 12:57 +0000, Julien Grall wrote:
> Hi Dario,
>
Hi,

> Apologies for the late answer.
>
No problem, I also did not have any more time to look into this yet.

> On 22/01/2020 03:40, Dario Faggioli wrote:
> > On Fri, 2020-01-10 at 18:24 +0000, Julien Grall wrote:
> > >
> > You have a 2 vCPUs dom0, and how many other vCPUs from other
> > domains? Or do you only have those 2 dom0 vCPUs and you are
> > actually pausing dom0?
>
> Only dom0 with 2 vCPUs is running. On every hypercall, it will try
> to pause/unpause itself.
>
Ok, that was my understanding, but I wasn't 100% sure. Thanks for
confirming.

> This is to roughly match the behavior of the Arm guest atomic
> helpers.
>
Yep, makes sense.

> > If you just have the 2 dom0's vCPUs around (and we call them
> > vCPU A and vCPU B), the only case for which I can imagine
> > runq_pick() returning A on CPU1 would be if CPU0 was running
> > vCPU B (and invoked the hypercall from it) and CPU1 was idle...
> > is this the case?
>
> This is indeed the case. The schedule() on CPU1 has happened
> because vCPU A was woken up (e.g., an interrupt was received and
> injected to the vCPU).
>
Right.

> > In fact, I'm starting to think that patch 7c7b407e777 "xen/sched:
> > introduce unit_runnable_state()", which added the
> > 'q_remove(snext)' in rt_schedule() might not be correct.
>
> I have tested Xen before this commit and didn't manage to reproduce
> the crash. As soon as I add the commit, it crashes quite quickly.
>
Ok, thanks for checking this as well. That's very useful.

> > In fact, if runq_pick() returns a vCPU which is in the runqueue,
> > but is not runnable (e.g., because we're racing with
> > do_domain_pause(), which already set pause_count), it's not
> > rt_schedule()'s job to dequeue it from anything.
> >
> > We probably should just ignore it and pick another vCPU, if any
> > (and idle otherwise). Then, after we release the lock, it will be
> > rt_unit_sleep(), called by do_domain_pause() in this case, that
> > will finish the job of properly dequeueing it...
> >
> > Another strange thing is that, as the code looks right now,
> > runq_pick() returns the first unit in the runq (i.e., the one
> > with the earliest deadline), without checking whether it is
> > runnable. Then, in rt_schedule(), if the unit is not runnable, we
> > (only partially, as you figured out) dequeue it, and use idle
> > instead, as our candidate for being the next scheduled unit...
> > But what if there were other *runnable* units in the runqueue?
>
> My knowledge of the scheduler is quite limited. Maybe Meng would be
> able to answer this question?
>
Yes, indeed, here I was pretty much thinking out loud, and trying to
trigger comments from Meng.

Anyway, I'll see about putting together a quick test patch that
implements what I described (next week), and let's see if it works.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

Attachment:
signature.asc
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
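
The alternative raised in the message above (have the pick step skip units that are queued but not runnable, fall back to idle if none is runnable, and leave dequeuing to the sleep path) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `toy_unit` and `toy_pick` are hypothetical names invented for illustration, the queue is a plain array assumed to be already sorted by deadline, and none of this is the actual Xen RTDS code or the test patch mentioned in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduling unit; this is not a Xen struct. */
struct toy_unit {
    const char *name;
    unsigned long deadline;   /* smaller value means earlier deadline */
    bool runnable;            /* false models a unit that is being paused */
};

/*
 * Walk a queue that is assumed to be sorted by deadline and return the
 * first runnable unit. Return NULL if no unit is runnable, in which case
 * a caller would schedule idle. The queue is never modified here: a
 * non-runnable unit stays queued for some other path to remove.
 */
static const struct toy_unit *toy_pick(const struct toy_unit *q, size_t n)
{
    assert(q != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (q[i].runnable)
            return &q[i];
    }
    return NULL;
}
```

With a two-entry queue where "A" has the earliest deadline but is not runnable and "B" is runnable, `toy_pick()` returns "B" and "A" remains in the queue. Whether skipping in the pick step is the right fix for the real scheduler is exactly the open question in the thread.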