Re: [Xen-devel] [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx]
> Sent: Tuesday, May 24, 2016 10:02 PM
> To: Wu, Feng <feng.wu@xxxxxxxxx>; Jan Beulich <JBeulich@xxxxxxxx>
> Cc: andrew.cooper3@xxxxxxxxxx; george.dunlap@xxxxxxxxxxxxx; Tian, Kevin
> <kevin.tian@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxx; konrad.wilk@xxxxxxxxxx;
> keir@xxxxxxx
> Subject: Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu
> blocking list
>
> On Tue, 2016-05-24 at 10:07 +0000, Wu, Feng wrote:
> > > See, for instance, cpu_disable_scheduler() in schedule.c. What we
> > > do is go over all the vcpus of all domains of either the system or
> > > the cpupool, and force the ones that we found with v->processor set
> > > to the pCPU that is going down, to perform migration (system_state
> > > will be different than SYS_STATE_suspend, and we hence take the
> > > 'else' branch).
> > >
> > > Considering that the pCPU is no longer part of the relevant
> > > bitmask-s during the migration, the vCPUs will figure out that they
> > > just can't stay there, and move somewhere else.
> >
> > Thanks a lot for the elaboration, it is really helpful.
> >
> NP :-)
>
> > > Note, however, that this happens for running and runnable vCPUs.
> >
> > I don't quite understand this. Do you mean cpu_disable_scheduler()
> > only handles running and runnable vCPUs? I tried to find some hints
> > in the code, but I didn't get it. Could you please give some more
> > information about this?
> >
> It goes through all the vcpus of all domains, and does not check or
> care whether they are running, runnable or blocked.
>
> Let's look at this in some more detail.
> So, let's assume that processor 5 is going away, and that you have the
> following vcpus around:
>
>  d0v0 : v->processor = 5, running on cpu 5
>  d0v1 : v->processor = 4, running on cpu 4
>  d1v0 : v->processor = 5, runnable but not running
>  d2v3 : v->processor = 5, blocked
>
> for d0v0, we do:
>  cpu_disable_scheduler(5)
>    set_bit(_VPF_migrating, d0v0->pause_flags);
>    vcpu_sleep_nosync(d0v0);
>      SCHED_OP(sleep, d0v0);
>        csched_vcpu_sleep(d0v0)
>          cpu_raise_softirq(5, SCHEDULE_SOFTIRQ);
>    vcpu_migrate(d0v0);
>      if ( v->is_running || ... )  // assume v->is_running is true
>        return
>  ...
>  ... <--- scheduling occurs on processor 5
>  ...
>  context_saved(d0v0)
>    vcpu_migrate(d0v0);
>      // is_running is 0, so _VPF_migrating gets cleared
>      vcpu_move_locked(d0v0, new_cpu);
>      vcpu_wake(d0v0);
>        SCHED_OP(wake, d0v0);
>          csched_vcpu_wake(d0v0)
>            __runq_insert(d0v0);
>            __runq_tickle(d0v0);
>
> for d0v1, we do:
>  cpu_disable_scheduler(5)
>    if ( d0v1->processor != 5 )
>      continue
>
> for d1v0, we do:
>  cpu_disable_scheduler(5)
>    set_bit(_VPF_migrating, d1v0->pause_flags);
>    vcpu_sleep_nosync(d1v0);
>      SCHED_OP(sleep, d1v0);
>        csched_vcpu_sleep(d1v0)
>          __runq_remove(d1v0);
>    vcpu_migrate(d1v0);
>      if ( d1v0->is_running ||
>           !test_and_clear_bit(_VPF_migrating, d1v0->pause_flags) )
>        // false, but clears the _VPF_migrating flag
>      vcpu_move_locked(d1v0, new_cpu);
>      vcpu_wake(d1v0);
>        SCHED_OP(wake, d1v0);
>          csched_vcpu_wake(d1v0)
>            __runq_insert(d1v0);
>            __runq_tickle(d1v0);
>
> for d2v3, we do:
>  cpu_disable_scheduler(5)
>    set_bit(_VPF_migrating, d2v3->pause_flags);
>    vcpu_sleep_nosync(d2v3);
>      SCHED_OP(sleep, d2v3);
>        csched_vcpu_sleep(d2v3)
> [1]       // Nothing!
>    vcpu_migrate(d2v3);
>      if ( d2v3->is_running ||
>           !test_and_clear_bit(_VPF_migrating, d2v3->pause_flags) )
>        // false, but clears the _VPF_migrating flag
> [*]  vcpu_move_locked(d2v3, new_cpu);
>      vcpu_wake(d2v3);
> [2]    // Nothing!
> > > If a vCPU is blocked, there is nothing to do, and in fact, nothing
> > > happens (as vcpu_sleep_nosync() and vcpu_wake() are NOP in that
> > > case).
> >
> > What do you mean by saying 'vcpu_sleep_nosync() and vcpu_wake()
> > are NOP'?
> >
> See [1] and [2] above.
>
> > > For those vCPUs, as soon as they wake up, they'll figure out that
> > > their own v->processor is not there any longer, and will move
> > > somewhere else.
> >
> > So basically, when a vCPU is blocked, removing 'v->processor' has no
> > impact on the blocking vcpu. When the vCPU is woken up, it will find
> > another pCPU to run on, since the original 'v->processor' is down and
> > no longer in the cpu bitmask, right?
> >
> Yes, that was my point.
>
> _However_, as you can see at [*] above, it must be noted that even
> those vcpus that blocked while running on a certain processor (5 in
> the example) indeed have a chance to have their v->processor changed
> to something that is still online (something different than 5), as a
> consequence of that processor going away.
>
> Whether this is useful/enough for you, I can't tell right now, off
> the top of my head.

Thank you so much, Dario! This detailed info is really helpful to me!

> > > > But this is not an issue in the non pCPU hotplug scenario.
> > >
> > > It's probably an issue even if you remove a cpu from a cpupool (and
> > > even a more "interesting" one, if you also manage to add it to
> > > another pool, in the meantime), isn't it?
> >
> > Yes, things become more complex in that case ....
> >
> Well, but can you confirm that we also have an issue there, and test
> and report what happens if you move a cpu from pool A to pool B, while
> it still has vcpus from a domain that stays in pool A?
>
> If there's transient suboptimal behavior, well, we probably can live
> with that (depending on the specific characteristics of the
> transitory, I'd say). If things crash, we certainly want a fix!
Thinking about this a bit more, it seems there is no issue with this
case. Let's consider the following scenario:

cpupool0: cpu0, cpu1, cpu2, cpu3
cpupool1: cpu4, cpu5, cpu6, cpu7

1. vcpu0 is blocked and it is on the blocking list of cpu0.
2. cpu0 is removed from cpupool0, regardless of whether it is added to
   cpupool1 or not.
3. As long as the cpu is online (we need to handle the case where the
   cpu goes offline, which is the cpu hotplug case), sometime later,
   when the notification event happens, we will wake up the blocking
   vcpu and the scheduler will choose a proper pCPU for it to run on.

I also tested this case and didn't find anything unusual.

Thanks,
Feng

> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel