Re: [Xen-devel] [BUG] mistakenly wake in Xen's credit scheduler
On Tue, Oct 27, 2015 at 11:41 PM, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
> On Tue, 2015-10-27 at 14:32 -0600, suokun wrote:
>> On Tue, Oct 27, 2015 at 4:44 AM, Dario Faggioli
>> <dario.faggioli@xxxxxxxxxx> wrote:
>>
>> Hi, Dario,
>> Thank you for your reply.
>>
> Hi,
>
>> Here are my two VMs running on the same physical CPU.
>> VM-IO: 1-vCPU pinned to a pCPU, running netperf
>> VM-CPU: 1-vCPU pinned to the same pCPU, running a while(1) loop
>> Another machine runs the netperf client to send requests to VM-IO.
>>
>> My code is very simple:
>> in VM-IO, as the server side: $ netserver -p 12345
>> in VM-CPU, just running a while(1) loop: $ ./loop
>> in the client, send I/O requests to VM-IO:
>> $ netperf -H [server_ip] -l 15 -t TCP_STREAM -p 12345
>>
> Ok, thanks.
>
>> The setting that led to the poor I/O performance is as follows:
>> VM-IO: 1-vCPU pinned to a pCPU, running netperf
>> VM-CPU: 1-vCPU pinned to the same pCPU, running a while(1) loop
>>
>> The root cause is that when an I/O request comes in, VM-IO's vCPU is
>> elevated to BOOST and goes through vcpu_wake() -> __runq_tickle(). In
>> __runq_tickle(), the currently running vCPU (i.e., the vCPU from
>> VM-CPU) is marked with _VPF_migrating.
>>
> Ok.
>
>> Then, Xen goes through schedule() to deschedule the current vCPU
>> (i.e., the vCPU from VM-CPU) and schedule the next vCPU (i.e., the
>> vCPU from VM-IO). Due to the _VPF_migrating flag, the descheduled
>> vCPU will be migrated in context_saved() and later woken up in
>> vcpu_wake().
>>
> Sure.
>
>> Indeed, csched_vcpu_wake() would quit early if the vCPU from VM-CPU
>> were on the run queue, but it actually is not: in csched_schedule(),
>> the vCPU is not inserted back into the run queue because it is not
>> runnable, due to the _VPF_migrating bit in pause_flags. As such, the
>> vCPU from VM-CPU will be boosted and will not be preempted by a later
>> I/O request, because BOOST cannot preempt BOOST.
>>
> Aha! Now I see what you mean. From the previous email, I couldn't
> really tell which call to schedule() you were looking at during each
> phase of the analysis... Thanks for clarifying!
>
> And, yes, I agree with you that, since the vCPU of VM-CPU fails the
> vcpu_runnable() test, it is treated as if it were really waking up
> from sleep in csched_vcpu_wake(), and hence boosted.
>
>> A simple fix would be allowing BOOST to preempt BOOST.
>>
> Nah, that would be a hack on top of a hack! :-P
>
>> A better fix would be checking the CPU affinity before setting the
>> _VPF_migrating flag.
>>
> Yeah, I like this better. So, can you try the patch attached to this
> email?
>
> Here at my place, without any patch, I get the following results:
>
> idle: throughput = 806.64
> with noise: throughput = 166.50
>
> With the patch, I get this:
>
> idle: throughput = 807.18
> with noise: throughput = 731.66
>
> The patch (if you confirm that it works) fixes the bug in this
> particular situation, where the vCPUs are all pinned to the same pCPU,
> but does not prevent vCPUs that are migrated around the pCPUs from
> becoming BOOSTed in Credit2.
>
> That is something I think we should avoid, and I've got a (small)
> patch series ready for that. I'll give it some more testing before
> sending it to the list, though, as I want to make sure it's not
> causing regressions.
>
> Thanks and Regards,
> Dario

Hi, Dario,

thank you for your reply.
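For anyone else following the thread, this is roughly the wake path we have been talking about: a trimmed sketch of csched_vcpu_wake(), paraphrased from xen/common/sched_credit.c (the exact code differs between Xen versions, so please take it as an illustration rather than the real thing):

    static void csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
    {
        struct csched_vcpu * const svc = CSCHED_VCPU(vc);
        const unsigned int cpu = vc->processor;

        /* Already running on a pCPU: nothing to do. */
        if ( unlikely(per_cpu(schedule_data, cpu).curr == vc) )
            return;

        /* Already sitting on a run queue: nothing to do either. */
        if ( unlikely(__vcpu_on_runq(svc)) )
            return;

        /*
         * Neither check fires for the vCPU of VM-CPU when it comes back via
         * context_saved() -> vcpu_migrate() -> vcpu_wake(): csched_schedule()
         * never re-inserted it into the run queue (it failed vcpu_runnable()
         * because of _VPF_migrating), so it is treated like a vCPU that is
         * genuinely waking from sleep and gets BOOST.
         */
        if ( svc->pri == CSCHED_PRI_TS_UNDER &&
             !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
            svc->pri = CSCHED_PRI_TS_BOOST;

        __runq_insert(cpu, svc);
        __runq_tickle(cpu, svc);
    }

And since BOOST cannot preempt BOOST, the next wakeup of VM-IO's vCPU can no longer kick it off the pCPU.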
Here is my patch (in __runq_tickle()), actually just one line of code:

    if ( new_idlers_empty && new->pri > cur->pri )
    {
        SCHED_STAT_CRANK(tickle_idlers_none);
        SCHED_VCPU_STAT_CRANK(cur, kicked_away);
        SCHED_VCPU_STAT_CRANK(cur, migrate_r);
        SCHED_STAT_CRANK(migrate_kicked_away);
+       /*
+        * Migration can happen only if there is more than one online
+        * CPU and the vCPU is not pinned to a single physical CPU.
+        */
+       if ( num_online_cpus() > 1 &&
+            cpumask_weight((cur->vcpu)->cpu_hard_affinity) > 1 )
+       {
            set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+       }
        cpumask_set_cpu(cpu, &mask);
    }

without patch:
idle: throughput = 941
with noise: throughput = 32

with patch:
idle: throughput = 941
with noise: throughput = 691

I also tried your patch; here are the test results on my machine:

with your patch:
idle: throughput = 941
with noise: throughput = 658

Both our patches improve the I/O throughput with noise significantly. But
still, compared to the I/O-only scenario, there is a gap of about 250~290.
That is due to the rate limit in Xen's credit scheduler. The default rate
limit is 1000us, which means that once the CPU-intensive vCPU starts to
run, the I/O-intensive vCPU has to wait 1000us even if an I/O request
arrives and its priority is BOOST. However, the time interval between two
I/O requests in netperf is just tens of microseconds, far less than the
rate limit, so some I/O requests cannot be handled in time, causing the
loss of throughput. I tried reducing the rate limit manually and the
throughput increased after that (see the sketch at the end of this mail
for where that check lives).

Best,
Tony

> ---
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
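P.S. The rate-limit check mentioned above sits near the top of csched_schedule(). Again a trimmed sketch of the relevant fragment, written from memory (the exact set of conditions differs between Xen versions, so treat it as an illustration and check it against the actual sources):

    /* How long has the currently running vCPU been on the pCPU? */
    runtime = now - current->runstate.state_entry_time;

    /*
     * If it has run for less than ratelimit_us microseconds, keep it
     * running, even though a higher-priority (e.g. BOOSTed) vCPU may
     * just have been woken up and queued.
     */
    if ( prv->ratelimit_us
         && vcpu_runnable(current)
         && !is_idle_vcpu(current)
         && runtime < MICROSECS(prv->ratelimit_us) )
    {
        snext = scurr;                                   /* stick with the current vCPU */
        tslice = MICROSECS(prv->ratelimit_us) - runtime; /* let it run up to the limit  */
        goto out;
    }

If I remember correctly, the limit can be changed at boot time with the sched_ratelimit_us= Xen command line option, or at run time with "xl sched-credit -s -r <microseconds>"; I am quoting both from memory, so please double check them against the documentation of the Xen version in use.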