Re: [Xen-devel] [BUG] mistakenly wake in Xen's credit scheduler
On Tue, Oct 27, 2015 at 4:44 AM, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
> On Mon, 2015-10-26 at 23:59 -0600, suokun wrote:
>> Hi all,
>>
> Hi,
>
> And first of all, thanks for resending in plain text, this is much
> appreciated.
>
> Thanks also for the report. I'm not sure I can figure out completely
> what you're saying that you are seeing happening, let's see if you can
> help me... :-)
>

Hi, Dario,

Thank you for your reply.

>> (1) Problem description
>> --------------------------------
>> Suppose two VMs (named VM-I/O and VM-CPU) both have one virtual CPU
>> and they are pinned to the same physical CPU. An I/O-intensive
>> application (e.g. Netperf) runs in VM-I/O and a CPU-intensive
>> application (e.g. Loop) runs in VM-CPU. When a client is sending
>> I/O requests to VM-I/O, its vCPU cannot reach the BOOST state and
>> obtains very few CPU cycles (less than 1% in Xen 4.6). Both the
>> throughput and the latency are terrible.
>>
> I see. And I take it that you have a test case that makes it easy to
> trigger this behavior. Feel free to post the code to make that happen
> here, I'll be glad to have a look myself.
>

Here are my two VMs running on the same physical CPU:

VM-IO: 1 vCPU pinned to a pCPU, running netperf
VM-CPU: 1 vCPU pinned to the same pCPU, running a while(1) loop

Another machine runs the netperf client to send requests to VM-IO.

My code is very simple:

in VM-IO, as the server side:
$ netserver -p 12345

in VM-CPU, just running a while(1) loop:
$ ./loop

in the client, send I/O requests to VM-IO:
$ netperf -H [server_ip] -l 15 -t TCP_STREAM -p 12345

>> (2) Problem analysis
>> --------------------------------
>> This problem is due to the wake mechanism in Xen: the CPU-intensive
>> workload will be woken and boosted by mistake.
>>
>> Suppose the vCPU of VM-CPU is running and an I/O request comes; the
>> current vCPU (the vCPU of VM-CPU) will be marked as _VPF_migrating.
>>
>> [...]
>>
>> The next time the schedule happens and prev is the vCPU of
>> VM-CPU, context_saved(vcpu) will be executed.
>>
> What do you mean "next time"? If the vcpu of VM-CPU was running, at the
> point that it became 'prev', someone else must have been running. Are
> you seeing something different than this?
>
>> Because the vCPU has
>> been marked as _VPF_migrating, it will then be woken up.
>>
> Yes, but again, of what "next time when the schedule happens" are we
> talking about?
>
> If VM-IO's vcpu is really being boosted by the I/O event (is this the
> case?), then the schedule invocation that follows its wakeup should
> just run it (and, as you say, make VM-CPU's vcpu become prev).
>
> Then, yes, context_saved() is called, which calls vcpu_migrate(), which
> then calls vcpu_wake()-->csched_vcpu_wake(), on prev == VM-CPU's vcpu.
> _BUT_ that would most likely do just nothing, as VM-CPU's vcpu is on
> the runqueue at this point, and csched_vcpu_wake() has this:
>
>     if ( unlikely(__vcpu_on_runq(svc)) )
>     {
>         SCHED_STAT_CRANK(vcpu_wake_onrunq);
>         return;
>     }
>
> So, no boosting happens.
>

The setting that led to the poor I/O performance is as follows:

VM-IO: 1 vCPU pinned to a pCPU, running netperf
VM-CPU: 1 vCPU pinned to the same pCPU, running a while(1) loop

The root cause is that when an I/O request comes, VM-IO's vCPU is
elevated to BOOST and goes through vcpu_wake() --> __runq_tickle(). In
__runq_tickle(), the currently running vCPU (i.e., the vCPU from
VM-CPU) is marked as _VPF_migrating.
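To make the sequence easier to follow, here is a minimal, self-contained
sketch (NOT actual Xen code; the structures and helpers are made up to
model the steps described in this and the next paragraph):

    /* Simplified model of the reported sequence: an I/O wake boosts
     * VM-IO's vCPU and tickles the pCPU, the tickle flags the running
     * vCPU (VM-CPU) with a stand-in for _VPF_migrating, and the later
     * context_saved() -> vcpu_wake() path then boosts that vCPU by
     * mistake. */
    #include <stdbool.h>
    #include <stdio.h>

    enum prio { UNDER, BOOST };

    struct vcpu {
        const char *name;
        enum prio   prio;
        bool        pf_migrating;   /* stand-in for _VPF_migrating       */
        bool        on_runq;        /* a running vCPU is NOT on the runq */
    };

    /* Wake path: the I/O event boosts the waking vCPU; the tickle also
     * tags the currently running vCPU for migration. */
    static void wake_and_tickle(struct vcpu *waking, struct vcpu *running)
    {
        if (waking->prio == UNDER)
            waking->prio = BOOST;
        waking->on_runq = true;
        running->pf_migrating = true;
    }

    /* csched_schedule(): prev goes back on the run queue only if it is
     * still runnable; the migrating flag makes it non-runnable. */
    static void do_schedule(struct vcpu *prev)
    {
        if (!prev->pf_migrating)
            prev->on_runq = true;
    }

    /* context_saved() -> vcpu_migrate() -> vcpu_wake(): prev is not on
     * the run queue, so the early "on runqueue" return does not trigger
     * and an UNDER vCPU gets boosted -- the mistaken BOOST. */
    static void context_saved_then_wake(struct vcpu *prev)
    {
        prev->pf_migrating = false;
        if (prev->on_runq)
            return;
        if (prev->prio == UNDER)
            prev->prio = BOOST;
        prev->on_runq = true;
    }

    int main(void)
    {
        struct vcpu vm_io  = { "VM-IO",  UNDER, false, false };
        struct vcpu vm_cpu = { "VM-CPU", UNDER, false, false };

        wake_and_tickle(&vm_io, &vm_cpu);   /* I/O request arrives        */
        do_schedule(&vm_cpu);               /* VM-CPU's vCPU becomes prev */
        context_saved_then_wake(&vm_cpu);

        printf("%s: %s\n", vm_cpu.name,
               vm_cpu.prio == BOOST ? "BOOST (mistaken)" : "UNDER");
        return 0;
    }

Running this sketch ends with VM-CPU's vCPU in BOOST, which is the
mistaken boost described next.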
Then, Xen goes through schedule() to deschedule the current vCPU (i.e.,
the vCPU from VM-CPU) and schedule the next vCPU (i.e., the vCPU from
VM-IO). Due to the _VPF_migrating flag, the descheduled vCPU will be
migrated in context_saved() and later woken up in vcpu_wake(). Indeed,
csched_vcpu_wake() quits if the vCPU from VM-CPU is on the run queue,
but it is actually not: in csched_schedule(), the vCPU is not inserted
back into the run queue because it is not runnable, due to the
_VPF_migrating bit in pause_flags. As a result, the vCPU from VM-CPU
will be boosted and will not be preempted by a later I/O request,
because BOOST cannot preempt BOOST.

A simple fix would be allowing BOOST to preempt BOOST. A better fix
would be checking the CPU affinity before setting the _VPF_migrating
flag (a rough sketch of this idea is included at the end of this
message).

>> Once the state of the vCPU of VM-CPU is UNDER, it will be changed
>> into the BOOST state, which is designed originally for I/O-intensive
>> vCPUs.
>>
> Again, I don't think I see how.
>
>> When this happens, even though the vCPU of VM-I/O becomes BOOST, it
>> cannot get the physical CPU immediately but has to wait until the
>> vCPU of VM-CPU is scheduled out. That harms the I/O performance
>> significantly.
>>
> If the vcpu of VM-IO becomes BOOST because of an I/O event, it seems
> to me that it should manage to get scheduled immediately.
>
>> (3) Our test results
>> --------------------------------
>> Hypervisor: Xen 4.6
>> Dom 0 & Dom U: Linux 3.18
>> Client: Linux 3.18
>> Network: 1 Gigabit Ethernet
>>
>> Throughput:
>> Only VM-I/O: 941 Mbps
>> Co-run VM-I/O and VM-CPU: 32 Mbps
>>
>> Latency:
>> Only VM-I/O: 78 usec
>> Co-run VM-I/O and VM-CPU: 109093 usec
>>
> Yeah, that's pretty poor, and I'm not saying we don't have an issue. I
> just don't understand/don't agree with the analysis.
>
>> This bug has been there since Xen 4.2 and still exists in the latest
>> Xen 4.6.
>>
> The code that sets the _VPF_migrating bit in __runq_tickle() was not
> there in Xen 4.2. It has been introduced in Xen 4.3. With "since Xen
> 4.2" do you mean 4.2 included or not?
>

Sorry about that. We have tested on Xen 4.3, 4.4, 4.5 and 4.6; they all
show the same issue.

Thanks.
Tony

> So, apart from the numbers above, what are the other data and hints
> that led you to the analysis?
>
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
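As a rough, self-contained sketch of the affinity-check idea mentioned
above (this is not Xen code; the bitmask model of pCPUs and all names
are made up for illustration), the tickle path would only flag the
running vCPU for migration if its hard affinity actually allows it to
run on another, idle pCPU:

    #include <stdbool.h>
    #include <stdio.h>

    struct vcpu {
        unsigned int  processor;     /* pCPU the vCPU currently runs on */
        unsigned long hard_affinity; /* bitmask of pCPUs it may run on  */
        bool          pf_migrating;  /* stand-in for _VPF_migrating     */
    };

    /* True only if the vCPU could run on some other pCPU in 'idlers';
     * a vCPU pinned to its current pCPU always yields false. */
    static bool migration_can_help(const struct vcpu *cur,
                                   unsigned long idlers)
    {
        unsigned long others = cur->hard_affinity & idlers
                               & ~(1UL << cur->processor);
        return others != 0;
    }

    /* Tickle: only flag the running vCPU for migration when it can move. */
    static void tickle(struct vcpu *cur, unsigned long idlers)
    {
        if (migration_can_help(cur, idlers))
            cur->pf_migrating = true;
    }

    int main(void)
    {
        /* VM-CPU's vCPU: pinned to pCPU 0 only, so it is never flagged. */
        struct vcpu vm_cpu = { .processor = 0, .hard_affinity = 1UL << 0,
                               .pf_migrating = false };

        tickle(&vm_cpu, /* idlers = */ 1UL << 1);
        printf("flagged for migration: %s\n",
               vm_cpu.pf_migrating ? "yes" : "no");
        return 0;
    }

With a check like this, a vCPU whose hard affinity contains only the
tickled pCPU (the VM-CPU case above) would never be flagged, so the
spurious wake/boost path would not be entered.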