
Re: [Xen-devel] [BUG] mistakenly wake in Xen's credit scheduler



On Tue, Oct 27, 2015 at 4:44 AM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> On Mon, 2015-10-26 at 23:59 -0600, suokun wrote:
>> Hi all,
>>
> Hi,
>
> And first of all, thanks for resending in plain text, this is much
> appreciated.
>
> Thanks also for the report. I'm not sure I can figure out completely
> what you're saying that you are seeing happening, let's see if you can
> help me... :-)
>

Hi, Dario,
Thank you for your reply.

>> (1) Problem description
>> --------------------------------
>> Suppose two VMs (named VM-I/O and VM-CPU) each have one virtual CPU
>> and both are pinned to the same physical CPU. An I/O-intensive
>> application (e.g. Netperf) runs in VM-I/O and a CPU-intensive
>> application (e.g. Loop) runs in VM-CPU. When a client is sending
>> I/O requests to VM-I/O, its vCPU cannot reach the BOOST state and
>> obtains very few CPU cycles (less than 1% in Xen 4.6). Both the
>> throughput and the latency are very poor.
>>
> I see. And I take it that you have a test case that makes it easy to
> trigger this behavior. Feel free to post the code to make that happen
> here, I'll be glad to have a look myself.
>

Here are my two VMs running on the same physical CPU:
VM-IO:  1 vCPU pinned to a pCPU, running netperf
VM-CPU: 1 vCPU pinned to the same pCPU, running a while(1) loop
Another machine runs the netperf client and sends requests to VM-IO.

My test is very simple:
in VM-IO, as the server side: $ netserver -p 12345
in VM-CPU, just run the while(1) loop: $ ./loop
on the client, send I/O requests to VM-IO:
    $ netperf -H [server_ip] -l 15 -t TCP_STREAM -p 12345
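
For reference, the loop program in VM-CPU is nothing more than an empty
spin loop; a minimal sketch of it in C (my actual binary is equivalent):

    /* loop.c -- the CPU hog running inside VM-CPU.
     * Build with: gcc -o loop loop.c */
    int main(void)
    {
        volatile unsigned long counter = 0;

        /* Spin forever so the vCPU always has work and never blocks. */
        while (1)
            counter++;

        return 0;
    }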

>> (2) Problem analysis
>> --------------------------------
>> This problem is due to the wake-up mechanism in Xen: the CPU-intensive
>> workload will be woken and boosted by mistake.
>>
>> Suppose the vCPU of VM-CPU is running and an I/O request comes, the
>> current vCPU(vCPU of VM-CPU) will be marked as _VPF_migrating.
>>
>> [...]
>>
>> next time when the schedule happens and the prev is the vCPU of
>> VM-CPU, the context_saved(vcpu) will be executed.
>>
> What do you mean by "next time"? If the vcpu of VM-CPU was running, at
> the point that it became 'prev', someone else must have been running.
> Are you seeing something different from this?
>
>> Because the vCPU has
>> been marked as _VPF_migrating, it will then be woken up.
>>
> Yes, but again, of what "next time when the schedule happens" are we
> talking about?
>
> If VM-IO's vcpu is really being boosted by the I/O event (is this the
> case?), then the schedule invocation that follows its wakeup, should
> just run it (and, as you say, make VM-CPU's vcpu become prev).
>
> Then, yes, context_saved() is called, which calls vcpu_migrate(), which
> then calls vcpu_wake()-->csched_vcpu_wake(), on prev == VM-CPU's vcpu.
> _BUT_ that would most likely do just nothing, as VM-CPU's vcpu is on
> the runqueue at this point, and csched_vcpu_wake() has this:
>
>     if ( unlikely(__vcpu_on_runq(svc)) )
>     {
>         SCHED_STAT_CRANK(vcpu_wake_onrunq);
>         return;
>     }
>
> So, no boosting happens.
>

The setting that led to the poor I/O performance is as follows:
VM-IO:  1 vCPU pinned to a pCPU, running netperf
VM-CPU: 1 vCPU pinned to the same pCPU, running a while(1) loop

The root cause is that when an I/O request comes in, VM-IO's vCPU is
elevated to BOOST and goes through vcpu_wake() -> __runq_tickle(). In
__runq_tickle(), the currently running vCPU (i.e., the vCPU from VM-CPU)
is marked as _VPF_migrating. Then Xen goes through schedule() to
deschedule the current vCPU (i.e., the vCPU from VM-CPU) and schedule
the next vCPU (i.e., the vCPU from VM-IO). Due to the _VPF_migrating
flag, the descheduled vCPU will be migrated in context_saved() and later
woken up in vcpu_wake(). Indeed, csched_vcpu_wake() would quit if the
vCPU from VM-CPU were on the run queue, but it is actually not: in
csched_schedule(), that vCPU is not inserted back into the run queue,
because the _VPF_migrating bit in pause_flags makes it not runnable. As
a result, the vCPU from VM-CPU gets boosted and cannot be preempted by a
later I/O request, because BOOST cannot preempt BOOST.
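
For clarity, here is a tiny self-contained model of the interaction
described above (plain C, not Xen code; the toy_* names are made up for
illustration): a vCPU descheduled with _VPF_migrating set is not
runnable, so it is not put back on the run queue, and the wake-up on the
migration path then boosts it instead of hitting the on-runq early
return quoted above.

    /* boost_trace.c -- toy trace of the mistaken boost (NOT Xen code).
     * Build with: gcc -o boost_trace boost_trace.c */
    #include <stdbool.h>
    #include <stdio.h>

    struct toy_vcpu {
        bool on_runq;        /* queued in the credit run queue?            */
        bool vpf_migrating;  /* models the _VPF_migrating pause flag       */
        int  pri;            /* 0 = BOOST, -1 = UNDER (as in sched_credit) */
    };

    /* Models csched_schedule(): a prev that is not runnable (because of
     * the pending migration flag) is not re-queued. */
    static void toy_schedule(struct toy_vcpu *prev)
    {
        prev->on_runq = !prev->vpf_migrating;
    }

    /* Models csched_vcpu_wake() as called from context_saved() ->
     * vcpu_migrate() -> vcpu_wake(). */
    static void toy_wake(struct toy_vcpu *v)
    {
        if (v->on_runq)
            return;          /* the early return quoted above              */
        if (v->pri == -1)
            v->pri = 0;      /* UNDER -> BOOST: the mistaken boost         */
    }

    int main(void)
    {
        /* VM-CPU's vCPU: flagged for migration by __runq_tickle(). */
        struct toy_vcpu vm_cpu = { false, true, -1 };

        toy_schedule(&vm_cpu);   /* descheduled, not re-queued             */
        toy_wake(&vm_cpu);       /* woken on the migration path            */
        printf("VM-CPU vCPU priority after wake: %s\n",
               vm_cpu.pri == 0 ? "BOOST (mistaken)" : "UNDER");
        return 0;
    }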

A simple fix would be to allow BOOST to preempt BOOST. A better fix
would be to check the vCPU's CPU affinity before setting the
_VPF_migrating flag.
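
To make "BOOST cannot preempt BOOST" concrete, here is a minimal
standalone sketch of the tickling decision, assuming a simplified form
of the priority check in __runq_tickle() (the numeric priority values
mirror sched_credit.c, but the function and enum names here are
invented for illustration):

    /* tickle_model.c -- simplified model of the preemption decision.
     * Build with: gcc -o tickle_model tickle_model.c */
    #include <stdbool.h>
    #include <stdio.h>

    enum pri { PRI_IDLE = -64, PRI_OVER = -2, PRI_UNDER = -1, PRI_BOOST = 0 };

    /* Current rule (simplified): only a strictly higher-priority waker
     * tickles the pCPU and preempts the running vCPU. */
    static bool tickles(enum pri cur, enum pri waker)
    {
        return cur == PRI_IDLE || waker > cur;
    }

    /* Direction of the "simple fix": additionally let BOOST preempt BOOST. */
    static bool tickles_fixed(enum pri cur, enum pri waker)
    {
        return tickles(cur, waker) ||
               (waker == PRI_BOOST && cur == PRI_BOOST);
    }

    int main(void)
    {
        /* VM-CPU's vCPU was mistakenly left in BOOST; VM-IO's vCPU
         * wakes up in BOOST on an incoming I/O request. */
        enum pri cur = PRI_BOOST, waker = PRI_BOOST;

        printf("current rule preempts: %s\n",
               tickles(cur, waker) ? "yes" : "no");
        printf("fixed rule preempts:   %s\n",
               tickles_fixed(cur, waker) ? "yes" : "no");
        return 0;
    }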


>> Once the state of VM-CPU's vCPU is UNDER, it will be changed into the
>> BOOST state, which was originally designed for I/O-intensive vCPUs.
>>
> Again, I don't think I see how.
>
>> Once this happens, even though the vCPU of VM-I/O becomes BOOST, it
>> cannot get the physical CPU immediately but has to wait until the vCPU
>> of VM-CPU is scheduled out. That will harm the I/O performance
>> significantly.
>>
> If the vcpu of VM-IO becomes BOOST, because of an I/O event, it seems
> to me that it should manage to get scheduled immediately.
>
>> (3) Our Test results
>> --------------------------------
>> Hypervisor: Xen 4.6
>> Dom 0 & Dom U: Linux 3.18
>> Client: Linux 3.18
>> Network: 1 Gigabit Ethernet
>>
>> Throughput:
>> Only VM-I/O: 941 Mbps
>> co-Run VM-I/O and VM-CPU: 32 Mbps
>>
>> Latency:
>> Only VM-I/O: 78 usec
>> co-Run VM-I/O and VM-CPU: 109093 usec
>>
> Yeah, that's pretty poor, and I'm not saying we don't have an issue. I
> just don't understand/don't agree with the analysis.
>
>> This bug has been there since Xen 4.2 and still exists in the latest
>> Xen 4.6.
>>
> The code that sets the _VPF_migrating bit in __runq_tickle() was not
> there in Xen 4.2. It was introduced in Xen 4.3. By "since Xen
> 4.2", do you mean 4.2 included or not?
>

Sorry about that. We have tested on Xen 4.3, 4.4, 4.5, and 4.6; all of
them have the same issue.

Thanks.
Tony


> So, apart from the numbers above, what other data and hints led you
> to this analysis?
>
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

