
Re: [Xen-devel] [BUG] mistakenly wake in Xen's credit scheduler



On Tue, Oct 27, 2015 at 3:44 AM, George Dunlap <dunlapg@xxxxxxxxx> wrote:
> On Tue, Oct 27, 2015 at 5:59 AM, suokun <suokunstar@xxxxxxxxx> wrote:
>> Hi all,
>>
>> The BOOST mechanism in Xen's credit scheduler is designed to
>> prioritize a VM running an I/O-intensive application, so that it can
>> handle I/O requests in time. However, this does not always work as
>> expected.
>
> Thanks for the exploration, and the analysis.
>
> The BOOST mechanism is part of the reason I began to write the credit2
> scheduler, which we are  hoping (any day now) to make the default
> scheduler.  It was designed specifically with the workload you mention
> in mind.  Would you care to try your test again and see how it fares?
>

Hi, George,

Thank you for your reply. I tested credit2 this morning. The I/O
performance is as expected; however, the CPU-time accounting seems
incorrect. Here is my experiment on credit2:

VM-IO:   1-vCPU pinned to a pCPU, running netperf
VM-CPU:  1-vCPU pinned to the same pCPU, running a while(1) loop

The throughput of netperf is the same (941 Mbps) as when VM-IO runs alone.

However, when I use xl top to show the VM CPU utilization, VM-IO takes
73% of the CPU time and VM-CPU takes 99%, so their sum is more than
100% even though the two vCPUs share a single pCPU. I suspect this is
due to the CPU-utilization accounting in the credit2 scheduler.


> Also, do you have a patch to fix it in credit1? :-)
>

As for a patch for the credit1 problem, I have two ideas:

1) If the vCPU cannot migrate (e.g. it is pinned, restricted by CPU
affinity, or there is only one physical CPU), do not set the
_VPF_migrating flag (see the second sketch below).

2) Allow vCPUs in the BOOST state to preempt one another (see the
first sketch below).
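
For (2), here is a minimal sketch of what I tried, against
__runq_tickle() in xen/common/sched_credit.c of Xen 4.6. The exact
condition is my own rough attempt, not a polished patch:

/* Sketch of idea (2): if both the waking vCPU and the one currently
 * running are BOOST, ask that pCPU to reschedule so the two BOOST
 * vCPUs share it, instead of doing nothing (currently only
 * new->pri > cur->pri triggers any action). */
if ( new_idlers_empty &&
     new->pri == CSCHED_PRI_TS_BOOST &&
     cur->pri == CSCHED_PRI_TS_BOOST )
    /* SCHEDULE_SOFTIRQ is raised on 'mask' later in this function */
    __cpumask_set_cpu(cpu, &mask);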

I have tested both separately and they both work, but personally I
prefer the first option because it fixes the problem at its source.
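
Here is roughly what I tried for (1), also in __runq_tickle() of
xen/common/sched_credit.c (Xen 4.6). Whether checking cpu_hard_affinity
with cpumask_weight() is the right way to detect "cannot migrate" is my
assumption; treat this as a sketch rather than a finished patch:

/* Sketch of idea (1): only kick cur away (and mark it _VPF_migrating)
 * if it can actually run on some other pCPU. */
if ( new_idlers_empty && new->pri > cur->pri )
{
    if ( cpumask_weight(cur->vcpu->cpu_hard_affinity) > 1 )
    {
        /* cur has somewhere else to go: kick it away as before */
        SCHED_STAT_CRANK(tickle_idlers_none);
        SCHED_VCPU_STAT_CRANK(cur, kicked_away);
        SCHED_VCPU_STAT_CRANK(cur, migrate_r);
        SCHED_STAT_CRANK(migrate_kicked_away);
        set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
    }
    /* Either way, ask this pCPU to reschedule so the boosted I/O vCPU
     * can run; a pinned cur simply goes back to the runqueue without
     * being woken and boosted later via context_saved(). */
    __cpumask_set_cpu(cpu, &mask);
}

I kept the stats and the _VPF_migrating kick inside the affinity check
so they are only accounted when cur really is kicked away; in my tests
this was enough to make the problem described below go away.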

Best
Tony

>  -George
>
>>
>>
>> (1) Problem description
>> --------------------------------
>> Suppose two VMs (named VM-I/O and VM-CPU) each have one virtual CPU
>> and both vCPUs are pinned to the same physical CPU. An I/O-intensive
>> application (e.g. netperf) runs in VM-I/O and a CPU-intensive
>> application (e.g. a busy loop) runs in VM-CPU. When a client sends
>> I/O requests to VM-I/O, its vCPU does not benefit from the BOOST
>> state and obtains very few CPU cycles (less than 1% in Xen 4.6). Both
>> throughput and latency are terrible.
>>
>>
>>
>> (2) Problem analysis
>> --------------------------------
>> This problem is due to the wake mechanism in Xen: the CPU-intensive
>> workload is woken and boosted by mistake.
>>
>> Suppose the vCPU of VM-CPU is running when an I/O request arrives;
>> the currently running vCPU (the vCPU of VM-CPU) will be marked with
>> _VPF_migrating.
>>
>> static inline void __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
>> {
>> ...
>>            if ( new_idlers_empty && new->pri > cur->pri )
>>            {
>>                SCHED_STAT_CRANK(tickle_idlers_none);
>>                SCHED_VCPU_STAT_CRANK(cur, kicked_away);
>>                SCHED_VCPU_STAT_CRANK(cur, migrate_r);
>>                SCHED_STAT_CRANK(migrate_kicked_away);
>>                set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
>>                __cpumask_set_cpu(cpu, &mask);
>>            }
>> }
>>
>>
>> The next time a context switch happens and prev is the vCPU of
>> VM-CPU, context_saved(prev) is executed. Because the vCPU has been
>> marked with _VPF_migrating, it will then be woken up.
>>
>> void context_saved(struct vcpu *prev)
>> {
>>     ...
>>
>>     if ( unlikely(test_bit(_VPF_migrating, &prev->pause_flags)) )
>>         vcpu_migrate(prev);
>> }
>>
>> If the vCPU of VM-CPU is in the UNDER state at that point, the wakeup
>> promotes it to the BOOST state, which was originally designed for
>> I/O-intensive vCPUs. When this happens, even though the vCPU of
>> VM-I/O also becomes BOOST, it cannot get the physical CPU
>> immediately; it has to wait until the vCPU of VM-CPU is scheduled
>> out. That harms I/O performance significantly.
>>
>>
>>
>> (3) Our Test results
>> --------------------------------
>> Hypervisor: Xen 4.6
>> Dom 0 & Dom U: Linux 3.18
>> Client: Linux 3.18
>> Network: 1 Gigabit Ethernet
>>
>> Throughput:
>> Only VM-I/O: 941 Mbps
>> co-Run VM-I/O and VM-CPU: 32 Mbps
>>
>> Latency:
>> Only VM-I/O: 78 usec
>> co-Run VM-I/O and VM-CPU: 109093 usec
>>
>>
>>
>> This bug has been there since Xen 4.2 and still exists in the latest Xen 4.6.
>> Thanks.
>> Reported by Tony Suo and Yong Zhao from UCCS
>>
>> --
>>
>> **********************************
>>> Tony Suo
>>> Email: suokunstar@xxxxxxxxx
>>> University of Colorado at Colorado Springs
>> **********************************
>>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

