
Re: [Xen-devel] [BUG] mistakenly wake in Xen's credit scheduler



On Tue, Oct 27, 2015 at 11:41 PM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> On Tue, 2015-10-27 at 14:32 -0600, suokun wrote:
>> On Tue, Oct 27, 2015 at 4:44 AM, Dario Faggioli
>> <dario.faggioli@xxxxxxxxxx> wrote:
>
>> Hi, Dario,
>> Thank you for your reply.
>>
> Hi,
>
>> Here are my two VMs running on the same physical CPU.
>> VM-IO: 1-vCPU pinned to a pCPU, running netperf
>> VM-CPU: 1-vCPU pinned to the same pCPU, running a while(1) loop
>> Another machine runs the netperf client to send requests to VM-IO.
>>
>> My code is very simple:
>> in VM-IO, as the server side: $ netserver -p 12345
>> in VM-CPU, just running a while(1) loop: $ ./loop
>> in the client, send I/O requests to VM-IO:
>>   $ netperf -H [server_ip] -l 15 -t TCP_STREAM -p 12345
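>>
>> The noise program is nothing fancy; a minimal sketch of ./loop (just a
>> busy loop that never blocks or yields) would be:
>>
>>     /* loop.c -- pure CPU noise: spin forever, never block */
>>     int main(void)
>>     {
>>         while (1)
>>             ;
>>     }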
>>
> Ok, thanks.
>
>> The setting that led to the poor IO performance is as follows:
>> VM-IO:  1-vCPU pinned to a pCPU, running netperf
>> VM-CPU: 1-vCPU pinned to the same pCPU, running a while(1) loop
>>
>> The root cause is that when an IO request comes in, VM-IO's vCPU is
>> elevated to BOOST and goes through vcpu_wake() -> __runq_tickle(). In
>> __runq_tickle(), the currently running vCPU (i.e., the vCPU from
>> VM-CPU) is marked as _VPF_migrating.
>>
> Ok.
>
>> Then, Xen goes through schedule() to
>> reschedule the current vCPU (i.e., vCPU from VM-CPU) and schedule the
>> next vCPU (i.e., the vCPU from VM-IO). Due to the _VPF_migrating
>> flag, the descheduled vCPU will be migrated in context_saved() and
>> later woken up in vcpu_wake().
>>
> Sure.
>
>> Indeed, csched_vcpu_wake() would quit early if the
>> vCPU from VM-CPU were still on the run queue, but actually it is not. In
>> csched_schedule(), the vCPU is not inserted back into the run queue
>> because it is not runnable, due to the _VPF_migrating bit in
>> pause_flags. As such, the vCPU from VM-CPU gets BOOSTed and will not be
>> preempted by a later IO request, because BOOST cannot preempt BOOST.
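>>
>> To make it concrete, the wake path looks roughly like this (a
>> simplified paraphrase of csched_vcpu_wake() in sched_credit.c, not the
>> exact source):
>>
>>     static void
>>     csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>>     {
>>         struct csched_vcpu * const svc = CSCHED_VCPU(vc);
>>         const unsigned int cpu = vc->processor;
>>
>>         if ( curr_on_cpu(cpu) == vc )
>>             return;                  /* already running on the pCPU */
>>         if ( __vcpu_on_runq(svc) )
>>             return;                  /* already on the run queue */
>>
>>         /*
>>          * Neither early return fires for VM-CPU's vCPU: it has just
>>          * been descheduled and, failing vcpu_runnable() because of
>>          * _VPF_migrating, it was not put back on the run queue by
>>          * csched_schedule(). So it is treated as a genuine wakeup
>>          * and gets BOOST, even though it never actually slept.
>>          */
>>         if ( svc->pri == CSCHED_PRI_TS_UNDER )
>>             svc->pri = CSCHED_PRI_TS_BOOST;
>>
>>         __runq_insert(cpu, svc);
>>         __runq_tickle(cpu, svc);
>>     }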
>>
> Aha! Now I see what you mean. From the previous email, I couldn't
> really tell which call to schedule() you were looking at during
> each phase of the analysis... Thanks for clarifying!
>
> And, yes, I agree with you that, since the vCPU of VM-CPU fails the
> vcpu_runnable() test, it's being treated as if it were really waking up from
> sleep, in csched_vcpu_wake(), and hence boosted.
>
>> A simple fix would be allowing BOOST to preempt BOOST.
>>
> Nah, that would be a hack on top of a hack! :-P
>
>> A better fix
>> would be checking the CPU affinity before setting the _VPF_migrating
>> flag.
>>
> Yeah, I like this better. So, can you try the patch attached to this
> email?
>
> Here at my place, without any patch, I get the following results:
>
>  idle:       throughput = 806.64
>  with noise: throughput = 166.50
>
> With the patch, I get this:
>
>  idle:       throughput = 807.18
>  with noise: throughput = 731.66
>
> The patch (if you confirm that it works) fixes the bug in this
> particular situation, where the vCPUs are all pinned to the same pCPU,
> but it does not prevent vCPUs that get migrated around the pCPUs from
> becoming BOOSTed in Credit1.
>
> That is something I think we should avoid, and I've got a (small) patch
> series ready for that. I'll give it some more testing before sending
> it to the list, though, as I want to make sure it's not causing
> regressions.
>
> Thanks and Regards,
> Dario

Hi, Dario,

Thank you for your reply.

Here is my patch; it is essentially just one extra check:

 if ( new_idlers_empty && new->pri > cur->pri )
 {
     SCHED_STAT_CRANK(tickle_idlers_none);
     SCHED_VCPU_STAT_CRANK(cur, kicked_away);
     SCHED_VCPU_STAT_CRANK(cur, migrate_r);
     SCHED_STAT_CRANK(migrate_kicked_away);
-    set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+    /*
+     * Migration makes sense only if there is more than one online CPU
+     * and the vCPU is not pinned to a single physical CPU.
+     */
+    if ( num_online_cpus() > 1 &&
+         cpumask_weight(cur->vcpu->cpu_hard_affinity) > 1 )
+        set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
     cpumask_set_cpu(cpu, &mask);
 }
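
(For context: this hunk is against the "new_idlers_empty && new->pri >
cur->pri" block of __runq_tickle() in xen/common/sched_credit.c. The
idea is simply not to request a migration when the preempted vCPU's
hard affinity would not let it run anywhere else anyway.)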

without the patch:
  idle:       throughput = 941
  with noise: throughput = 32

with the patch:
  idle:       throughput = 941
  with noise: throughput = 691


I also tried your patch; here are the test results on my machine:

with your patch:
  idle:       throughput = 941
  with noise: throughput = 658

Both of our patches improve the I/O throughput under noise
significantly. Still, compared to the I/O-only scenario, there is a
gap of about 250~290.

That is due to the ratelimit in Xen's credit scheduler. The default
value of the rate limit is 1000us, which means that once the
CPU-intensive vCPU starts to run, the I/O-intensive vCPU has to wait up
to 1000us, even if an I/O request arrives and its priority is BOOST.
However, the time interval between two I/O requests in netperf is just
tens of microseconds, far less than the ratelimit. As a result, some
I/O requests cannot be handled in time, which causes the loss of
throughput.
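
If I read the code right, the relevant check sits near the top of
csched_schedule() and looks roughly like this (a paraphrase, not the
exact source):

    /*
     * If rate limiting is enabled and the current vCPU has run for
     * less than ratelimit_us, keep running it -- even if a BOOSTed
     * vCPU has just been woken up and has tickled this pCPU.
     */
    if ( !tasklet_work_scheduled
         && prv->ratelimit_us
         && vcpu_runnable(current)
         && !is_idle_vcpu(current)
         && runtime < MICROSECS(prv->ratelimit_us) )
    {
        snext = scurr;                            /* keep the current vCPU */
        tslice = MICROSECS(prv->ratelimit_us);    /* with a short timeslice */
        goto out;
    }

So with the default ratelimit of 1000us, a netperf wakeup that arrives
right after the CPU hog has been scheduled still has to wait up to
~1ms, no matter its priority. (If I remember the knobs correctly, the
value can be changed at run time with "xl sched-credit -s -r <us>", or
at boot via the sched_ratelimit_us parameter.)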

I tried to reduce the rate limit manually and the throughput increased
after that.

Best
Tony


> ---
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

