
Re: [Xen-devel] [BUG] mistakenly wake in Xen's credit scheduler



On Wed, 2015-10-28 at 11:04 -0600, suokun wrote:
> Hi, Dario,
> 
Hi,

> Here is my patch, actually just one line of code:
> 
Yep, I saw it on the list, but only after writing the email in which I
asked you about it. :-)

> if ( new_idlers_empty && new->pri > cur->pri )
> {
>     SCHED_STAT_CRANK(tickle_idlers_none);
>     SCHED_VCPU_STAT_CRANK(cur, kicked_away);
>     SCHED_VCPU_STAT_CRANK(cur, migrate_r);
>     SCHED_STAT_CRANK(migrate_kicked_away);
> 
> +   /* Migration can happen only if there is more than one online
> +    * CPU, and the vCPU is not pinned to a single physical CPU. */
> +   if ( num_online_cpus() > 1 &&
> +        cpumask_weight(cur->vcpu->cpu_hard_affinity) > 1 )
> +   {
>         set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
> +   }
>
This is ok, in the specific case under test here. However, while we are
here, it also makes sense to check whether migration will actually have
any chance of happening. That is influenced by whether there are
suitable idle pCPUs in the system (we do checks like that everywhere
else in this function).

In fact, even when cur has a broader affinity, if none of the pCPUs
where it can run are idle, it does not make any sense to attempt the
migration (and, without the other fix I mentioned in place, that would
in fact trigger the spurious boosting behavior that you discovered).

Also, given how load balancing works in Credit1, i.e., it takes both
hard and soft affinity into account, we need to use the proper mask,
depending on what 'balancing step' we are in.

That is what my patch is doing.
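
To give a better idea, here is a minimal sketch of the logic (this is
not the actual patch: helper names like csched_balance_cpumask(),
for_each_csched_balance_step() and the prv->idlers mask follow
Credit1's conventions, so take the details as assumptions):

    /*
     * Sketch: mark cur for migration only if there is at least one
     * idle pCPU it can actually run on, considering soft affinity
     * first and hard affinity second, as Credit1's load balancing
     * does. (The real code also skips the soft affinity step when
     * the vCPU has no effective soft affinity.)
     */
    cpumask_t mask;
    int balance_step;
    bool_t migrate = 0;

    for_each_csched_balance_step( balance_step )
    {
        /* Affinity mask of cur for this balancing step. */
        csched_balance_cpumask(cur->vcpu, balance_step, &mask);

        /* Is any pCPU in that mask currently idle? */
        if ( cpumask_intersects(&mask, prv->idlers) )
        {
            migrate = 1;
            break;
        }
    }

    if ( migrate )
        set_bit(_VPF_migrating, &cur->vcpu->pause_flags);

This way, cur is kicked away only when the migration it triggers has a
chance of actually happening.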

> Both our patches can improve the I/O throughput with noise
> significantly. But still, compared to the I/O-only scenario, there is
> a 250~290 gap.
> 
> That is due to the ratelimit in Xen's credit scheduler. 
>
Yes, I investigated that myself, and I also traced it to that root
cause.

> The default
> value of the rate limit is 1000us, which means that once a
> CPU-intensive vCPU starts to run, an I/O-intensive vCPU needs to wait
> 1000us even if an I/O request arrives or its priority is BOOST.
> However, the time interval between two I/O requests in Netperf is
> just tens of microseconds, far less than the ratelimit. That means
> some I/O requests cannot be handled in time, causing the loss of
> throughput.
> 
Indeed.
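
For reference, the relevant logic in csched_schedule() looks roughly
like this (a simplified sketch, with names that only approximate the
actual code):

    /*
     * Simplified sketch of Credit1's ratelimiting: even if a BOOSTed
     * vCPU is waiting on the runqueue, the currently running vCPU is
     * not preempted until it has run for at least ratelimit_us
     * microseconds.
     */
    s_time_t runtime = now - current->runstate.state_entry_time;

    if ( vcpu_runnable(current) && !is_idle_vcpu(current) &&
         prv->ratelimit_us && runtime < MICROSECS(prv->ratelimit_us) )
    {
        /* Stick with scurr; a timer re-invokes the scheduler when
         * the ratelimit window expires. */
        snext = scurr;
    }

So, with the default 1000us window and Netperf issuing a request every
few tens of microseconds, a number of requests can pile up before the
I/O vCPU gets to run, BOOST or not.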

> I tried to reduce the rate limit manually, and the throughput
> increased after that.
> 
I saw that too.
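
For the record, that is the sched_ratelimit_us boot parameter; it can
also be adjusted at runtime with something like
xl sched-credit -s -r <ratelimit_us>.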

Thanks again a lot for the report, and for testing the patch.

Regards,
Dario
---
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



 

