Re: [Xen-devel] [BUG] mistakenly wake in Xen's credit scheduler
On Wed, 2015-10-28 at 11:04 -0600, suokun wrote:
> Hi, Dario,
>
Hi,

> Here is my patch, actually just one line of code:
>
Yep, I saw it on the list, only after writing the email where I asked
you about it. :-)

>   if ( new_idlers_empty && new->pri > cur->pri )
>   {
>       SCHED_STAT_CRANK(tickle_idlers_none);
>       SCHED_VCPU_STAT_CRANK(cur, kicked_away);
>       SCHED_VCPU_STAT_CRANK(cur, migrate_r);
>       SCHED_STAT_CRANK(migrate_kicked_away);
>
> +     /* Migration can only happen when there is more than one CPU
> +      * and the vCPU is not pinned to a single physical CPU. */
> +     if ( num_online_cpus() > 1 &&
> +          cpumask_weight((cur->vcpu)->cpu_hard_affinity) > 1 )
> +     {
>           set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
> +     }
>
This is ok, in the specific case under test here. However, while we
are here, it also makes sense to check whether migration will actually
have any chance of happening. That is influenced by whether there are
suitable idle pCPUs in the system (we do checks like that everywhere
in this function). In fact, even when cur has a broader affinity, if
none of the pCPUs where it can run are idle, it does not make any
sense to attempt the migration (and, in fact, without the other fix I
was mentioning in place, that would trigger the spurious boosting
behaviour that you discovered).

Also, given how load balancing works in Credit1, i.e., it takes both
hard and soft affinity into account, we need to use the proper mask,
depending on what 'balancing step' we are in. That is what my patch is
doing.

> Both our patches can improve the I/O throughput with noise
> significantly. But still, compared to the I/O-only scenario, there
> is a 250~290 gap.
>
> That is due to the ratelimit in Xen's credit scheduler.
>
Yes, I investigated that myself, and I also traced it to that root
cause.

> The default value of the ratelimit is 1000us, which means that once
> a CPU-intensive vCPU starts to run, an I/O-intensive vCPU needs to
> wait 1000us, even if an I/O request arrives or its priority is
> BOOST.
> However, the time interval between two I/O requests in Netperf is
> just tens of microseconds, far less than the ratelimit. That makes
> some I/O requests impossible to handle in time, causing the loss of
> throughput.
>
Indeed.

> I tried to reduce the ratelimit manually and the throughput
> increased after that.
>
I saw that too.

Thanks again a lot for the report, and for testing the patch.

Regards,
Dario
---
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel