Re: [Xen-devel] Notes on stubdoms and latency on ARM
On 07/17/2017 11:04 AM, Julien Grall wrote:
> Hi,
>
> On 17/07/17 10:25, George Dunlap wrote:
>> On 07/12/2017 07:14 AM, Dario Faggioli wrote:
>>> On Fri, 2017-07-07 at 14:12 -0700, Stefano Stabellini wrote:
>>>> On Fri, 7 Jul 2017, Volodymyr Babchuk wrote:
>>>>>> Since you are using Credit, can you try to disable context switch
>>>>>> rate limiting?
>>>>>
>>>>> Yep. You are right. In the environment described above (Case 2) I
>>>>> now get much better results:
>>>>>
>>>>> real 1.85
>>>>> user 0.00
>>>>> sys  1.85
>>>>
>>>> From 113 to 1.85 -- WOW!
>>>>
>>>> Obviously I am no scheduler expert, but shouldn't we advertise a bit
>>>> better a scheduler configuration option that makes things _one
>>>> hundred times faster_?!
>>>>
>>> So, to be fair, so far we've been bitten this hard by this only on
>>> artificially constructed test cases, where either some extreme
>>> assumptions were made (e.g., that all the vCPUs except one always run
>>> at 100% load) or pinning was used in a weird and suboptimal way. And
>>> there are workloads where it has been verified to improve performance
>>> (poor SpecVIRT results without it were the main motivation for having
>>> it upstream, and on by default).
>>>
>>> That being said, I personally have never liked rate-limiting; it has
>>> always looked to me like the wrong solution.
>>
>> In fact, I *think* the only reason it may have been introduced is that
>> there was a bug in the credit2 code at the time such that it always had
>> a single runqueue no matter what your actual pcpu topology was.
>
> FWIW, we don't yet parse the pCPU topology on ARM. AFAIU, we always tell
> Xen each CPU is in its own core. Will it have any implications for the
> scheduler?

Just checking -- you do mean its own core, as opposed to its own socket?
(Or NUMA node?)

On any system without hyperthreading (or with HT disabled), that's what
an x86 system will see as well.

Most schedulers have one runqueue per logical cpu. Credit2 has the
option of having one runqueue per logical cpu, one runqueue per core
(i.e., hyperthreads share a runqueue), one runqueue per socket (i.e.,
all cores on the same socket share a runqueue), or one runqueue across
the whole system. I *think* we made one runqueue per core the default a
while back to deal with multithreading, but I may not be remembering
correctly.

In any case, if you don't have threads, then reporting each logical cpu
as its own core is the right thing to do.

If you're mis-reporting sockets, then the scheduler will be unable to
take that into account. But that's not usually going to be a major
issue, mainly because the scheduler is not actually in a position to
determine, most of the time, which configuration is optimal. If two
vcpus are communicating a lot, then the optimal configuration is to put
them on different cores of the same socket (so they can share an L3
cache); if two vcpus are computing independently, then the optimal
configuration is to put them on different sockets, so they can each
have their own L3 cache. Xen isn't in a position to know which one is
more important, so it just assumes each vcpu is independent.

All that to say: it shouldn't be a major issue if you are mis-reporting
sockets. :-)

 -George
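[Editor's note] For reference, the context-switch rate limiting discussed
above is a pool-wide scheduler parameter. A minimal sketch of how it can be
disabled, assuming a Xen 4.x toolstack with the xl sched-credit interface
and the sched_ratelimit_us hypervisor boot option (values shown are
illustrative, not a recommendation):

    # At runtime, via the toolstack (pool-wide Credit parameters):
    xl sched-credit -s -r 0    # set ratelimit_us to 0, i.e. disable rate limiting
    xl sched-credit -s         # show the current tslice_ms / ratelimit_us

    # Or on the Xen command line at boot:
    sched_ratelimit_us=0

Similarly, the Credit2 runqueue granularity George describes (per logical
cpu, per core, per socket, or a single system-wide runqueue) is selected
with the credit2_runqueue Xen command-line option; the exact set of
accepted values depends on the Xen version, so treat this as a sketch:

    # Xen command line: choose the Credit2 runqueue arrangement
    credit2_runqueue=core      # hyperthreads of a core share a runqueue
    credit2_runqueue=socket    # all cores on a socket share a runqueue
    credit2_runqueue=all       # one runqueue across the whole system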