Re: [Xen-devel] Some interesting results from schedbench
On Mon, Aug 1, 2016 at 5:26 PM, George Dunlap <george.dunlap@xxxxxxxxxx> wrote:
> This is due to the difference between credit1 and credit2's load
> balancing. Credit1 randomly shifts stuff around based on what it sees
> at this instant. Which means that much of the time, it has {A,B} {A,B},
> but it frequently ends up with {A,A} {B,B}.
>
> Credit2 measures the load average for runqueues over the long haul and
> tries to make the *runqueue* average the same; and since our default now
> is one runqueue per core, that means that it will almost immediately go
> to {A,B} {A,B} and never change it.
>
> The aggregate throughput for the system seems to be slightly higher
> under credit1 (645Mops credit1 vs 639Mops credit2).
>
> It's actually somewhat arguable what the optimal thing to do here is --
> one could argue that "fairness" in the case of hyperthreads should mean
> that if you leave space for someone else to run at 'boost', that you
> should be given space for someone else to be run at 'boost'.
>
> But that's probably an optimization for another day: on the whole I
> think credit2's rational approach to balancing load is much better.

BTW, Dario asked me if I could run the same test with
credit2_runqueue=socket (i.e., one runqueue per socket rather than one
runqueue per core, which is currently the default).

This would mean that all vcpus would share the same socket. credit2
has some mechanisms in place to make sure that vcpus don't migrate
between processors too frequently; but all the vcpus would be sharing
the same credit mechanism.

Strangely enough, this setup tilted things *very much* in favor of
workload A:

credit (run 1):   A: 292Mops   B: 353Mops
credit (run 2):   A: 255Mops   B: 386Mops
credit2, core:    A: 241Mops   B: 396Mops
credit2, socket:  A: 335Mops   B: 304Mops

In the credit2 socket case, workload A vcpus got even less -- about
50% each -- while workload B got even more: about 100% each.
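As a quick sanity check on the figures above, the per-configuration totals and workload A's share of the aggregate can be tallied with a few lines of Python. This is only a sketch: the function name `summarize` and the dictionary layout are hypothetical, chosen for illustration, and the numbers are copied verbatim from the results above rather than re-measured.

```python
# Throughput figures in Mops, copied from the results above.
# Each value is (workload A, workload B).
results = {
    "credit (run 1)": (292, 353),
    "credit (run 2)": (255, 386),
    "credit2, core": (241, 396),
    "credit2, socket": (335, 304),
}


def summarize(results):
    """Return {config: (total_mops, share_of_A)} for each configuration.

    `summarize` is a hypothetical helper, not part of schedbench or Xen.
    """
    summary = {}
    for name, (a, b) in results.items():
        total = a + b
        summary[name] = (total, a / total)
    return summary


if __name__ == "__main__":
    for name, (total, share_a) in summarize(results).items():
        print(f"{name:16s} total={total}Mops  A share={share_a:.1%}")
```

On these numbers the aggregate stays within roughly 1-2% across all four configurations; what moves is the split between A and B.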
It makes me wonder if in this case the *lack* of balancing is actually
the issue: we have resistance to migrating things between cpus even
within one runqueue, so if the initial placement put things on
{A,A} {B,B}, without the overloading to prompt migration, they might
never migrate away.

This needs some more looking into.

FWIW as things get more overloaded this effect goes away -- the
throughput is similar but the variance gets even lower.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel