Re: [Xen-devel] Notes on stubdoms and latency on ARM
On Mon, 2017-07-17 at 12:28 +0100, George Dunlap wrote:
> Most schedulers have one runqueue per logical cpu. Credit2 has the
> option of having one runqueue per logical cpu, one per core (i.e.,
> hyperthreads share a runqueue), one runqueue per socket (i.e., all
> cores on the same socket share a runqueue), or one socket across the
> whole system.
>
You mean "or one runqueue across the whole system", I guess? :-)

> I *think* we made one socket per core the default a while back
> to deal with multithreading, but I may not be remembering correctly.
>
We've had per-core runqueues as the default, to deal with
hyperthreading, for some time. Nowadays, handling hyperthreading is
done independently of the runqueue arrangement, and so the current
default is one runqueue per socket.

> In any case, if you don't have threads, then reporting each logical
> cpu as its own core is the right thing to do.
>
Yep.

> If you're mis-reporting sockets, then the scheduler will be unable to
> take that into account.
>
And if this means that each logical CPU is also reported as being its
own socket, then you have one runqueue per logical CPU.

> But that's not usually going to be a major
> issue, mainly because the scheduler is not actually in a position to
> determine, most of the time, which is the optimal configuration. If
> two vcpus are communicating a lot, then the optimal configuration is
> to put them on different cores of the same socket (so they can share
> an L3 cache); if two vcpus are computing independently, then the
> optimal configuration is to put them on different sockets, so they
> can each have their own L3 cache.
>
This is all very true. However, if two CPUs share one runqueue, vCPUs
will seamlessly move between the two CPUs, without having to wait for
the load balancing logic to kick in. This is a rather cheap way of
achieving good fairness and load balancing, but it is only effective
if this movement is itself cheap, which is probably the case if, e.g.,
the CPUs share some level of cache.

So, figuring out what the best runqueue arrangement is, is rather hard
to do automatically, as it depends both on the workload and on the
hardware characteristics of the platform. But having at least some
degree of runqueue sharing, among the CPUs that have some cache levels
in common, would be, IMO, our best bet. And we do need topology
information to try to do that.

(We would also need, in Credit2 code, to take cache and memory
hierarchy information more into account, rather than "just" CPU
topology. We're already working, for instance, on changing
CSCHED2_MIGRATE_RESIST from being a constant to varying with the
amount of cache sharing between two CPUs; a rough sketch of the idea
is at the bottom of this mail.)

> All that to say: It shouldn't be a major issue if you are mis-
> reporting sockets. :-)
>
Maybe yes, maybe not. It may actually be even better, on some
combinations of platforms and workloads, indeed... but it also means
that the Credit2 load balancer is being invoked a lot, which may not
be ideal.
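BTW, for anyone wanting to experiment, the runqueue arrangement can be
chosen at boot, via the credit2_runqueue Xen command line parameter
(values like core, socket, node or all; check docs/misc/xen-command-line
for the exact set your version supports), e.g.:

  credit2_runqueue=socket

And, just to give an idea of the cache-aware CSCHED2_MIGRATE_RESIST
work mentioned above, here is a minimal, standalone sketch. This is
NOT the actual patches: the topology structure, the names and the
numbers below are all made up for illustration; it only shows the
principle, i.e., resistance decreasing as cache sharing increases:

  /*
   * Hypothetical sketch (not the actual Xen code): scale the
   * migration resistance with how much cache two CPUs share, so
   * that migrations between cache-sharing CPUs are resisted less.
   * All names and numbers are invented for illustration only.
   */
  #include <inttypes.h>
  #include <stdio.h>

  /* Made-up baseline, in the spirit of CSCHED2_MIGRATE_RESIST. */
  #define MIGRATE_RESIST_BASE 500 /* usecs of "stickiness" */

  /* Hypothetical per-CPU topology info. */
  struct cpu_topo {
      unsigned int socket_id; /* CPUs in a socket share the L3 */
      unsigned int core_id;   /* threads of a core share L1/L2 too */
  };

  /*
   * Resistance against migrating a vCPU from CPU 'a' to CPU 'b':
   * the more cache the two CPUs share, the cheaper the migration,
   * and hence the lower the resistance we want to apply.
   */
  static uint64_t migrate_resist(const struct cpu_topo *a,
                                 const struct cpu_topo *b)
  {
      if ( a->socket_id == b->socket_id && a->core_id == b->core_id )
          return MIGRATE_RESIST_BASE / 4; /* SMT siblings: share L1/L2 */
      if ( a->socket_id == b->socket_id )
          return MIGRATE_RESIST_BASE / 2; /* same socket: share L3 */
      return MIGRATE_RESIST_BASE;         /* cross socket: cold caches */
  }

  int main(void)
  {
      struct cpu_topo cpu0 = { .socket_id = 0, .core_id = 0 };
      struct cpu_topo cpu1 = { .socket_id = 0, .core_id = 1 };
      struct cpu_topo cpu4 = { .socket_id = 1, .core_id = 0 };

      printf("same socket : %" PRIu64 "\n", migrate_resist(&cpu0, &cpu1));
      printf("cross socket: %" PRIu64 "\n", migrate_resist(&cpu0, &cpu4));

      return 0;
  }

The idea being that moving a vCPU between two CPUs that share cache is
cheap (the working set is, at least partly, still warm there), so such
migrations should be resisted less than cross-socket ones.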
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc