Re: [Xen-devel] schedulers and topology exposing questions
On Wed, Jan 27, 2016 at 03:53:38PM +0000, George Dunlap wrote:
> On 27/01/16 15:27, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jan 27, 2016 at 03:10:01PM +0000, George Dunlap wrote:
> >> On 27/01/16 14:33, Konrad Rzeszutek Wilk wrote:
> >>> On Xen - the schedule() would go HLT.. and then later be woken up by the
> >>> VIRQ_TIMER. And since the two applications were on separate CPUs - the
> >>> single packet would just stick in the queue until the VIRQ_TIMER arrived.
> >>
> >> I'm not sure I understand the situation right, but it sounds a bit like
> >> what you're seeing is just a quirk of the fact that Linux doesn't always
> >> send IPIs to wake other processes up (either by design or by accident),
> >
> > It does and it does not :-)
> >
> >> but relies on scheduling timers to check for work to do.  Presumably
> >
> > It .. I am not explaining it well. The Linux kernel scheduler, when
> > called for schedule() (from the UDP sendmsg), would either pick the next
> > application and do a context switch - or, if there were none, go to sleep.
> > [Kind of - it also may send an IPI to the other CPU if requested, but that
> > requires some hints from underlying layers.]
> > Since there were only two apps on the runqueue - udp sender and udp
> > receiver - it would run them back-to-back (this is on baremetal).
>
> I think I understand at a high level from your description what's
> happening (No IPIs -> happens to run if on the same cpu, waits until
> next timer tick if on a different cpu); but what I don't quite get is
> *why* Linux doesn't send an IPI.

Wait, no no. "happens to run if on the same cpu" - only on baremetal,
or if we expose SMT topology to a guest. Otherwise the applications are
not on the same CPU.

The IPI-sending part is because there are two CPUs - and the apps on
those two runqueues are not intertwined from the perspective of the
scheduler (unless the udp code has given the scheduler hints).

However, if I taskset the applications onto the same vCPU (without
exposing SMT threads, i.e. just the normal situation as of today) - the
scheduler will send IPIs for the context switches.

Then I found that if I enable vAPIC and disable event channels for IPIs
- using only the native APIC machinery (aka vAPIC) - we can do even
fewer VMEXITs, but that is a different story:

http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg00897.html

> It's been quite a while since I looked at the Linux scheduling code, so
> I'm trying to understand it based a lot on the Xen code.  In Xen a vcpu
> can be "runnable" (has something to do) or "blocked" (waiting for
> something to do).  Whenever a vcpu goes from "blocked" to "runnable", the
> scheduler will call vcpu_wake(), which sends an IPI to the appropriate
> pcpu to get it to run the vcpu.
>
> What you're describing is a situation where a process is blocked (either
> in 'listen' or 'read'), and another process does something which should
> cause it to become 'runnable' (sends it a UDP message).  If anyone
> happens to run the scheduler on its cpu, it will run; but no proactive
> actions are taken to wake it up (i.e., sending an IPI).

Right. And that is a UDP code decision. It called schedule() without
any timeout or hints.

> The idea of not sending an IPI when a process goes from "waiting for
> something to do" to "has something to do" seems strange to me; and if it
> wasn't a mistake, then my only guess why they would choose to do that
> would be to reduce IPI traffic on large systems.
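(To make that concrete, here is a rough C sketch of the decision we are
both describing. This is not the actual Linux or Xen wake-up path - the
names and types are made up for illustration - it is just the shape of
it: the waker marks the task runnable, and only kicks the remote CPU
with an IPI if the task sits on a different runqueue; otherwise the
task is simply picked up at the next local schedule()/timer tick.)

#include <stdio.h>

struct task {
    int cpu;        /* CPU whose runqueue this task sits on */
    int runnable;   /* 0 = blocked, 1 = runnable */
};

/* Stand-in for a real reschedule IPI (event channel, vAPIC, APIC write). */
static void send_resched_ipi(int cpu)
{
    printf("kick cpu%d\n", cpu);
}

static void wake_up_task(struct task *t, int waker_cpu)
{
    t->runnable = 1;                /* blocked -> runnable */

    if (t->cpu == waker_cpu)
        return;                     /* picked up at the next local schedule() */

    /* Remote runqueue: without this kick the task waits until the next
     * timer tick (VIRQ_TIMER in the PV case) fires on t->cpu. */
    send_resched_ipi(t->cpu);
}

int main(void)
{
    struct task udp_receiver = { .cpu = 1, .runnable = 0 };

    wake_up_task(&udp_receiver, 0); /* sender on cpu0, receiver on cpu1 */
    return 0;
}

The whole question is whether that send_resched_ipi() step happens
eagerly on wake-up - which is what Xen's vcpu_wake() does, as you say -
or is skipped and left to the timer, which is what the UDP path appears
to be doing here.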
> But whether it's a mistake or on purpose, it's a Linux thing, so...

Yes :-)

> >> they knew that low performance on ping-pong workloads might be a
> >> possibility when they wrote the code that way; I don't see a reason why
> >> we should try to work around that in Xen.
> >
> > Which is not what I am suggesting.
>
> I'm glad we agree on this. :-)
>
> > Our first idea was that since this is a Linux kernel scheduler
> > characteristic - let us give the guest all the information it needs to
> > do this. That is, make it look as baremetal as possible - and that is
> > where the vCPU pinning and the exposing of SMT information came about.
> > That (Elena, please correct me if I am wrong) did indeed show that the
> > guest was doing what we expected.
> >
> > But naturally that requires pinning and all that - and while it is a
> > useful case for those that have the vCPUs to spare and can do it - that
> > is not a general use-case.
> >
> > So Elena started looking at the CPU-bound case and seeing how Xen
> > behaves then, and if we can improve the floating situation, as she saw
> > some abnormal behaviour.
>
> OK -- if the focus was on the two cases where the Xen credit1 scheduler
> (apparently) co-located two cpu-burning vcpus on sibling threads, then
> yeah, that's behavior we should probably try to get to the bottom of.

Right!
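(For reference, the "pin both applications to the same CPU" experiment
mentioned above is just standard CPU affinity from inside the guest -
the same thing taskset does. A rough sketch only, with CPU 0 picked
arbitrarily for illustration; the Xen-side vCPU-to-pCPU pinning is a
separate knob (xl vcpu-pin) and is not shown here.)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(0, &mask);              /* pin the calling process to (v)CPU 0 */

    /* Pid 0 means "this process"; running both the UDP sender and the
     * receiver this way (or via `taskset -c 0 ./app`) puts them on one
     * runqueue so they context-switch back-to-back. */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }

    printf("pinned to CPU 0\n");
    return 0;
}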