Xen project Mailing List

Re: [Xen-devel] schedulers and topology exposing questions

On Thu, 2016-01-28 at 22:27 -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 28, 2016 at 03:10:57PM +0000, Dario Faggioli wrote:
> >
> > So, may I ask what piece of (Linux) code are we actually talking
> > about?
> > Because I had a quick look, and could not find where what you
> > describe
> > happens....
>
> udp_recvmsg->__skb_recv_datagram->sock_rcvtimeo->schedule_timeout
> The sk_rcvtimeo is MAX_SCHEDULE_TIMEOUT but you can alter the
> UDP by having a different timeout.
>
Ha, recvmsg! At some point you mentioned sendmsg, and I was looking
there and seeing nothing! But yes, it indeed makes sense to consider
the receiving side... let me have a look...

So, it looks to me that this is what happens:

 udp_recvmsg(noblock=0)
   |
   ---> __skb_recv_datagram(flags=0) {
          timeo = sock_rcvtimeo(flags=0) /* returns sk->sk_rcvtimeo */
          do {...} wait_for_more_packets(timeo);
                     |
                     ---> schedule_timeout(timeo)

So, at least in Linux 4.4, the timeout used is the one defined in
sk->sk_rcvtimeo, which looks to me to be this (unless I've followed
some link wrong, which can well be the case):

http://lxr.free-electrons.com/source/include/uapi/asm-generic/socket.h#L31

 #define SO_RCVTIMEO     20

So there looks to be a timeout. But anyway, let's check
schedule_timeout().

> And MAX_SCHEDULE_TIMEOUT when it eventually calls 'schedule()' just
> goes to sleep (HLT) and eventually gets woken up VIRQ_TIMER.
>
So, if the timeout is MAX_SCHEDULE_TIMEOUT, the function does:

 schedule_timeout(MAX_SCHEDULE_TIMEOUT) {
     schedule();
     return;
 }

If the timeout is anything other than MAX_SCHEDULE_TIMEOUT (but still a
valid value), the function does:

 schedule_timeout(timeout) {
     struct timer_list timer;

     setup_timer_on_stack(&timer);
     __mod_timer(&timer);
     schedule();
     del_singleshot_timer_sync(&timer);
     destroy_timer_on_stack(&timer);
     return;
 }

So, in both cases, it pretty much calls schedule() just about
immediately.
And when schedule() is called, the calling process -- which would be
our UDP receiver -- goes to sleep. The difference is that, in case of
MAX_SCHEDULE_TIMEOUT, it does not arrange for anyone to wake up the
thread that is going to sleep. In theory, it could even be stuck
forever... Of course, this depends on whether the receiver thread is on
a runqueue or not, on whether (in case it's not) its state is
TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE, etc., and, in practice, it
never happens! :-D

In this case, I think we take the other branch (the one 'with
timeout'). But even if we took this one, I would expect the receiver
thread not to be on any runqueue, but rather to be (in either
interruptible or uninterruptible state) on a blocking list, from which
it is taken out when a packet arrives.

In case of anything other than MAX_SCHEDULE_TIMEOUT, all the above is
still true, but a timer is set before calling schedule() and putting
the thread to sleep. This means that, if nothing that would wake up the
thread happens, or if it hasn't happened yet when the timeout expires,
the thread is woken up by the timer.

And in fact, schedule_timeout() is not a different way of going to
sleep compared to just calling schedule(). It is the way you go to
sleep for at most some amount of time... But in all cases, you go to
sleep immediately!

And I am also not sure I see where all that discussion you've had with
George about IPIs fits into all this... The IPI that will trigger the
call to schedule() that will actually put the thread we're putting to
sleep here (i.e., the receiver) back into execution happens when the
sender manages to send a packet (actually, when the packet arrives, I
think) _or_ when the timer expires.

The two possible calls to schedule() in schedule_timeout() behave in
exactly the same way, and I don't think having a timeout or not is
responsible for any particular behavior.
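To see the "sleep for at most some amount of time" behaviour from user
space, here is a minimal sketch, assuming a Linux/POSIX system with a
usable loopback interface. It arms a receive timeout on a UDP socket
that nobody sends to and measures how long recvfrom() blocks. Note that
SO_RCVTIMEO is the option name handed to setsockopt() (the constant 20
quoted earlier is that option number); the duration itself is passed in
a struct timeval, and a zero timeval means "never time out". The helper
name blocked_recv_ms() is made up for this example and does not come
from the kernel or from this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): block in recvfrom() on a fresh
 * UDP socket with a receive timeout of timeout_ms milliseconds.
 * Returns the time spent blocked, in milliseconds, if the call timed out,
 * or -1 on any other outcome (bad argument, socket error, data received).
 */
long blocked_recv_ms(int timeout_ms)
{
    struct sockaddr_in addr = { 0 };
    struct timeval tv;
    struct timespec t0, t1;
    char buf[64];
    ssize_t n;
    long elapsed = -1;
    int fd;

    /* A zero timeout would mean "block forever", so refuse it here. */
    if (timeout_ms <= 0)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Bind to an ephemeral loopback port that nobody sends to. */
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* SO_RCVTIMEO names the option; the duration lives in the timeval. */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        goto out;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* No packet arrived, so only the timer can have woken us up. */
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        elapsed = (long)(t1.tv_sec - t0.tv_sec) * 1000 +
                  (t1.tv_nsec - t0.tv_nsec) / 1000000;
out:
    close(fd);
    return elapsed;
}
```

Calling blocked_recv_ms(100) should return a value close to 100
(possibly a few milliseconds off, depending on timer granularity): the
receiver goes to sleep right away and, since no packet ever arrives, it
is the timer that wakes it up.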
What I think is happening is this: when such a call to schedule() (from
inside schedule_timeout(), I mean) is made, the receiver task just goes
to sleep, and another one, perhaps the sender, is executed. The sender
sends the packet, which arrives before the timeout, and the receiver is
woken up.

*Here* is where an IPI should or should not happen, depending on where
our receiver task is going to be executed! And where would that be?
Well, that depends on the Linux scheduler's load balancer, the behavior
of which is controlled by scheduling domain flags like BALANCE_FORK,
BALANCE_EXEC, BALANCE_WAKE, WAKE_AFFINE and PREFER_SIBLING (and others,
but I think these are the most likely ones to be involved here).

So, in summary, where the receiver executes when it wakes up depends on
the configuration of such flags in the (various) scheduling domain(s).
Check, for instance, this path:

 try_to_wake_up() --> select_task_rq() --> select_task_rq_fair()

The reason why the test 'reacts' to topology changes is that which set
of flags is used for the various scheduling domains -- at the time the
scheduling domains themselves are created and configured -- depends on
topology... So it's quite possible that exposing the SMT topology, as
opposed to not doing so, makes one of the flags flip in a way that
makes the benchmark work better.

If you play with the flags above (or whatever their equivalents were in
2.6.39) directly, even without exposing the SMT topology, I'm quite
sure you would be able to trigger the same behavior.

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


