Re: [Xen-devel] null scheduler bug
On Thu, 2018-09-27 at 15:15 +0200, Milan Boberic wrote:
> Hi,
> I applied patch and vwfi=native and everything works fine, I can
> create and destroy guest domain as many times as I want.
>
> I have to ask, will this patch have any impact on performance (I will
> test it later, but I just need your opinions)?
>
Well, with a question like this, the only possible answer is "depends". :-)

Basically, with this patch applied, a little bit of overhead is to be expected every time call_rcu() is invoked inside Xen. Now, if you look at when that happens, you'll notice that it basically never happens in a hot path. In your case, there is at least one call in the domain destruction path.

You can try to measure whether destroying the domain actually takes more time _with_ "vwfi=native" (plus this patch) than it takes _without_ "vwfi=native" (and also without this patch). I don't think you'll be able to notice any significant difference.

The real point, I think, is whether "vwfi=native" helps your use case. Have you measured that? I mean, have you checked what the difference in performance (or latency, or whatever you're interested in) is between the "vwfi=native" case and the default?

If you have, and "vwfi=native" helps, then you also need something like this patch, or domain destruction won't work (a destruction that takes 'around 7 seconds' is what I call not working). If "vwfi=native" does not help your use case, then you're better off using neither it nor this patch.

> And what this patch exactly do? I need to fully understand it because
> I need to document it in my master thesis which will be finished soon
> thanks to you people :D
>
Have you heard about RCU? It's a very clever synchronization mechanism, widely used in the Linux kernel. Xen has it too, but we use an old version of the Linux code, and we don't use it that much.
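To make the terms 'quiescent state' and 'grace period' concrete, here is a toy, single-threaded C model of how an RCU implementation can track grace periods. call_rcu() queues a callback (for instance, the one in the domain destruction path mentioned above), and that callback may only run once every pCPU has reported a quiescent state; for Xen, that means passing through Xen. Everything here (NR_PCPUS, struct rcu_model, the model_* functions) is hypothetical and far simpler than Xen's actual rcupdate.c: it is a sketch of the idea, not of Xen's code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy, single-threaded model of RCU grace-period tracking.
 * NOT Xen's code: NR_PCPUS and every name below are hypothetical.
 */
#define NR_PCPUS 4

struct rcu_model {
    bool gp_active;         /* is a grace period in progress?              */
    bool need_qs[NR_PCPUS]; /* pCPU still owes a quiescent state this GP   */
    int next_cbs;           /* callbacks queued, waiting for the next GP   */
    int cur_cbs;            /* callbacks waiting for the current GP to end */
    int done_cbs;           /* callbacks that have been invoked            */
};

/* Begin a new grace period covering every callback queued so far. */
static void start_grace_period(struct rcu_model *m)
{
    m->cur_cbs = m->next_cbs;
    m->next_cbs = 0;
    for (int cpu = 0; cpu < NR_PCPUS; cpu++)
        m->need_qs[cpu] = true;
    m->gp_active = true;
}

/*
 * call_rcu()-like: defer a callback until a full grace period has elapsed.
 * A callback queued while a GP is already running waits for the next GP.
 */
static void model_call_rcu(struct rcu_model *m)
{
    m->next_cbs++;
    if (!m->gp_active)
        start_grace_period(m);
}

/*
 * Called whenever a pCPU passes through Xen (hypercall, trap, interrupt,
 * idling in Xen): in this model, that counts as a quiescent state.
 */
static void model_cpu_enters_xen(struct rcu_model *m, int cpu)
{
    assert(cpu >= 0 && cpu < NR_PCPUS);
    if (!m->gp_active)
        return;
    m->need_qs[cpu] = false;
    for (int i = 0; i < NR_PCPUS; i++)
        if (m->need_qs[i])
            return;             /* someone has not reported yet: GP goes on */

    /* Every pCPU reported: the grace period is over, run its callbacks. */
    m->done_cbs += m->cur_cbs;
    m->cur_cbs = 0;
    m->gp_active = false;
    if (m->next_cbs > 0)
        start_grace_period(m);
}
```

In this model, if pCPUs 0 to 2 pass through Xen but pCPU 3 keeps running a guest vCPU that idles inside the guest (the "sched=null" plus "vwfi=native" situation), the queued callback stays pending indefinitely; it only runs once pCPU 3 enters Xen, e.g. because an interrupt arrives.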
Here is, IMO, some good introductory material (but, really, just google "RCU" or "RCU linux", and you'll hit tons of articles and docs):

https://lwn.net/Articles/262464/

Well, our implementation of RCU requires that, from time to time, the various physical CPUs of your box become idle, get an interrupt, or execute inside Xen (for hypercalls, vmexits, etc). In fact, a CPU going through Xen is what allows us to tell that it has reached a so-called 'quiescent state', which in turn is necessary for declaring a so-called 'RCU grace period' over.

Usually, as soon as a guest (or dom0) vCPU becomes idle, the pCPU on which it was running goes through Xen, to figure out whether or not there is another vCPU, from the same or from another guest, to be run. If not, the pCPU stays idle, but it stays idle _in_Xen_, and that is good for RCU quiescence and grace period tracking.

Now, with the combination of "sched=null" and "vwfi=native", when the guest (or dom0) vCPU becomes idle, we _stay_in_the_guest_ until something (typically an interrupt) arrives. This means that the pCPU in question never lets Xen's RCU know that it has gone through a quiescent state, and grace periods risk lasting very long, if not forever. In fact, the reason why everything worked again with a printk() was, as Julien noted, that an interrupt was being injected.

Check the old discussion on xen-devel about the RCU bug, which I linked to in one of my first messages in this thread, for even more insights:
https://www.mail-archive.com/xen-devel@xxxxxxxxxxxxx/msg105388.html
https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg02770.html
https://lists.xen.org/archives/html/xen-devel/2017-09/msg01855.html
https://lists.xen.org/archives/html/xen-devel/2017-09/msg03515.html
https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg01855.html

Setting qhimark, qlowmark and blimit to the values you see in the patch partially defeats the purpose of RCU: the update of the data structure is no longer deferred to some future point in time, but is basically always performed synchronously with the modification. That's why I dislike doing it all the time, and prefer to do it only when we're using "vwfi=native".

For more details about the meaning of the qhimark, qlowmark and blimit values, check these:

https://www.systutorials.com/linux-kernels/132439/patch-rcu-batch-tuning-linux-2-6-16/
https://lwn.net/Articles/166647/

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
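As a rough illustration of the qhimark/qlowmark/blimit semantics described in the Linux 2.6.16 "RCU batch tuning" links above, here is a toy C model: at most blimit ready callbacks are invoked per batch; once more than qhimark callbacks are queued, the limit is lifted; it is restored once the queue drains to qlowmark or below. The struct and the batch_* functions are hypothetical, the values used in the usage below are illustrative (not the ones in the patch), and the real code also tries to force a quiescent state when qhimark is exceeded, which is not modelled here.

```c
#include <limits.h>

/*
 * Toy model of RCU callback batch tuning (blimit / qhimark / qlowmark).
 * struct rcu_batch_model and the batch_* functions are hypothetical.
 */
struct rcu_batch_model {
    int blimit;     /* max callbacks invoked per batch, normally     */
    int qhimark;    /* queue length above which the limit is lifted  */
    int qlowmark;   /* queue length at or below which it is restored */
    int cur_limit;  /* per-batch limit currently in force            */
    int qlen;       /* callbacks queued and not yet invoked          */
    int ready;      /* queued callbacks whose grace period has ended */
    int invoked;    /* total callbacks invoked so far                */
};

static void batch_init(struct rcu_batch_model *r,
                       int blimit, int qhimark, int qlowmark)
{
    *r = (struct rcu_batch_model){
        .blimit = blimit, .qhimark = qhimark, .qlowmark = qlowmark,
        .cur_limit = blimit,
    };
}

/* call_rcu()-like enqueue: a long queue lifts the per-batch limit. */
static void batch_call_rcu(struct rcu_batch_model *r)
{
    if (++r->qlen > r->qhimark)
        r->cur_limit = INT_MAX;
}

/* Pretend a grace period has just ended for everything queued so far. */
static void batch_grace_period_over(struct rcu_batch_model *r)
{
    r->ready = r->qlen;
}

/* rcu_do_batch()-like: invoke at most cur_limit ready callbacks. */
static int batch_do(struct rcu_batch_model *r)
{
    int count = 0;

    while (r->ready > 0 && count < r->cur_limit) {
        r->ready--;
        r->qlen--;
        r->invoked++;   /* a real implementation would run the callback */
        count++;
    }
    /* Queue drained enough: go back to small, rate-limited batches. */
    if (r->cur_limit == INT_MAX && r->qlen <= r->qlowmark)
        r->cur_limit = r->blimit;
    return count;
}
```

For example, with blimit=2, qhimark=5 and qlowmark=1, three ready callbacks are processed two per batch; after queueing enough to exceed qhimark, all ready callbacks are processed in a single batch, and the limit drops back to blimit once the queue is empty. In this model, the lower qhimark is, the sooner the limit is lifted and the less callback processing gets spread out over time.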
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel