Re: [Xen-devel] [PATCH] xen/arm: introduce vwfi parameter
On Tue, 2017-02-21 at 07:59 +0000, Julien Grall wrote:
> On 20/02/2017 22:53, Dario Faggioli wrote:
> > For instance, as you say, executing a WFI from a guest directly on
> > hardware, only makes sense if we have 1:1 static pinning. Which means
> > it can't just be done by default, or with a boot parameter, because we
> > need to check and enforce that there's only 1:1 pinning around.
>
> I agree it cannot be done by default. Similarly, the poll mode cannot be
> done by default in platform nor by domain because you need to know that
> all vCPUs will be in polling mode.
>
No, that's the big difference. Polling (which, as far as this patch
goes, is yielding, in this case) is generic in the sense that, no matter
the pinned or non-pinned state, things work. Power is wasted, but
nothing breaks.

Not trapping WF* is not generic in the sense that, if you do it in the
pinned case, it (probably) works. If you lift the pinning, but leave the
direct WF* execution in place, everything breaks.

This is all I'm saying: that if you say, not trapping is an alternative
to this patch, well, it is not. Not trapping _plus_ measures to prevent
things from breaking, is an alternative.

Am I nitpicking? Perhaps... In which case, sorry. :-P

> But as I said, if vCPUs are not pinned this patch has very little
> advantage because you may context switch between them when yielding.
>
Smaller advantage, sure. How much smaller, hard to tell. That is the
reason why I see some potential value in this patch, especially if
converted to doing its thing per-domain, as George suggested. One can
try (and, when that happens, we'll show a big WARNING about wasting
power and heating up the CPUs!), and decide whether the result is good
or not for the specific use case.

> > Is it possible to decide whether to trap and emulate WFI, or just
> > execute it, online, and change such decision dynamically? And even if
> > yes, how would the whole thing work?
> > When the direct execution is enabled for a domain we automatically
> > enforce 1:1 pinning for that domain, and kick all the other domains
> > out of its pcpus? What if they have their own pinning, what if they
> > also have 'direct WFI' behavior enabled?
>
> It can be changed online, the WFI/WFE trapping is per pCPU (see
> HCR_EL2.{TWE,TWI})
>
Ok, thanks for the info. Not bad. With added logic (perhaps in the nop
scheduler), this looks like it could be useful.

> > These are just examples, my point being that in theory, if we consider
> > a very specific usecase or set of usecases, there's a lot we can do.
> > But when you say "why don't you let the guest directly execute WFI",
> > in response to a patch and a discussion like this, people may think
> > that you are actually proposing doing it as a solution, which is not
> > possible without figuring out all the open questions above (actually,
> > probably, more) and without introducing a lot of cross-subsystem
> > policing inside Xen, which is often something we don't want.
>
> I made this response because the patch sent by Stefano has a very
> specific use case that can be solved the same way. Everyone here is
> suggesting polling but it has its own disadvantage: power
> consumption.
>
> Anyway, I still think in both cases we are solving a specific problem
> without looking at what matters. I.e. why the scheduler takes so much
> time to block/unblock.
>
Well, TBH, we still are not entirely sure who the culprit is for high
latency. There are spikes in Credit2, and I'm investigating that. But
apart from them?

I think we need other numbers with which we can compare the numbers that
Stefano has collected. I'll send code for the nop scheduler, and we will
compare with what we'll get with it.

Another interesting data point would be knowing what the numbers look
like on baremetal, on the same platform and under comparable conditions.
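
Going back to the HCR_EL2.{TWE,TWI} point quoted above, here is a rough
sketch of what the "added logic" could boil down to. This is not Xen
code: struct pcpu_state, its fields and hcr_for_pcpu() are hypothetical
names made up for illustration, and the "exactly one vCPU can run here"
check is only an assumed stand-in for a real 1:1 pinning check. The only
architectural facts used are the bit positions of TWI and TWE in
HCR_EL2.

```c
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2 trap bits, as defined by the ARMv8-A architecture. */
#define HCR_TWI (UINT64_C(1) << 13)  /* trap guest WFI to EL2 */
#define HCR_TWE (UINT64_C(1) << 14)  /* trap guest WFE to EL2 */

/* Hypothetical per-pCPU bookkeeping; not a Xen structure. */
struct pcpu_state {
    unsigned int nr_vcpus_allowed;  /* vCPUs that may run on this pCPU */
    bool direct_wfx_requested;      /* admin asked for untrapped WF*   */
};

/*
 * Compute the HCR_EL2 value to load on this pCPU.
 *
 * WF* is left untrapped only when it was requested AND exactly one
 * vCPU can ever run here (assumed proxy for 1:1 pinning). In every
 * other case both trap bits are set, so lifting the pinning falls
 * back to trapping instead of breaking things.
 */
uint64_t hcr_for_pcpu(uint64_t hcr, const struct pcpu_state *p)
{
    bool safe = p->direct_wfx_requested && p->nr_vcpus_allowed == 1;

    if (safe)
        return hcr & ~(HCR_TWI | HCR_TWE);

    return hcr | HCR_TWI | HCR_TWE;
}
```

Something like this would have to be re-evaluated every time affinity
changes, which is exactly the cross-subsystem policing mentioned above.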
And I guess there are other components and layers, in the Xen
architecture, that may be causing increased latency, which we may have
not identified yet.

Anyway, the nop scheduler is probably the first thing we want to check.
I'll send the patches soon.

> > > So, yes in fine the guest will waste its slot.
> > >
> > Did I say it already that this concept of "slots" does not apply
> > here? :-D
>
> Sorry forgot about this :/. I guess you use the term credit? If so, the
> guest will use its credit for nothing.
>
If the guest is alone, or in general the system is undersubscribed, it
would, by continuously yielding in a busy loop, but that doesn't matter,
because there are enough pCPUs to run even vCPUs that are out of
credits.

If the guest is not alone, and the system is oversubscribed, it would
use a very tiny amount of its credits, every now and then, i.e., the
ones that are necessary to execute a WFI, and, for Xen, to issue a call
to sched_yield(). But after that, we will run someone else.

This is to say that the problem of this patch might be that, in the
oversubscribed case, it relies too much on the behavior of yield, but
not that it does nothing.

But maybe I'm nitpicking again. Sorry. I don't get to talk about these
inner (and very interesting, to me at least) scheduling details too
often, and when it happens, I tend to get excited and exaggerate! :-P

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel