
RE: [Xen-devel] Linux spin lock enhancement on xen



George Dunlap wrote:
> Wow, I totally missed this thread.
> 
> A couple of thoughts:
> 
> Complicated solutions for the scheduler are a really bad idea.  It's
> hard enough to predict and debug the side-effects of simple
> mechanisms; a complex mechanism is doomed to failure at the outset.
> 
> I agree with Jeremy, that the guest shouldn't tell Xen to run a
> specific VCPU.  At most it should be something along the lines of, "If
> you're going to run any vcpu from this domain, please run vcpu X."
> 
> Jeremy, do you think that changes to the HV are necessary, or do you
> think that the existing solution is sufficient?  It seems to me like
> hinting to the HV to do a directed yield makes more sense than making
> the same thing happen via blocking and event channels.  OTOH, that
> gives the guest a lot more control over when and how things happen.
> 
> Mukesh, did you see the patch by Xiantao Zhang a few days ago,
> regarding what to do on an HVM pause instruction?  I thought the
> solution he had was interesting: when yielding due to a spinlock,
> rather than going to the back of the queue, just go behind one person.
>  I think an implementation of "yield_to" that might make sense in the
> credit scheduler is:
> * Put the yielding vcpu behind one vcpu
> * If the yield-to vcpu is not running, pull it to the front within its
> priority.  (I.e., if it's UNDER, put it at the front so it runs next;
> if it's OVER, make it the first OVER vcpu.)
> 
> Thoughts?
> 
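A minimal, stand-alone C sketch of the yield_to behaviour described above;
the runqueue layout and the names used here are illustrative assumptions, not
the actual credit scheduler code:

#include <stdbool.h>
#include <stddef.h>

enum prio { CSCHED_PRI_UNDER, CSCHED_PRI_OVER };

struct vcpu_ent {
    struct vcpu_ent *next;
    enum prio pri;
    bool running;
};

struct runq {
    struct vcpu_ent *head;   /* singly linked, sorted UNDER before OVER */
};

/* "Go behind one person": insert 'v' right after the current head. */
static void runq_insert_second(struct runq *rq, struct vcpu_ent *v)
{
    if (rq->head == NULL) {
        v->next = NULL;
        rq->head = v;
    } else {
        v->next = rq->head->next;
        rq->head->next = v;
    }
}

/* Pull 'v' to the front of its priority class (first UNDER or first OVER). */
static void runq_insert_front_of_prio(struct runq *rq, struct vcpu_ent *v)
{
    struct vcpu_ent **pp = &rq->head;

    while (*pp != NULL && (*pp)->pri < v->pri)   /* skip higher priority */
        pp = &(*pp)->next;

    v->next = *pp;
    *pp = v;
}

/*
 * yield_to: both vcpus are assumed to have been removed from the runqueue
 * already.  The yielder goes behind one vcpu; the target, if it is not
 * currently running, is pulled to the front of its priority class.
 */
static void yield_to(struct runq *rq, struct vcpu_ent *yielder,
                     struct vcpu_ent *target)
{
    runq_insert_second(rq, yielder);
    if (!target->running)
        runq_insert_front_of_prio(rq, target);
}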

        What Xiantao (and I, internally) proposed is to implement temporary 
coscheduling to solve the spin-lock issue for both FIFO and ordinary 
spin-locks, utilizing PLE exits (it can work with PV spin-locks as well). 
Here is our thinking (please refer to Xiantao's mail as well):

        There are two typical solutions to improve spin-lock efficiency in 
virtualization: A) lock holder preemption avoidance (or co-scheduling), and 
B) helping locks, where the spinning VCPU donates its CPU cycles to improve 
overall system utilization.

        #A solves the spin-lock issue best; however, it requires either 
hardware assistance to detect the lock holder, which is impractical, or 
coscheduling, which is hard to implement efficiently and sacrifices a lot of 
scheduler flexibility. Neither Xen nor KVM has implemented it.

        #B (the current Xen policy with PLE yielding) may help overall system 
performance, but it may not help the performance of the spinning guest. In 
some cases the guest may even get worse due to the long wait (yield) on the 
spin-lock; in other cases it may get additional CPU cycles (and performance) 
back from the VMM scheduler to compensate for its earlier CPU cycle donation. 
In general, #B may help system performance when the system is overcommitted, 
but it can also hurt the "speed" of an individual guest, depending on the 
workload.
        
        An additional issue with #B is that it may hurt FIFO spin-locks 
(ticket spin-locks in Linux, and queued spin-locks in Windows since Windows 
2000), where by OS design only the first-in waiting VCPU is able to take the 
lock. Current PLE hardware cannot tell which VCPU is the next (first-in) 
waiter and which one is the lock holder.
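        For reference, a FIFO (ticket) spin-lock in its simplest form looks 
roughly like the generic C11 sketch below (not the actual Linux or Windows 
implementation); only the waiter whose ticket matches the current owner can 
make progress, so yielding to an arbitrary VCPU does not help:

#include <stdatomic.h>

struct ticket_lock {
    atomic_uint next;    /* next ticket number to hand out         */
    atomic_uint owner;   /* ticket number allowed to hold the lock */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned int my = atomic_fetch_add(&l->next, 1);

    /* Spin until it is exactly our turn; this is the loop PLE detects. */
    while (atomic_load(&l->owner) != my)
        ;   /* cpu_relax()/PAUSE would go here */
}

static void ticket_lock_release(struct ticket_lock *l)
{
    /* Hand the lock to the next waiter, strictly in FIFO order. */
    atomic_fetch_add(&l->owner, 1);
}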

[Proposed optimization]
        Lock holder preemption avoidance is the right way to fully utilize 
the hardware PLE capability; the current policy can simply hurt performance, 
and we need to improve it along the lines of solution #A.

        Given that current hardware cannot tell which VCPU is the lock holder 
or which one is the next (first-in) waiting VCPU, coscheduling may be the 
choice. However, coscheduling has many side effects as well (reportedly 
another company using co-scheduling is about to give it up too). This 
proposal is to do temporary coscheduling on top of the existing VMM 
scheduling. The details are:
        When one or more VCPUs of a guest are waiting for a spin-lock, we can 
temporarily increase the priority of all VCPUs of the same guest so that they 
are scheduled in for a short period. The period is kept small to limit the 
impact of this "coscheduling" on the overall VMM scheduler. The current Xen 
patch simply "boosts" the VCPUs and already shows a great gain, but there may 
be room for further tuning of the algorithm's parameters.
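        In rough pseudo-C, the idea looks like the sketch below; the types, 
names and the boost-period value are illustrative assumptions, not the actual 
patch:

#include <stdbool.h>
#include <stdint.h>

#define NR_VCPUS        8
#define BOOST_PERIOD_US 500    /* keep the boost short to limit the impact
                                  on the normal scheduling policy          */

struct vcpu_sched {
    bool boosted;
    uint64_t boost_expiry;     /* absolute time at which the boost ends   */
};

struct domain_sched {
    struct vcpu_sched vcpu[NR_VCPUS];
};

/* Called from the PLE-exit path when any vcpu of 'd' is found spinning. */
static void boost_whole_domain(struct domain_sched *d, uint64_t now)
{
    for (int i = 0; i < NR_VCPUS; i++) {
        d->vcpu[i].boosted = true;
        d->vcpu[i].boost_expiry = now + BOOST_PERIOD_US;
    }
}

/* Consulted by the scheduler when it picks the next vcpu to dispatch. */
static bool vcpu_is_boosted(struct vcpu_sched *v, uint64_t now)
{
    if (v->boosted && now >= v->boost_expiry)
        v->boosted = false;    /* boost expired, fall back to normal */
    return v->boosted;
}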
        
        I believe this is close to a perfect solution to the spin-lock issue 
with PLE for now (as long as the VCPU count is not dramatically large). 
vConsolidate (a mix of Linux and Windows guests) shows a 19% consolidation 
performance gain, which is almost too good to believe, but it is true :)  We 
are investigating more workloads and will post a new patch soon.
        Of course, if a PV guest is running in a PVM container, the PVed 
spin-lock is still needed. But I doubt its necessity if it is running on top 
of an HVM container :)


Thx, Eddie

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

