[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH RESEND] x86/vpt: Replace per-guest pt_migrate lock with per pt lock
On 3/25/21 10:48 AM, Roger Pau Monné wrote: > On Wed, Mar 24, 2021 at 05:05:05PM -0400, Boris Ostrovsky wrote: >> Commit 8e76aef72820 ("x86/vpt: fix race when migrating timers between >> vCPUs") addressed XSA-336 by introducing a per-domain rwlock that was >> intended to protect periodic timer during VCPU migration. Since such >> migration is an infrequent event no performance impact was expected. >> >> Unfortunately this turned out not to be the case: on a fairly large >> guest (92 VCPUs) we've observed as much as 40% TPCC performance regression >> with some guest kernels. Further investigation pointed to pt_migrate >> read lock taken in pt_update_irq() as the largest contributor to this >> regression. With large number of VCPUs and large number of VMEXITs >> (from where pt_update_irq() is always called) the update of an atomic in >> read_lock() is thought to be the main cause. > Right, seems like a very plausible analysis. > > Since I suspect most (if not all?) pt_restore_timer() (called from the scheduler) also contributes a couple of percent. But yes. > of the performance regression is > from the read_lock in pt_update_irq I think we can remove that without > doing such a big change to the current locking logic, and instead > merging part of the logic that you detail in the cover letter without > moving to a per-timer lock. > >> Stephen Brennan examined the locking pattern and suggested using a >> per-timer lock instead. This lock will need to be held whenever there is >> a chance that pt->vcpu field may change (thus avoiding XSA-336 >> condition). >> >> Suggested-by: Stephen Brennan <stephen.s.brennan@xxxxxxxxxx> >> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx> > So while I think this is indeed a working solution, I'm not convinced > we require a per-timer lock, I think we can find some middle ground > using both a per-domain rwlock and the more fine grained per-vcpu > lock. > > Basically for type 1 accesses (pt_vcpu_{un}lock) I think we can safely > drop the read_{un}lock call, Yes, if that's the case then current rwlock should be fine. > and remove the performance bottleneck > while slightly adjusting the functions that modify the per-vcpu timer > lists to take the per-vcpu locks when doing so. > > I've tried to convey that in the comments below, while also pointing > out some suitable places where comments can be added based on the text > from your cover letter. > > Overall this should result in a smaller patch that will be both easier > to review and argue in terms of inclusion into 4.15. Sure. Thanks for the review/suggestions. -boris
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |