
Re: [PATCH v2] x86/vpt: Do not take pt_migrate rwlock in some cases



On 29.03.2021 12:40, Roger Pau Monné wrote:
> On Mon, Mar 29, 2021 at 11:56:56AM +0200, Jan Beulich wrote:
>> On 27.03.2021 02:51, Boris Ostrovsky wrote:
>>> Commit 8e76aef72820 ("x86/vpt: fix race when migrating timers between
>>> vCPUs") addressed XSA-336 by introducing a per-domain rwlock that was
>>> intended to protect the periodic timer during vCPU migration. Since such
>>> migration is an infrequent event, no performance impact was expected.
>>>
>>> Unfortunately this turned out not to be the case: on a fairly large
>>> guest (92 vCPUs) we have observed as much as a 40% TPC-C performance
>>> regression with some guest kernels. Further investigation pointed to
>>> the pt_migrate read lock taken in pt_update_irq() as the largest
>>> contributor to this regression. With a large number of vCPUs and a large
>>> number of VM exits (from where pt_update_irq() is always called), the
>>> update of an atomic in read_lock() is thought to be the main cause.
>>>
>>> Stephen Brennan analyzed the locking pattern and classified lock users
>>> as follows:
>>>
>>> 1. Functions which read (maybe write) all periodic_time instances
>>> attached to a particular vCPU. These are functions which use pt_vcpu_lock()
>>> after the commit, such as pt_restore_timer(), pt_save_timer(), etc.
>>> 2. Functions which want to modify a particular periodic_time object.
>>> These functions lock whichever vCPU the periodic_time is attached to,
>>> but since the vCPU could be modified without holding any lock, they are
>>> vulnerable to the bug. Functions in this group use pt_lock(), such as
>>> pt_timer_fn() or destroy_periodic_time().
>>> 3. Functions which not only want to modify the periodic_time, but also
>>> would like to modify the ->vcpu field. These are create_periodic_time()
>>> and pt_adjust_vcpu(). They create the locking imbalance bug for group 2,
>>> but we can't simply hold two vCPU locks due to the deadlock risk.
>>>
>>> Roger Pau Monné then pointed out that group 1 functions don't really
>>> need to hold the pt_migrate rwlock, and that group 3 should be hardened
>>> by holding the appropriate vCPU's tm_lock when adding or deleting a
>>> timer from its list.
>>
>> I'm struggling some with the latter aspect: Is this to mean there is
>> something wrong right now?
> 
> There's nothing wrong right now AFAICT, because per-vCPU users (i.e.
> type 1) also hold the rw lock in read mode when iterating over the
> per-vCPU list.
> 
>> Or does "harden" really mean "just to be
>> on the safe side" here?
> 
> If the per-domain rw lock is no longer read-locked by type 1 users
> then type 2 and type 3 users need to make sure the per-vcpu lock is
> taken before adding or removing items from the per-vcpu lists, or else
> a type 1 user could see list corruption as a result of modifications
> done by type 2 or 3 without holding the per-vcpu lock.

Ah, right. Boris, may I then ask that you avoid the somewhat ambiguous
word "harden" here, and instead make clear that the new locking in fact
"balances" the removal of locks elsewhere?

Jan
