[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] x86/vpt: Do not take pt_migrate rwlock in some cases


  • To: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 29 Mar 2021 12:31:17 +0200
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, <jbeulich@xxxxxxxx>, <andrew.cooper3@xxxxxxxxxx>, <wl@xxxxxxx>, <stephen.s.brennan@xxxxxxxxxx>, <iwj@xxxxxxxxxxxxxx>
  • Delivery-date: Mon, 29 Mar 2021 10:31:53 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Mar 26, 2021 at 09:51:06PM -0400, Boris Ostrovsky wrote:
> Commit 8e76aef72820 ("x86/vpt: fix race when migrating timers between
> vCPUs") addressed XSA-336 by introducing a per-domain rwlock that was
> intended to protect periodic timers during vCPU migration. Since such
> migration is an infrequent event, no performance impact was expected.
> 
> Unfortunately this turned out not to be the case: on a fairly large
> guest (92 vCPUs) we've observed as much as a 40% TPC-C performance
> regression with some guest kernels. Further investigation pointed to
> the pt_migrate read lock taken in pt_update_irq() as the largest
> contributor to this regression. With a large number of vCPUs and a
> large number of VMEXITs (from where pt_update_irq() is always called),
> the update of an atomic in read_lock() is thought to be the main cause.
> 
> Stephen Brennan analyzed the locking pattern and classified lock users
> as follows:
> 
> 1. Functions which read (maybe write) all periodic_time instances
> attached to a particular vCPU. These are functions which use pt_vcpu_lock()
> after the commit, such as pt_restore_timer(), pt_save_timer(), etc.
> 2. Functions which want to modify a particular periodic_time object.
> These guys lock whichever vCPU the periodic_time is attached to, but
> since the vCPU could be modified without holding any lock, they are
> vulnerable to the bug. Functions in this group use pt_lock(), such as
> pt_timer_fn() or destroy_periodic_time().
> 3. Functions which not only want to modify the periodic_time, but also
> would like to modify its vcpu field. These are create_periodic_time()
> and pt_adjust_vcpu(). They create the locking imbalance bug for group 2,
> but we can't simply hold two vCPU locks due to the deadlock risk.
> 
> Roger Monné then pointed out that group 1 functions don't really need

Roger alone is fine, or else it would have to be Roger Pau (Monné is
my second surname).

> to hold the pt_migrate rwlock and that group 3 should be hardened by
> holding appropriate vcpu's tm_lock when adding or deleting a timer
> from its list.
> 
> Suggested-by: Stephen Brennan <stephen.s.brennan@xxxxxxxxxx>
> Suggested-by: Roger Pau Monne <roger.pau@xxxxxxxxxx>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>

Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Some nits below.

> ---
> v2: Drop per-periodic_time spinlock and keep pt_migrate rwlock (and thus
>     change patch subject)
> 
>  xen/arch/x86/hvm/vpt.c        | 40 +++++++++++++++++++++++++++++++---------
>  xen/include/asm-x86/hvm/vpt.h |  8 ++++----
>  2 files changed, 35 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
> index 4c2afe2e9154..893641f20e1c 100644
> --- a/xen/arch/x86/hvm/vpt.c
> +++ b/xen/arch/x86/hvm/vpt.c
> @@ -153,32 +153,43 @@ static int pt_irq_masked(struct periodic_time *pt)
>      return 1;
>  }
>  
> +/*
> + * Functions which read (maybe write) all periodic_time instances
> + * attached to a particular vCPU use these locking helpers.

I would replace 'these' with pt_vcpu_{un}lock, to make it clearer.

> + *
> + * Such users are explicitly forbidden from changing the value of the
> + * pt->vcpu field, because another thread holding the pt_migrate lock
> + * may already be spinning waiting for your vcpu lock.
> + */
>  static void pt_vcpu_lock(struct vcpu *v)
>  {
> -    read_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
>      spin_lock(&v->arch.hvm.tm_lock);
>  }
>  
>  static void pt_vcpu_unlock(struct vcpu *v)
>  {
>      spin_unlock(&v->arch.hvm.tm_lock);
> -    read_unlock(&v->domain->arch.hvm.pl_time->pt_migrate);
>  }
>  
> +/*
> + * Functions which want to modify a particular periodic_time object
> + * use these locking helpers.

Same here, I would use pt_{un}lock instead of 'these' to make it
clearer.

> + *
> + * These users lock whichever vCPU the periodic_time is attached to,
> + * but since the vCPU could be modified without holding any lock, they
> + * need to take an additional lock that protects against pt->vcpu
> + * changing.
> + */
>  static void pt_lock(struct periodic_time *pt)
>  {
> -    /*
> -     * We cannot use pt_vcpu_lock here, because we need to acquire the
> -     * per-domain lock first and then (re-)fetch the value of pt->vcpu, or
> -     * else we might be using a stale value of pt->vcpu.
> -     */
>      read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
>      spin_lock(&pt->vcpu->arch.hvm.tm_lock);
>  }
>  
>  static void pt_unlock(struct periodic_time *pt)
>  {
> -    pt_vcpu_unlock(pt->vcpu);
> +    spin_unlock(&pt->vcpu->arch.hvm.tm_lock);
> +    read_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
>  }
>  
>  static void pt_process_missed_ticks(struct periodic_time *pt)
> @@ -543,8 +554,10 @@ void create_periodic_time(
>      pt->cb = cb;
>      pt->priv = data;
>  
> +    pt_vcpu_lock(pt->vcpu);
>      pt->on_list = 1;
>      list_add(&pt->list, &v->arch.hvm.tm_list);
> +    pt_vcpu_unlock(pt->vcpu);
>  
>      init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
>      set_timer(&pt->timer, pt->scheduled);
> @@ -580,13 +593,22 @@ static void pt_adjust_vcpu(struct periodic_time *pt, struct vcpu *v)
>          return;
>  
>      write_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
> +
> +    pt_vcpu_lock(pt->vcpu);
> +    if ( pt->on_list )
> +        list_del(&pt->list);
> +    pt_vcpu_unlock(pt->vcpu);
> +
>      pt->vcpu = v;
> +
> +    pt_vcpu_lock(pt->vcpu);
>      if ( pt->on_list )
>      {
> -        list_del(&pt->list);
>          list_add(&pt->list, &v->arch.hvm.tm_list);
>          migrate_timer(&pt->timer, v->processor);
>      }
> +    pt_vcpu_unlock(pt->vcpu);
> +
>      write_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
>  }
>  
> diff --git a/xen/include/asm-x86/hvm/vpt.h b/xen/include/asm-x86/hvm/vpt.h
> index 39d26cbda496..f3c2a439630a 100644
> --- a/xen/include/asm-x86/hvm/vpt.h
> +++ b/xen/include/asm-x86/hvm/vpt.h
> @@ -129,10 +129,10 @@ struct pl_time {    /* platform time */
>      struct HPETState vhpet;
>      struct PMTState  vpmt;
>      /*
> -     * rwlock to prevent periodic_time vCPU migration. Take the lock in read
> -     * mode in order to prevent the vcpu field of periodic_time from changing.
> -     * Lock must be taken in write mode when changes to the vcpu field are
> -     * performed, as it allows exclusive access to all the timers of a domain.
> +     * Functions which want to modify the vcpu field of the vpt need to
> +     * hold the global lock (pt_migrate) in write mode together with the
> +     * per-vcpu locks of the lists being modified. Note that two vcpu
> +     * locks cannot be held at the same time to avoid a deadlock.

I would maybe word this as:

    /*
     * Functions which want to modify the vcpu field of the vpt need
     * to hold the global lock (pt_migrate) in write mode together
     * with the per-vcpu locks of the lists being modified. Functions
     * that want to lock a periodic_time that's possibly on a
     * different vCPU list need to take the lock in read mode first in
     * order to prevent the vcpu field of the periodic_time from
     * changing.
     *
     * Note that two vcpu locks cannot be held at the same time to
     * avoid a deadlock.
     */

Thanks, Roger.
