
Re: [PATCH 12/12] evtchn: convert domain event lock to an r/w one



On Mon, Sep 28, 2020 at 01:02:43PM +0200, Jan Beulich wrote:
> Especially for the use in evtchn_move_pirqs() (called when moving a vCPU
> across pCPU-s) and the ones in EOI handling in PCI pass-through code,
> serializing perhaps an entire domain isn't helpful when no state (which
> isn't e.g. further protected by the per-channel lock) changes.
> 
> Unfortunately this implies dropping of lock profiling for this lock,
> until r/w locks may get enabled for such functionality.
> 
> While ->notify_vcpu_id is now meant to be consistently updated with the
> per-channel lock held for writing, an extension applies to ECS_PIRQ: The
> field is also guaranteed to not change with the per-domain event lock
> held. Therefore the unlink_pirq_port() call from evtchn_bind_vcpu() as
> well as the link_pirq_port() one from evtchn_bind_pirq() could in
> principle be moved out of the per-channel locked regions, but this
> further code churn didn't seem worth it.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> RFC:
> * In evtchn_bind_vcpu() the question is whether limiting the use of
>   write_lock() to just the ECS_PIRQ case is really worth it.

IMO I would just use write_lock() at the top of the function in
place of the current spin_lock(). The more fine-grained change could
be done as a follow-up patch if it's worth it. TBH, event channels
shouldn't change vCPU frequently enough for a more fine-grained
approach to matter much.

> * In flask_get_peer_sid() the question is whether we wouldn't better
>   switch to using the per-channel lock.
>  
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -465,7 +465,7 @@ int msixtbl_pt_register(struct domain *d
>      int r = -EINVAL;
>  
>      ASSERT(pcidevs_locked());
> -    ASSERT(spin_is_locked(&d->event_lock));
> +    ASSERT(rw_is_write_locked(&d->event_lock));

FWIW, we could switch rw_is_write_locked to use
_is_write_locked_by_me (or introduce rw_is_write_locked_by_me, albeit
I think all users of rw_is_write_locked care about the lock being
held by themselves).

> @@ -1098,7 +1108,7 @@ int evtchn_reset(struct domain *d, bool
>      if ( d != current->domain && !d->controller_pause_count )
>          return -EINVAL;
>  
> -    spin_lock(&d->event_lock);
> +    read_lock(&d->event_lock);
>  
>      /*
>       * If we are resuming, then start where we stopped. Otherwise, check
> @@ -1109,7 +1119,7 @@ int evtchn_reset(struct domain *d, bool
>      if ( i > d->next_evtchn )
>          d->next_evtchn = i;

Using the read lock to write to d->next_evtchn here...

>  
> -    spin_unlock(&d->event_lock);
> +    read_unlock(&d->event_lock);
>  
>      if ( !i )
>          return -EBUSY;
> @@ -1121,14 +1131,14 @@ int evtchn_reset(struct domain *d, bool
>          /* NB: Choice of frequency is arbitrary. */
>          if ( !(i & 0x3f) && hypercall_preempt_check() )
>          {
> -            spin_lock(&d->event_lock);
> +            write_lock(&d->event_lock);
>              d->next_evtchn = i;

... but then taking the write lock for the same store here seems
inconsistent.

> -            spin_unlock(&d->event_lock);
> +            write_unlock(&d->event_lock);
>              return -ERESTART;
>          }
>      }
>  
> -    spin_lock(&d->event_lock);
> +    write_lock(&d->event_lock);
>  
>      d->next_evtchn = 0;
>  
> @@ -1557,7 +1568,7 @@ static void domain_dump_evtchn_info(stru
>             "Polling vCPUs: {%*pbl}\n"
>             "    port [p/m/s]\n", d->domain_id, d->max_vcpus, d->poll_mask);
>  
> -    spin_lock(&d->event_lock);
> +    read_lock(&d->event_lock);

Since this is a debug key, I would suggest using read_trylock in order
to prevent blocking if a CPU is stuck while holding the event_lock in
write mode.


> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -105,7 +105,7 @@ static void pt_pirq_softirq_reset(struct
>  {
>      struct domain *d = pirq_dpci->dom;
>  
> -    ASSERT(spin_is_locked(&d->event_lock));
> +    ASSERT(rw_is_write_locked(&d->event_lock));
>  
>      switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
>      {
> @@ -162,7 +162,7 @@ static void pt_irq_time_out(void *data)
>      const struct hvm_irq_dpci *dpci;
>      const struct dev_intx_gsi_link *digl;
>  
> -    spin_lock(&irq_map->dom->event_lock);
> +    read_lock(&irq_map->dom->event_lock);

Is it fine to use the lock in read mode here? This path is likely to
change the flags field by adding HVM_IRQ_DPCI_EOI_LATCH, and hence
should take the lock in write mode, as I think this is the lock
that's supposed to protect changes to flags.

>  static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
> @@ -893,7 +893,7 @@ static void hvm_dirq_assist(struct domai
>          return;
>      }
>  
> -    spin_lock(&d->event_lock);
> +    read_lock(&d->event_lock);

It's also not clear to me that a read lock can be used here, since you
increase a couple of counters in hvm_pirq_dpci which don't seem to
be protected by any other lock?

>      if ( test_and_clear_bool(pirq_dpci->masked) )
>      {
>          struct pirq *pirq = dpci_pirq(pirq_dpci);
> @@ -947,7 +947,7 @@ static void hvm_dirq_assist(struct domai
>      }
>  
>   out:
> -    spin_unlock(&d->event_lock);
> +    read_unlock(&d->event_lock);
>  }
>  
>  static void hvm_pirq_eoi(struct pirq *pirq,
> @@ -1012,7 +1012,7 @@ void hvm_dpci_eoi(struct domain *d, unsi
>  
>      if ( is_hardware_domain(d) )
>      {
> -        spin_lock(&d->event_lock);
> +        read_lock(&d->event_lock);
>          hvm_gsi_eoi(d, guest_gsi, ent);
>          goto unlock;
>      }
> @@ -1023,7 +1023,7 @@ void hvm_dpci_eoi(struct domain *d, unsi
>          return;
>      }
>  
> -    spin_lock(&d->event_lock);
> +    read_lock(&d->event_lock);
>      hvm_irq_dpci = domain_get_irq_dpci(d);
>  
>      if ( !hvm_irq_dpci )
> @@ -1033,7 +1033,7 @@ void hvm_dpci_eoi(struct domain *d, unsi
>          __hvm_dpci_eoi(d, girq, ent);

__hvm_dpci_eoi will call hvm_pirq_eoi and that seems to require a
write lock, as it modifies pirq_dpci.

>  
>  unlock:
> -    spin_unlock(&d->event_lock);
> +    read_unlock(&d->event_lock);
>  }
>  
>  /*
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -883,7 +883,7 @@ static int pci_clean_dpci_irqs(struct do
>      if ( !is_hvm_domain(d) )
>          return 0;
>  
> -    spin_lock(&d->event_lock);
> +    write_lock(&d->event_lock);
>      hvm_irq_dpci = domain_get_irq_dpci(d);
>      if ( hvm_irq_dpci != NULL )
>      {
> @@ -901,14 +901,14 @@ static int pci_clean_dpci_irqs(struct do
>              ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
>          if ( ret )
>          {
> -            spin_unlock(&d->event_lock);
> +            read_unlock(&d->event_lock);

This should be a write_unlock AFAICT.

>              return ret;
>          }
>  
>          hvm_domain_irq(d)->dpci = NULL;
>          free_hvm_irq_dpci(hvm_irq_dpci);
>      }
> -    spin_unlock(&d->event_lock);
> +    write_unlock(&d->event_lock);
>      return 0;
>  }
>  
> --- a/xen/drivers/passthrough/vtd/x86/hvm.c
> +++ b/xen/drivers/passthrough/vtd/x86/hvm.c
> @@ -54,7 +54,7 @@ void hvm_dpci_isairq_eoi(struct domain *
>      if ( !is_iommu_enabled(d) )
>          return;
>  
> -    spin_lock(&d->event_lock);
> +    read_lock(&d->event_lock);

I think this also needs to be a write lock, as you modify pirq_dpci
bits in _hvm_dpci_isairq_eoi.

>  
>      dpci = domain_get_irq_dpci(d);
>  
> @@ -63,5 +63,5 @@ void hvm_dpci_isairq_eoi(struct domain *
>          /* Multiple mirq may be mapped to one isa irq */
>          pt_pirq_iterate(d, _hvm_dpci_isairq_eoi, (void *)(long)isairq);
>      }
> -    spin_unlock(&d->event_lock);
> +    read_unlock(&d->event_lock);
>  }
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -373,7 +373,7 @@ struct domain
>      unsigned int     xen_evtchns;
>      /* Port to resume from in evtchn_reset(), when in a continuation. */
>      unsigned int     next_evtchn;
> -    spinlock_t       event_lock;
> +    rwlock_t         event_lock;

It would be nice to add a comment regarding which fields event_lock
protects. It's a very generic lock name that I think has been abused
a bit.

Not that it needs to be done in this patch.
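Something along these lines, say (the field list is deliberately left as a placeholder; working out what the lock actually covers is the point of the exercise):

```
    /*
     * Protects: <enumerate the event-channel / dpci state guarded
     * here after auditing the lock's users>. Taken for reading by
     * paths that only look state up, for writing by paths that
     * (un)bind ports or rewire pass-through IRQs.
     */
    rwlock_t         event_lock;
```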

Thanks, Roger.
