
Re: [PATCH 12/12] evtchn: convert domain event lock to an r/w one



On 28.09.2020 18:44, Roger Pau Monné wrote:
> On Mon, Sep 28, 2020 at 01:02:43PM +0200, Jan Beulich wrote:
>> Especially for the use in evtchn_move_pirqs() (called when moving a vCPU
>> across pCPU-s) and the ones in EOI handling in PCI pass-through code,
>> serializing perhaps an entire domain isn't helpful when no state (which
>> isn't e.g. further protected by the per-channel lock) changes.
>>
>> Unfortunately this implies dropping of lock profiling for this lock,
>> until r/w locks may get enabled for such functionality.
>>
>> While ->notify_vcpu_id is now meant to be consistently updated with the
>> per-channel lock held for writing, an extension applies to ECS_PIRQ: The
>> field is also guaranteed to not change with the per-domain event lock
>> held. Therefore the unlink_pirq_port() call from evtchn_bind_vcpu() as
>> well as the link_pirq_port() one from evtchn_bind_pirq() could in
>> principle be moved out of the per-channel locked regions, but this
>> further code churn didn't seem worth it.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> RFC:
>> * In evtchn_bind_vcpu() the question is whether limiting the use of
>>   write_lock() to just the ECS_PIRQ case is really worth it.
> 
> IMO I would just use write_lock() at the top of the function in
> place of the current spin_lock(). The more fine-grained change should
> be done as a follow-up patch if it's worth it. TBH event channels
> shouldn't change vCPU frequently enough for a more fine-grained
> approach to matter much.

No, using the write lock in place of the spin lock would violate
what the description says: "While ->notify_vcpu_id is now meant to
be consistently updated with the per-channel lock held for writing,
...". I could only switch to acquiring both.

>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -465,7 +465,7 @@ int msixtbl_pt_register(struct domain *d
>>      int r = -EINVAL;
>>  
>>      ASSERT(pcidevs_locked());
>> -    ASSERT(spin_is_locked(&d->event_lock));
>> +    ASSERT(rw_is_write_locked(&d->event_lock));
> 
> FWIW, we could switch rw_is_write_locked to use
> _is_write_locked_by_me (or introduce rw_is_write_locked_by_me, albeit
> I think all users of rw_is_write_locked care about the lock being
> taken by them).

Probably, but not here and now.

>> @@ -1098,7 +1108,7 @@ int evtchn_reset(struct domain *d, bool
>>      if ( d != current->domain && !d->controller_pause_count )
>>          return -EINVAL;
>>  
>> -    spin_lock(&d->event_lock);
>> +    read_lock(&d->event_lock);
>>  
>>      /*
>>       * If we are resuming, then start where we stopped. Otherwise, check
>> @@ -1109,7 +1119,7 @@ int evtchn_reset(struct domain *d, bool
>>      if ( i > d->next_evtchn )
>>          d->next_evtchn = i;
> 
> Using the read lock to write to d->next_evtchn here...
> 
>>  
>> -    spin_unlock(&d->event_lock);
>> +    read_unlock(&d->event_lock);
>>  
>>      if ( !i )
>>          return -EBUSY;
>> @@ -1121,14 +1131,14 @@ int evtchn_reset(struct domain *d, bool
>>          /* NB: Choice of frequency is arbitrary. */
>>          if ( !(i & 0x3f) && hypercall_preempt_check() )
>>          {
>> -            spin_lock(&d->event_lock);
>> +            write_lock(&d->event_lock);
>>              d->next_evtchn = i;
> 
> ... but the write lock here instead seems inconsistent.

Agreed, fixed.

>> @@ -1557,7 +1568,7 @@ static void domain_dump_evtchn_info(stru
>>             "Polling vCPUs: {%*pbl}\n"
>>             "    port [p/m/s]\n", d->domain_id, d->max_vcpus, d->poll_mask);
>>  
>> -    spin_lock(&d->event_lock);
>> +    read_lock(&d->event_lock);
> 
> Since this is a debug key, I would suggest using read_trylock in order
> to prevent blocking if a CPU is stuck while holding the event_lock in
> write mode.

We should do so in all debug key handlers imo, but such a mostly
orthogonal change wouldn't be reasonable to make right here.

>> --- a/xen/drivers/passthrough/io.c
>> +++ b/xen/drivers/passthrough/io.c
>> @@ -105,7 +105,7 @@ static void pt_pirq_softirq_reset(struct
>>  {
>>      struct domain *d = pirq_dpci->dom;
>>  
>> -    ASSERT(spin_is_locked(&d->event_lock));
>> +    ASSERT(rw_is_write_locked(&d->event_lock));
>>  
>>      switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
>>      {
>> @@ -162,7 +162,7 @@ static void pt_irq_time_out(void *data)
>>      const struct hvm_irq_dpci *dpci;
>>      const struct dev_intx_gsi_link *digl;
>>  
>> -    spin_lock(&irq_map->dom->event_lock);
>> +    read_lock(&irq_map->dom->event_lock);
> 
> Is it fine to use the lock in read mode here? It's likely to change
> the flags by adding HVM_IRQ_DPCI_EOI_LATCH, and hence should use the
> lock in write mode?
> 
> As I think that's the lock that's supposed to protect changes to the
> flags field?

Yes, looks like you're right - there simply is no other lock.
You're also right about all the other pass-through related
remarks; I've switched them all to write_lock(). That's a bit
of a shame though, as especially on the EOI path things ought
to be able to work in a more parallel way.

>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -883,7 +883,7 @@ static int pci_clean_dpci_irqs(struct do
>>      if ( !is_hvm_domain(d) )
>>          return 0;
>>  
>> -    spin_lock(&d->event_lock);
>> +    write_lock(&d->event_lock);
>>      hvm_irq_dpci = domain_get_irq_dpci(d);
>>      if ( hvm_irq_dpci != NULL )
>>      {
>> @@ -901,14 +901,14 @@ static int pci_clean_dpci_irqs(struct do
>>              ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
>>          if ( ret )
>>          {
>> -            spin_unlock(&d->event_lock);
>> +            read_unlock(&d->event_lock);
> 
> This should be a write_unlock AFAICT.

Oh, of course - I missed converting this one when converting the
other two.

>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -373,7 +373,7 @@ struct domain
>>      unsigned int     xen_evtchns;
>>      /* Port to resume from in evtchn_reset(), when in a continuation. */
>>      unsigned int     next_evtchn;
>> -    spinlock_t       event_lock;
>> +    rwlock_t         event_lock;
> 
> It would be nice to add a comment regarding what fields does
> event_lock protect. It's kind of a very generic lock name that I think
> has been abused a bit.
> 
> Not that it needs to be done in that patch.

Right, and it'll be quite some auditing that needs doing in order
to collect all the pieces.

Jan



 

