Re: [PATCH 12/12] evtchn: convert domain event lock to an r/w one
On 28.09.2020 18:44, Roger Pau Monné wrote:
> On Mon, Sep 28, 2020 at 01:02:43PM +0200, Jan Beulich wrote:
>> Especially for the use in evtchn_move_pirqs() (called when moving a vCPU
>> across pCPU-s) and the ones in EOI handling in PCI pass-through code,
>> serializing perhaps an entire domain isn't helpful when no state (which
>> isn't e.g. further protected by the per-channel lock) changes.
>>
>> Unfortunately this implies dropping of lock profiling for this lock,
>> until r/w locks may get enabled for such functionality.
>>
>> While ->notify_vcpu_id is now meant to be consistently updated with the
>> per-channel lock held for writing, an extension applies to ECS_PIRQ: The
>> field is also guaranteed to not change with the per-domain event lock
>> held. Therefore the unlink_pirq_port() call from evtchn_bind_vcpu() as
>> well as the link_pirq_port() one from evtchn_bind_pirq() could in
>> principle be moved out of the per-channel locked regions, but this
>> further code churn didn't seem worth it.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> RFC:
>> * In evtchn_bind_vcpu() the question is whether limiting the use of
>> write_lock() to just the ECS_PIRQ case is really worth it.
>
> IMO I would just use write_lock() at the top of the function in
> place of the current spin_lock. The more fine-grained change should
> be done as a follow-up patch if it's worth it. TBH, event channels
> shouldn't change vCPU frequently enough that a more fine-grained
> approach matters much.
No, using the write lock in place of the spin lock would violate
what the description says: "While ->notify_vcpu_id is now meant to
be consistently updated with the per-channel lock held for writing,
...". I could only switch to acquiring both.
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -465,7 +465,7 @@ int msixtbl_pt_register(struct domain *d
>> int r = -EINVAL;
>>
>> ASSERT(pcidevs_locked());
>> - ASSERT(spin_is_locked(&d->event_lock));
>> + ASSERT(rw_is_write_locked(&d->event_lock));
>
> FWIW, we could switch rw_is_write_locked to use
> _is_write_locked_by_me (or introduce rw_is_write_locked_by_me, albeit
> I think all users of rw_is_write_locked care about the lock being
> taken by them).
Probably, but not here and now.
>> @@ -1098,7 +1108,7 @@ int evtchn_reset(struct domain *d, bool
>> if ( d != current->domain && !d->controller_pause_count )
>> return -EINVAL;
>>
>> - spin_lock(&d->event_lock);
>> + read_lock(&d->event_lock);
>>
>> /*
>> * If we are resuming, then start where we stopped. Otherwise, check
>> @@ -1109,7 +1119,7 @@ int evtchn_reset(struct domain *d, bool
>> if ( i > d->next_evtchn )
>> d->next_evtchn = i;
>
> Using the read lock to write to d->next_evtchn here...
>
>>
>> - spin_unlock(&d->event_lock);
>> + read_unlock(&d->event_lock);
>>
>> if ( !i )
>> return -EBUSY;
>> @@ -1121,14 +1131,14 @@ int evtchn_reset(struct domain *d, bool
>> /* NB: Choice of frequency is arbitrary. */
>> if ( !(i & 0x3f) && hypercall_preempt_check() )
>> {
>> - spin_lock(&d->event_lock);
>> + write_lock(&d->event_lock);
>> d->next_evtchn = i;
>
> ... but the write lock here instead seems inconsistent.
Agreed, fixed.
>> @@ -1557,7 +1568,7 @@ static void domain_dump_evtchn_info(stru
>> "Polling vCPUs: {%*pbl}\n"
>> " port [p/m/s]\n", d->domain_id, d->max_vcpus, d->poll_mask);
>>
>> - spin_lock(&d->event_lock);
>> + read_lock(&d->event_lock);
>
> Since this is a debug key, I would suggest using read_trylock in order
> to prevent blocking if a CPU is stuck while holding the event_lock in
> write mode.
We should do so in all debug key handlers imo, and I don't think
such a mostly orthogonal change would be reasonable to make
right here.
>> --- a/xen/drivers/passthrough/io.c
>> +++ b/xen/drivers/passthrough/io.c
>> @@ -105,7 +105,7 @@ static void pt_pirq_softirq_reset(struct
>> {
>> struct domain *d = pirq_dpci->dom;
>>
>> - ASSERT(spin_is_locked(&d->event_lock));
>> + ASSERT(rw_is_write_locked(&d->event_lock));
>>
>> switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
>> {
>> @@ -162,7 +162,7 @@ static void pt_irq_time_out(void *data)
>> const struct hvm_irq_dpci *dpci;
>> const struct dev_intx_gsi_link *digl;
>>
>> - spin_lock(&irq_map->dom->event_lock);
>> + read_lock(&irq_map->dom->event_lock);
>
> Is it fine to use the lock in read mode here? It's likely to change
> the flags by adding HVM_IRQ_DPCI_EOI_LATCH, and hence should use the
> lock in write mode?
>
> As I think that's the lock that's supposed to protect changes to the
> flags field?
Yes, looks like you're right - there simply is no other lock.
You're also right for all other respective pass-through related
remarks, I've switched them all to write_lock(). That's a bit
of a shame though, as especially on the EOI path things ought
to be able to work in a more parallel way.
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -883,7 +883,7 @@ static int pci_clean_dpci_irqs(struct do
>> if ( !is_hvm_domain(d) )
>> return 0;
>>
>> - spin_lock(&d->event_lock);
>> + write_lock(&d->event_lock);
>> hvm_irq_dpci = domain_get_irq_dpci(d);
>> if ( hvm_irq_dpci != NULL )
>> {
>> @@ -901,14 +901,14 @@ static int pci_clean_dpci_irqs(struct do
>> ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
>> if ( ret )
>> {
>> - spin_unlock(&d->event_lock);
>> + read_unlock(&d->event_lock);
>
> This should be a write_unlock AFAICT.
Oh, of course - I missed converting this one when converting the
other two.
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -373,7 +373,7 @@ struct domain
>> unsigned int xen_evtchns;
>> /* Port to resume from in evtchn_reset(), when in a continuation. */
>> unsigned int next_evtchn;
>> - spinlock_t event_lock;
>> + rwlock_t event_lock;
>
> It would be nice to add a comment regarding what fields does
> event_lock protect. It's kind of a very generic lock name that I think
> has been abused a bit.
>
> Not that it needs to be done in that patch.
Right, and it'll be quite some auditing that needs doing in order
to collect all the pieces.
Jan