Re: [PATCH v3 4/5] evtchn: convert domain event lock to an r/w one
On 23.12.2020 12:22, Julien Grall wrote:
> Hi Jan,
>
> On 22/12/2020 09:46, Jan Beulich wrote:
>> On 21.12.2020 18:45, Julien Grall wrote:
>>> On 14/12/2020 09:40, Jan Beulich wrote:
>>>> On 11.12.2020 11:57, Julien Grall wrote:
>>>>> On 11/12/2020 10:32, Jan Beulich wrote:
>>>>>> On 09.12.2020 12:54, Julien Grall wrote:
>>>>>>> On 23/11/2020 13:29, Jan Beulich wrote:
>>>>>>>> @@ -620,7 +620,7 @@ int evtchn_close(struct domain *d1, int
>>>>>>>> long rc = 0;
>>>>>>>>
>>>>>>>> again:
>>>>>>>> - spin_lock(&d1->event_lock);
>>>>>>>> + write_lock(&d1->event_lock);
>>>>>>>>
>>>>>>>> if ( !port_is_valid(d1, port1) )
>>>>>>>> {
>>>>>>>> @@ -690,13 +690,11 @@ int evtchn_close(struct domain *d1, int
>>>>>>>> BUG();
>>>>>>>>
>>>>>>>> if ( d1 < d2 )
>>>>>>>> - {
>>>>>>>> - spin_lock(&d2->event_lock);
>>>>>>>> - }
>>>>>>>> + read_lock(&d2->event_lock);
>>>>>>>
>>>>>>> This change made me realize that I don't quite understand how the
>>>>>>> rwlock is meant to work for event_lock. I was actually expecting this to
>>>>>>> be a write_lock() given there are state changes in the d2 events.
>>>>>>
>>>>>> Well, the protection needs to be against racing changes, i.e.
>>>>>> parallel invocations of this same function, or evtchn_close().
>>>>>> It is debatable whether evtchn_status() and
>>>>>> domain_dump_evtchn_info() would better also be locked out
>>>>>> (other read_lock() uses aren't applicable to interdomain
>>>>>> channels).
>>>>>>
>>>>>>> Could you outline how a developer can find out whether he/she should
>>>>>>> use read_lock or write_lock?
>>>>>>
>>>>>> I could try to, but it would again be a port type dependent
>>>>>> model, just like for the per-channel locks.
>>>>>
>>>>> It is quite important to have a clear locking strategy (in particular
>>>>> for an rwlock) so we can make the correct decision when to use read_lock
>>>>> or write_lock.
>>>>>
>>>>>> So I'd like it to
>>>>>> be clarified first whether you aren't instead indirectly
>>>>>> asking for these to become write_lock()
>>>>>
>>>>> Well, I don't understand why this is a read_lock() (even with your
>>>>> previous explanation). I am not suggesting to switch to a write_lock(),
>>>>> but instead asking for the reasoning behind the decision.
>>>>
>>>> So if what I've said in my previous reply isn't enough (including the
>>>> argument towards using two write_lock() here), I'm struggling to
>>>> figure what else to say. The primary goal is to exclude changes to
>>>> the same ports. For this it is sufficient to hold just one of the two
>>>> locks in writer mode, as the other (racing) one will acquire that
>>>> same lock for at least reading. The question whether both need to use
>>>> writer mode can only be decided when looking at the sites acquiring
>>>> just one of the locks in reader mode (hence the reference to
>>>> evtchn_status() and domain_dump_evtchn_info()) - if races with them
>>>> are deemed to be a problem, switching to both-writers will be needed.
>>>
>>> I had another look at the code based on your explanation. I don't think
>>> it is fine to allow evtchn_status() to be concurrently called with
>>> evtchn_close().
>>>
>>> evtchn_close() contains the following code:
>>>
>>> chn2->state = ECS_UNBOUND;
>>> chn2->u.unbound.remote_domid = d1->domain_id;
>>>
>>> Where chn2 is an event channel of the remote domain (d2). Your patch will
>>> only hold the read lock for d2.
>>>
>>> However evtchn_status() expects the event channel state to not change
>>> behind its back. This assumption doesn't hold for d2, and you could
>>> possibly end up seeing the new value of chn2->state after the new
>>> chn2->u.unbound.remote_domid.
>>>
>>> Thankfully, it doesn't look like chn2->u.interdomain.remote_domain
>>> would be overwritten. Otherwise, this would be a straight dereference of
>>> an invalid pointer.
>>>
>>> So I think we need to hold the write event lock for both domains.
>>
>> Well, okay. Three considerations though:
>>
>> 1) Neither evtchn_status() nor domain_dump_evtchn_info() appear to
>> have a real need to acquire the per-domain lock. They could as well
>> acquire the per-channel ones. (In the latter case this will then
>> also allow inserting the so far missing process_pending_softirqs()
>> call; it shouldn't be made with a lock held.)
> I agree that evtchn_status() doesn't need to acquire the per-domain
> lock. I am not entirely sure about domain_dump_evtchn_info() because
> AFAICT the PIRQ tree (used by domain_pirq_to_irq()) is protected with
> d->event_lock.

It is, but calling it without the lock just to display the IRQ
is not a problem afaict.

>> 3) With the per-channel double locking and with 1) addressed I
>> can't really see the need for the double per-domain locking in
>> evtchn_bind_interdomain() and evtchn_close(). The write lock is
>> needed for the domain allocating a new port or freeing one. But why
>> is there any need for holding the remote domain's lock, when its
>> side of the channel gets guarded by the per-channel lock anyway?
>
> If 1) is addressed, then I think it should be fine to just acquire the
> read event lock of the remote domain.

For bind-interdomain I've eliminated the double locking, so the
question goes away there altogether. While for close I thought
I had managed to eliminate it too, the change looks to be
causing a deadlock of some sort, which I'll have to figure out.
However, the change might be controversial anyway, because I
need to play games already prior to fixing that bug ...

All of this said - for the time being it'll be both write_lock()
in evtchn_close(), as I consider it risky to make the remote one
a read_lock() merely based on the observation that there is
currently (i.e. with 1) addressed) no conflict.

Jan