
Re: [PATCH v4 01/10] evtchn: use per-channel lock where possible

On 08.01.2021 21:32, Julien Grall wrote:
> Hi Jan,
> On 05/01/2021 13:09, Jan Beulich wrote:
>> Neither evtchn_status() nor domain_dump_evtchn_info() nor
>> flask_get_peer_sid() need to hold the per-domain lock - they all only
>> read a single channel's state (at a time, in the dump case).
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> v4: New.
>> --- a/xen/common/event_channel.c
>> +++ b/xen/common/event_channel.c
>> @@ -968,15 +968,16 @@ int evtchn_status(evtchn_status_t *statu
>>       if ( d == NULL )
>>           return -ESRCH;
>> -    spin_lock(&d->event_lock);
>> -
>>       if ( !port_is_valid(d, port) )
> There is one issue that is now becoming more apparent. To be clear, the 
> problem is not in this patch, but I think it is the best place to 
> discuss it as d->event_lock may be part of the solution.
> After XSA-344, evtchn_destroy() will end up decrementing d->valid_evtchns.
> Given that evtchn_status() can work on the non-current domain, it would 
> be possible to run it concurrently with evtchn_destroy(). As a 
> consequence, port_is_valid() will be unstable as a valid event channel 
> may turn invalid.
> AFAICT, we are getting away with it so far, as the memory is not freed until the 
> domain is fully destroyed. However, we re-introduced XSA-338 in a 
> different way.
> To be clear, this is not the fault of this patch. But I don't think it 
> is sane to re-introduce a behavior that led us to an XSA.

I'm afraid I'm getting confused by the varying statements above:
Are you suggesting this patch does re-introduce bad behavior or not?

Yes, decrementing ->valid_evtchns has a similar effect, but
I'm not convinced it gets us into XSA territory again. The problem
wasn't the reduction of ->max_evtchns as such, but the assumptions
derived from it elsewhere in the code. If there were any such again, I
suppose we'd have reason to issue another XSA.

Furthermore there are other paths already using port_is_valid()
without holding the domain's event lock; I've not been able to spot
a problem with this though, so far.

> I can see two solutions:
>    1) Use d->event_lock to protect port_is_valid() when d != 
> current->domain. This would require evtchn_destroy() to grab the lock 
> when updating d->valid_evtchns.
>    2) Never decrement d->valid_evtchns and use a different field for 
> closing ports
> I am not a big fan of 1) because this is muddying the already complex 
> locking situation in the event channel code. But I suggested it because 
> I wasn't sure whether you would be happy with 2).

I agree 1) wouldn't be very nice, and you're right in assuming I
wouldn't like 2) very much. For the moment I'm not (yet) convinced
we need to do anything at all - as you say yourself, while the
result of port_is_valid() is potentially unstable when a domain is
in the process of being cleaned up, the state guarded by such
checks remains usable in (I think) a race free manner.
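
To make the argument concrete, here is a toy model of it. This is not
the Xen code: every toy_* name, the fixed-size array, and the
atomic_flag standing in for the per-channel lock are assumptions made
purely for illustration. The sketch only shows why a validity check
that can go stale is harmless as long as the channel storage stays
allocated and its state is read under the per-channel lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 8u /* hypothetical capacity of the toy model */

enum chn_state { CHN_FREE, CHN_UNBOUND, CHN_INTERDOMAIN };

struct toy_evtchn {
    atomic_flag lock;      /* stand-in for the per-channel lock */
    enum chn_state state;
};

struct toy_domain {
    atomic_uint valid_evtchns;         /* may shrink during teardown */
    struct toy_evtchn chn[MAX_PORTS];  /* storage outlives the counter */
};

static void toy_lock(struct toy_evtchn *c)
{
    while (atomic_flag_test_and_set(&c->lock))
        ; /* spin until the flag is released */
}

static void toy_unlock(struct toy_evtchn *c)
{
    atomic_flag_clear(&c->lock);
}

static void toy_domain_init(struct toy_domain *d, unsigned int nr_valid)
{
    if (nr_valid > MAX_PORTS)
        nr_valid = MAX_PORTS;
    for (unsigned int i = 0; i < MAX_PORTS; i++) {
        atomic_flag_clear(&d->chn[i].lock);
        d->chn[i].state = CHN_FREE;
    }
    atomic_init(&d->valid_evtchns, nr_valid);
}

/* Lock-free check; the answer can be stale by the time it is used. */
static bool toy_port_is_valid(struct toy_domain *d, unsigned int port)
{
    return port < atomic_load(&d->valid_evtchns);
}

/* Teardown of the highest valid port: close it, then shrink the count. */
static void toy_teardown_step(struct toy_domain *d)
{
    unsigned int n = atomic_load(&d->valid_evtchns);

    if (n == 0)
        return;
    toy_lock(&d->chn[n - 1]);
    d->chn[n - 1].state = CHN_FREE;
    toy_unlock(&d->chn[n - 1]);
    atomic_store(&d->valid_evtchns, n - 1);
}

/* Status query that takes only the per-channel lock. */
static int toy_status(struct toy_domain *d, unsigned int port,
                      enum chn_state *out)
{
    assert(out != NULL);
    if (!toy_port_is_valid(d, port))
        return -1;
    /*
     * A concurrent teardown may lower valid_evtchns right here. The
     * channel memory is still allocated, so the locked read below sees
     * either the old state or CHN_FREE, never freed memory.
     */
    toy_lock(&d->chn[port]);
    *out = d->chn[port].state;
    toy_unlock(&d->chn[port]);
    return 0;
}
```

In this model a caller racing with teardown gets either the channel's
previous state, CHN_FREE, or an error, all of which are acceptable
answers for a domain that is being cleaned up. The model deliberately
says nothing about code that derives further assumptions from the
check's result, which is where the earlier problem lay.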



