Xen project Mailing List

Re: [PATCH v6 2/3] xen/evtchn: rework per event channel lock

From: Julien Grall <julien@xxxxxxx>

Date: Tue, 10 Nov 2020 09:47:32 +0000

Cc: Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>

Delivery-date: Tue, 10 Nov 2020 09:47:47 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi, On 10/11/2020 07:39, Jan Beulich wrote:

On 09.11.2020 18:46, Julien Grall wrote:

Hi,

On 09/11/2020 16:48, Jan Beulich wrote:

On 09.11.2020 17:38, Juergen Gross wrote:

Currently the lock for a single event channel needs to be taken with
interrupts off, which causes deadlocks in some cases.

Rework the per event channel lock to be non-blocking for the case of
sending an event and removing the need for disabling interrupts for
taking the lock.

The lock is needed for avoiding races between event channel state
changes (creation, closing, binding) against normal operations (set
pending, [un]masking, priority changes).

Use a rwlock, but with some restrictions:

- Changing the state of an event channel (creation, closing, binding)
    needs to use write_lock(), with ASSERT()ing that the lock is taken as
    writer only when the state of the event channel is either before or
    after the locked region appropriate (either free or unbound).

- Sending an event needs to use read_trylock() mostly, in case of not
    obtaining the lock the operation is omitted. This is needed as
    sending an event can happen with interrupts off (at least in some
    cases).

- Dumping the event channel state for debug purposes is using
    read_trylock(), too, in order to avoid blocking in case the lock is
    taken as writer for a long time.

- All other cases can use read_lock().

Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
---
V4:
- switch to rwlock
- add ASSERT() to verify correct write_lock() usage

V3:
- corrected a copy-and-paste error (Jan Beulich)
- corrected unlocking in two cases (Jan Beulich)
- renamed evtchn_read_trylock() (Jan Beulich)
- added some comments and an ASSERT() for evtchn_write_lock()
- set EVENT_WRITE_LOCK_INC to INT_MIN

V2:
- added needed barriers

Signed-off-by: Juergen Gross <jgross@xxxxxxxx>


Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

I'll give it overnight for others to possibly comment, but I'd
like to get this in tomorrow if at all possible.


IIRC, Citrix originally reported the issues. I would like to give them
an opportunity to test the patches and confirm this effectively fixed
there issues.


I would consider waiting longer, but I'd like to get staging unblocked.

So the plan is to have the patches sitting in staging for a few days and

then backport?

Just like with 52e1fc47abc3a0123 ("evtchn/Flask: pre-allocate node on
send path") we can always decide to revert / fix up afterwards. The
patch here surely is an improvement, even if it was to turn out it
doesn't fix all issues yes. I'd appreciate if you would confirm you
don't object to the patch going in - I'll wait a few more hours,
perhaps until around noon.

I would suggest to wait until noon and if you don't get any answer, then

merge it.

Cheers, -- Julien Grall

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.