Re: [Xen-devel] [PATCH] xen/pvcalls: fix potential endless loop in pvcalls-front.c
On 11/15/2017 02:09 PM, Stefano Stabellini wrote:
> On Wed, 15 Nov 2017, Juergen Gross wrote:
>>>>> while (mutex_is_locked(&map->active.in_mutex) ||
>>>>>        mutex_is_locked(&map->active.out_mutex))
>>>>> 	cpu_relax();
>>>>>
>>>>> ?
>>>> I'm not convinced there isn't a race.
>>>>
>>>> In pvcalls_front_recvmsg() sock->sk->sk_send_head is being read and only
>>>> then in_mutex is taken. What happens if pvcalls_front_release() resets
>>>> sk_send_head and manages to test the mutex before the mutex is locked?
>>>>
>>>> Even in case this is impossible: the whole construct seems to be rather
>>>> fragile.
> I agree it looks fragile, and I agree that it might be best to avoid the
> usage of in_mutex and out_mutex as refcounts. More comments on this
> below.
>
>
>>> I think we can wait until pvcalls_refcount is 1 (i.e. it's only us) and
>>> not rely on mutex state.
>> Yes, this would work.
> Yes, I agree it would work and for the sake of getting something in
> shape for the merge window I am attaching a patch for it. Please go
> ahead with it. Let me know if you need anything else immediately, and
> I'll work on it ASAP.
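(For reference, a minimal sketch of the pvcalls_refcount approach. pvcalls_refcount is the driver-global counter that every entry point bumps through the pvcalls_enter()/pvcalls_exit() helpers in pvcalls-front.c, so seeing it drop back to 1 means we are the only caller left anywhere in the driver; simplified:)

#define pvcalls_enter() atomic_inc(&pvcalls_refcount)
#define pvcalls_exit()  atomic_dec(&pvcalls_refcount)

	/* in pvcalls_front_release(), after clearing sk_send_head:
	 * wait until we are the only pvcall in progress */
	while (atomic_read(&pvcalls_refcount) > 1)
		cpu_relax();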
>
>
>
> However, I should note that this is a pretty big hammer we are using:
> the refcount is global, while we only need to wait until it's only us
> _on this specific socket_.
Can you explain why the socket is important?
>
> We really need a per socket refcount. If we don't want to use the mutex
> internal counters, then we need another one.
>
> See the appended patch that introduces a per socket refcount. However,
> for the merge window, also using pvcalls_refcount is fine.
>
> The race Juergen is concerned about is only theoretically possible:
>
>   recvmsg:                                release:
>
>   test sk_send_head                       clear sk_send_head
>   <only few instructions>                 <prepare a message>
>   grab in_mutex                           <send a message to the server>
>                                           <wait for an answer>
>                                           test in_mutex
>
> Without kernel preemption it is not possible for release to clear
> sk_send_head and test in_mutex after recvmsg tests sk_send_head and
> before recvmsg grabs in_mutex.
Sorry, I don't follow --- what does preemption have to do with this? If
recvmsg and release happen on different processors, the order of
operations can be:
    CPU0                          CPU1
    test sk_send_head
    <interrupt>
                                  clear sk_send_head
                                  <send a message to the server>
                                  <wait for an answer>
                                  test in_mutex
                                  free everything
    grab in_mutex
I actually think RCU should take care of all of this.
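(Roughly, as an illustration of the idea rather than a tested patch, and reusing the sock_refcount field from the patch below: the sk_send_head lookup happens in an RCU read-side section, the map is pinned before sleeping, and release() clears the pointer and waits a grace period before tearing anything down:)

	/* reader side (sendmsg/recvmsg), illustration only: */
	rcu_read_lock();
	map = (struct sock_mapping *)sock->sk->sk_send_head;
	if (!map) {
		rcu_read_unlock();
		return -ENOTSOCK;
	}
	atomic_inc(&map->active.sock_refcount);	/* pin map before sleeping */
	rcu_read_unlock();
	mutex_lock(&map->active.in_mutex);
	/* ... */

	/* release side: */
	sock->sk->sk_send_head = NULL;
	synchronize_rcu();	/* no new reader can still find the map */
	while (atomic_read(&map->active.sock_refcount) > 0)
		cpu_relax();
	pvcalls_front_free_map(bedata, map);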
But for now I will take your refcount-based patch. However, it also
needs a comment update.
How about
/*
 * We need to make sure that sendmsg/recvmsg on this socket have not started
 * before we've cleared sk_send_head here. The easiest (though not optimal)
 * way to guarantee this is to see that no pvcall (other than us) is in
 * progress.
 */
-boris
>
> But maybe we need to disable kernel preemption in recvmsg and sendmsg to
> stay on the safe side?
>
> The patch below introduces a per active socket refcount, so that we
> don't have to rely on in_mutex and out_mutex for refcounting. It also
> disables preemption in sendmsg and recvmsg in the region described
> above.
>
> I don't think this patch should go in immediately. We can take our time
> to figure out the best fix.
>
>
> diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> index 0c1ec68..8c1030b 100644
> --- a/drivers/xen/pvcalls-front.c
> +++ b/drivers/xen/pvcalls-front.c
> @@ -68,6 +68,7 @@ struct sock_mapping {
> struct pvcalls_data data;
> struct mutex in_mutex;
> struct mutex out_mutex;
> + atomic_t sock_refcount;
>
> wait_queue_head_t inflight_conn_req;
> } active;
> @@ -497,15 +498,20 @@ int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg,
> }
> bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
>
> + preempt_disable();
> map = (struct sock_mapping *) sock->sk->sk_send_head;
> if (!map) {
> + preempt_enable();
> pvcalls_exit();
> return -ENOTSOCK;
> }
>
> + atomic_inc(&map->active.sock_refcount);
> mutex_lock(&map->active.out_mutex);
> + preempt_enable();
> if ((flags & MSG_DONTWAIT) && !pvcalls_front_write_todo(map)) {
> mutex_unlock(&map->active.out_mutex);
> + atomic_dec(&map->active.sock_refcount);
> pvcalls_exit();
> return -EAGAIN;
> }
> @@ -528,6 +534,7 @@ int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg,
> tot_sent = sent;
>
> mutex_unlock(&map->active.out_mutex);
> + atomic_dec(&map->active.sock_refcount);
> pvcalls_exit();
> return tot_sent;
> }
> @@ -600,13 +607,17 @@ int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> }
> bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
>
> + preempt_disable();
> map = (struct sock_mapping *) sock->sk->sk_send_head;
> if (!map) {
> + preempt_enable();
> pvcalls_exit();
> return -ENOTSOCK;
> }
>
> + atomic_inc(&map->active.sock_refcount);
> mutex_lock(&map->active.in_mutex);
> + preempt_enable();
> if (len > XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER))
> len = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
>
> @@ -625,6 +636,7 @@ int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> ret = 0;
>
> mutex_unlock(&map->active.in_mutex);
> + atomic_dec(&map->active.sock_refcount);
> pvcalls_exit();
> return ret;
> }
> @@ -1048,8 +1060,7 @@ int pvcalls_front_release(struct socket *sock)
> * is set to NULL -- we only need to wait for the existing
> * waiters to return.
> */
> - while (!mutex_trylock(&map->active.in_mutex) ||
> - !mutex_trylock(&map->active.out_mutex))
> + while (atomic_read(&map->active.sock_refcount) > 0)
> cpu_relax();
>
> pvcalls_front_free_map(bedata, map);