[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 RESEND] xen: Fix SEGV on domain disconnect



Copying in Tim who did the final phase of the changes.

On Mon, 24 Apr 2023 at 11:32, Paul Durrant <xadimgnik@xxxxxxxxx> wrote:
>
> On 20/04/2023 12:02, mark.syms@xxxxxxxxxx wrote:
> > From: Mark Syms <mark.syms@xxxxxxxxxx>
> >
> > Ensure the PV ring is drained on disconnect. Also ensure all pending
> > AIO is complete, otherwise AIO tries to complete into a mapping of the
> > ring which has been torn down.
> >
> > Signed-off-by: Mark Syms <mark.syms@xxxxxxxxxx>
> > ---
> > CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > CC: Anthony Perard <anthony.perard@xxxxxxxxxx>
> > CC: Paul Durrant <paul@xxxxxxx>
> > CC: xen-devel@xxxxxxxxxxxxxxxxxxxx
> >
> > v2:
> >   * Ensure all inflight requests are completed before teardown
> >   * RESEND to fix formatting
> > ---
> >   hw/block/dataplane/xen-block.c | 31 +++++++++++++++++++++++++------
> >   1 file changed, 25 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
> > index 734da42ea7..d9da4090bf 100644
> > --- a/hw/block/dataplane/xen-block.c
> > +++ b/hw/block/dataplane/xen-block.c
> > @@ -523,6 +523,10 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
> >  
> >       dataplane->more_work = 0;
> >  
> > +    if (dataplane->sring == 0) {
> > +        return done_something;
> > +    }
> > +
>
> I think you could just return false here... Nothing is ever going to be
> done if there's no ring :-)
>
> >       rc = dataplane->rings.common.req_cons;
> >       rp = dataplane->rings.common.sring->req_prod;
> >       xen_rmb(); /* Ensure we see queued requests up to 'rp'. */
> > @@ -666,14 +670,35 @@ void xen_block_dataplane_destroy(XenBlockDataPlane *dataplane >   void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
> >   {
> >       XenDevice *xendev;
> > +    XenBlockRequest *request, *next;
> >  
> >       if (!dataplane) {
> >           return;
> >       }
> >  
> > +    /* We're about to drain the ring. We can cancel the scheduling of any
> > +     * bottom half now */
> > +    qemu_bh_cancel(dataplane->bh);
> > +
> > +    /* Ensure we have drained the ring */
> > +    aio_context_acquire(dataplane->ctx);
> > +    do {
> > +        xen_block_handle_requests(dataplane);
> > +    } while (dataplane->more_work);
> > +    aio_context_release(dataplane->ctx);
> > +
>
> I don't think we want to be taking new requests, do we?
>
> > +    /* Now ensure that all inflight requests are complete */
> > +    while (!QLIST_EMPTY(&dataplane->inflight)) {
> > +        QLIST_FOREACH_SAFE(request, &dataplane->inflight, list, next) {
> > +            blk_aio_flush(request->dataplane->blk, xen_block_complete_aio,
> > +                        request);
> > +        }
> > +    }
> > +
>
> I think this could possibly be simplified by doing the drain after the
> call to blk_set_aio_context(), as long as we set dataplane->ctx to
> qemu_get_aio_context(). Alos, as long as more_work is not set then it
> should still be safe to cancel the bh before the drain AFAICT.
>
>    Paul
>
> >       xendev = dataplane->xendev;
> >  
> >       aio_context_acquire(dataplane->ctx);
> > +
> >       if (dataplane->event_channel) {
> >           /* Only reason for failure is a NULL channel */
> >           xen_device_set_event_channel_context(xendev, dataplane->event_channel,
> > @@ -684,12 +709,6 @@ void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
> >       blk_set_aio_context(dataplane->blk, qemu_get_aio_context(), &error_abort);
> >       aio_context_release(dataplane->ctx);
> >  
> > -    /*
> > -     * Now that the context has been moved onto the main thread, cancel
> > -     * further processing.
> > -     */
> > -    qemu_bh_cancel(dataplane->bh);
> > -
> >       if (dataplane->event_channel) {
> >           Error *local_err = NULL;
> >  
>

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.