[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 2/4] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()



On Tue, Dec 15, 2020 at 01:12:33PM +0100, Kevin Wolf wrote:
> Am 14.12.2020 um 18:05 hat Sergio Lopez geschrieben:
> > While processing the parents of a BDS, one of the parents may process
> > the child that's doing the tail recursion, which leads to a BDS being
> > processed twice. This is especially problematic for the aio_notifiers,
> > as they might attempt to work on both the old and the new AIO
> > contexts.
> > 
> > To avoid this, add the BDS pointer to the ignore list, and check the
> > child BDS pointer while iterating over the children.
> > 
> > Signed-off-by: Sergio Lopez <slp@xxxxxxxxxx>
> 
> Ugh, so we get a mixed list of BdrvChild and BlockDriverState? :-/

I know, it's effective but quite ugly...

> What is the specific scenario where you saw this breaking? Did you have
> multiple BdrvChild connections between two nodes so that we would go to
> the parent node through one and then come back to the child node through
> the other?

I don't think this is a corner case. If the graph is walked top->down,
there's no problem since children are added to the ignore list before
getting processed, and siblings don't process each other. But, if the
graph is walked bottom->up, a BDS will start processing its parents
without adding itself to the ignore list, so there's nothing
preventing them from processing it again.

I'm pasting here an annotated trace of bdrv_set_aio_context_ignore I
generated while triggering the issue:

<----- begin ------>
bdrv_set_aio_context_ignore: bs=0x555ee2e48030 enter
bdrv_set_aio_context_ignore: bs=0x555ee2e48030 processing children
bdrv_set_aio_context_ignore: bs=0x555ee2e5d420 enter
bdrv_set_aio_context_ignore: bs=0x555ee2e5d420 processing children
bdrv_set_aio_context_ignore: bs=0x555ee2e52060 enter
bdrv_set_aio_context_ignore: bs=0x555ee2e52060 processing children
bdrv_set_aio_context_ignore: bs=0x555ee2e52060 processing parents
bdrv_set_aio_context_ignore: bs=0x555ee2e52060 processing itself
bdrv_set_aio_context_ignore: bs=0x555ee2e5d420 processing parents

 - We enter b_s_a_c_i with BDS 2fbf660 the first time:
 
bdrv_set_aio_context_ignore: bs=0x555ee2fbf660 enter
bdrv_set_aio_context_ignore: bs=0x555ee2fbf660 processing children

 - We enter b_s_a_c_i with BDS 3bc0c00, a child of 2fbf660:
 
bdrv_set_aio_context_ignore: bs=0x555ee3bc0c00 enter
bdrv_set_aio_context_ignore: bs=0x555ee3bc0c00 processing children
bdrv_set_aio_context_ignore: bs=0x555ee3bc0c00 processing parents
bdrv_set_aio_context_ignore: bs=0x555ee3bc0c00 processing itself

 - We start processing its parents:
 
bdrv_set_aio_context_ignore: bs=0x555ee2fbf660 processing parents

 - We enter b_s_a_c_i with BDS 2e48030, a parent of 2fbf660:
 
bdrv_set_aio_context_ignore: bs=0x555ee2e48030 enter
bdrv_set_aio_context_ignore: bs=0x555ee2e48030 processing children

 - We enter b_s_a_c_i with BDS 2fbf660 again, because of parent
   2e48030 didn't found us it in the ignore list:
   
bdrv_set_aio_context_ignore: bs=0x555ee2fbf660 enter
bdrv_set_aio_context_ignore: bs=0x555ee2fbf660 processing children
bdrv_set_aio_context_ignore: bs=0x555ee2fbf660 processing parents
bdrv_set_aio_context_ignore: bs=0x555ee2fbf660 processing itself
bdrv_set_aio_context_ignore: bs=0x555ee2e48030 processing parents
bdrv_set_aio_context_ignore: bs=0x555ee2e48030 processing itself

 - BDS 2fbf660 will be processed here a second time, triggering the
   issue:
   
bdrv_set_aio_context_ignore: bs=0x555ee2fbf660 processing itself
<----- end ------>

I suspect this has been happening for a while, and has only surfaced
now due to the need to run an AIO context BH in an aio_notifier
function that the "nbd/server: Quiesce coroutines on context switch"
patch introduces. There the problem is that the first time the
aio_notifier AIO detach function is called, it works on the old
context (as it should be), and the second one works on the new context
(which is wrong).

> Maybe if what we really need to do is not processing every edge once,
> but processing every node once, the list should be changed to contain
> _only_ BDS objects. But then blk_do_set_aio_context() probably won't
> work any more because it can't have blk->root ignored any more...

I tried that in my first attempt and it broke badly. I didn't take a
deeper look at the causes.

> Anyway, if we end up changing what the list contains, the comment needs
> an update, too. Currently it says:
> 
>  * @ignore will accumulate all visited BdrvChild object. The caller is
>  * responsible for freeing the list afterwards.
> 
> Another option: Split the parents QLIST_FOREACH loop in two. First add
> all parent BdrvChild objects to the ignore list, remember which of them
> were newly added, and only after adding all of them call
> child->klass->set_aio_ctx() for each parent that was previously not on
> the ignore list. This will avoid that we come back to the same node
> because all of its incoming edges are ignored now.

I don't think this strategy will fix the issue illustrated in the
trace above, as the BdrvChild pointer of the BDS processing its
parents won't be the on ignore list by the time one of its parents
starts processing its own children.

Thanks,
Sergio.

> >  block.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block.c b/block.c
> > index f1cedac362..bc8a66ab6e 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -6465,12 +6465,17 @@ void bdrv_set_aio_context_ignore(BlockDriverState 
> > *bs,
> >      bdrv_drained_begin(bs);
> >  
> >      QLIST_FOREACH(child, &bs->children, next) {
> > -        if (g_slist_find(*ignore, child)) {
> > +        if (g_slist_find(*ignore, child) || g_slist_find(*ignore, 
> > child->bs)) {
> >              continue;
> >          }
> >          *ignore = g_slist_prepend(*ignore, child);
> >          bdrv_set_aio_context_ignore(child->bs, new_context, ignore);
> >      }
> > +    /*
> > +     * Add a reference to this BS to the ignore list, so its
> > +     * parents won't attempt to process it again.
> > +     */
> > +    *ignore = g_slist_prepend(*ignore, bs);
> >      QLIST_FOREACH(child, &bs->parents, next_parent) {
> >          if (g_slist_find(*ignore, child)) {
> >              continue;
> > -- 
> > 2.26.2
> > 
> 

Attachment: signature.asc
Description: PGP signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.