
Re: [Xen-devel] [PATCH v6 1/5] x86/mem_sharing: reorder when pages are unlocked and released



On Thu, Jul 18, 2019 at 7:33 AM Jan Beulich <JBeulich@xxxxxxxx> wrote:
>
> On 18.07.2019 15:13, Tamas K Lengyel wrote:
> > On Thu, Jul 18, 2019 at 7:12 AM Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >>
> >> On 18.07.2019 14:55, Tamas K Lengyel wrote:
> >>> On Thu, Jul 18, 2019 at 4:47 AM Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >>>> On 17.07.2019 21:33, Tamas K Lengyel wrote:
> >>>>> @@ -900,6 +895,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
> >>>>>      p2m_type_t smfn_type, cmfn_type;
> >>>>>      struct two_gfns tg;
> >>>>>      struct rmap_iterator ri;
> >>>>> +    unsigned long put_count = 0;
> >>>>>
> >>>>>      get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn,
> >>>>>                   cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg);
> >>>>> @@ -964,15 +960,6 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
> >>>>>          goto err_out;
> >>>>>      }
> >>>>>
> >>>>> -    /* Acquire an extra reference, for the freeing below to be safe. */
> >>>>> -    if ( !get_page(cpage, dom_cow) )
> >>>>> -    {
> >>>>> -        ret = -EOVERFLOW;
> >>>>> -        mem_sharing_page_unlock(secondpg);
> >>>>> -        mem_sharing_page_unlock(firstpg);
> >>>>> -        goto err_out;
> >>>>> -    }
> >>>>> -
> >>>>>      /* Merge the lists together */
> >>>>>      rmap_seed_iterator(cpage, &ri);
> >>>>>      while ( (gfn = rmap_iterate(cpage, &ri)) != NULL)
> >>>>> @@ -984,13 +971,14 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
> >>>>>           * Don't change the type of rmap for the client page. */
> >>>>>          rmap_del(gfn, cpage, 0);
> >>>>>          rmap_add(gfn, spage);
> >>>>> -        put_page_and_type(cpage);
> >>>>> +        put_count++;
> >>>>>          d = get_domain_by_id(gfn->domain);
> >>>>>          BUG_ON(!d);
> >>>>>          BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn));
> >>>>>          put_domain(d);
> >>>>>      }
> >>>>>      ASSERT(list_empty(&cpage->sharing->gfns));
> >>>>> +    BUG_ON(!put_count);
> >>>>>
> >>>>>      /* Clear the rest of the shared state */
> >>>>>      page_sharing_dispose(cpage);
> >>>>> @@ -1001,7 +989,9 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
> >>>>>
> >>>>>      /* Free the client page */
> >>>>>      put_page_alloc_ref(cpage);
> >>>>> -    put_page(cpage);
> >>>>> +
> >>>>> +    while ( put_count-- )
> >>>>> +        put_page_and_type(cpage);
> >>>>>
> >>>>>      /* We managed to free a domain page. */
> >>>>>      atomic_dec(&nr_shared_mfns);
> >>>>> @@ -1165,19 +1155,13 @@ int __mem_sharing_unshare_page(struct domain *d,
> >>>>>      {
> >>>>>          if ( !last_gfn )
> >>>>>              mem_sharing_gfn_destroy(page, d, gfn_info);
> >>>>> -        put_page_and_type(page);
> >>>>> +
> >>>>>          mem_sharing_page_unlock(page);
> >>>>> +
> >>>>>          if ( last_gfn )
> >>>>> -        {
> >>>>> -            if ( !get_page(page, dom_cow) )
> >>>>> -            {
> >>>>> -                put_gfn(d, gfn);
> >>>>> -                domain_crash(d);
> >>>>> -                return -EOVERFLOW;
> >>>>> -            }
> >>>>>              put_page_alloc_ref(page);
> >>>>> -            put_page(page);
> >>>>> -        }
> >>>>> +
> >>>>> +        put_page_and_type(page);
> >>>>>          put_gfn(d, gfn);
> >>>>>
> >>>>>          return 0;
> >>>>
> >>>> ... this (main, as I guess by the title) part of the change? I think
> >>>> you want to explain what was wrong here and/or why the new arrangement
> >>>> is better. (I'm sorry, I guess it was this way on prior versions
> >>>> already, but apparently I didn't notice.)
> >>>
> >>> It's what the patch message says: calling put_page_and_type before
> >>> mem_sharing_page_unlock can cause a deadlock. Since we are now
> >>> holding a reference to the page till the end, there is no need for the
> >>> extra get_page/put_page logic when we are dealing with the last_gfn.
> >>
> >> The title says "reorder" without any "why".
> >
> > Yes, I can't reasonably fit "Calling _put_page_type while also holding
> > the page_lock for that page can cause a deadlock." into the title. So
> > it's spelled out in the patch message.
>
> Of course not. And I realize _part_ of the changes is indeed what the
> title says (although for share_pages() that's not obvious from the
> patch alone). But you do more: You also avoid acquiring an extra
> reference in share_pages().

I feel like we are going in circles, having the same conversation
over and over without making any headway. You introduced grabbing that
broken extra reference in 0502e0adae2. It is, and always was, unneeded
once the proper solution is in place: reordering when the pages are
unlocked and released, which is exactly what this patch does.
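
To spell that out: because the per-gfn reference drops are merely
counted while the rmaps are merged, and only performed after both page
locks are released, the extra get_page() has nothing left to protect.
A stubbed, compilable sketch of that shape (stand-in types and helpers,
not the literal Xen code):

#include <assert.h>

/* Stand-ins for the real Xen types and primitives. */
struct page_info {
    int locked;             /* models the mem-sharing page lock */
    unsigned long refs;     /* models the type reference count */
};

static void mem_sharing_page_unlock(struct page_info *pg)
{
    pg->locked = 0;
}

static void put_page_and_type(struct page_info *pg)
{
    /* The real _put_page_type() may itself take the page lock,
     * hence it must never run while that lock is still held. */
    assert(!pg->locked);
    pg->refs--;
}

/* Tail of share_pages() after the patch: put_count was accumulated
 * in the rmap-merge loop instead of dropping refs under the locks. */
static void share_pages_tail(struct page_info *firstpg,
                             struct page_info *secondpg,
                             struct page_info *cpage,
                             unsigned long put_count)
{
    assert(put_count);      /* mirrors the patch's BUG_ON(!put_count) */

    mem_sharing_page_unlock(secondpg);
    mem_sharing_page_unlock(firstpg);

    /* Safe now: no lock is held, and the refs counted above were
     * never dropped early, so no extra get_page() was required. */
    while ( put_count-- )
        put_page_and_type(cpage);
}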

> And since you made me look at the code again: If put_page() is unsafe
> with a lock held, how come the get_page_and_type() in share_pages()
> is safe with two such locks held? If it really is, it surely would be
> worthwhile to state in the description. There's another such instance
> in mem_sharing_add_to_physmap() (plus a get_page()), and also one
> where put_page_and_type() gets called with such a lock held (afaics).
>

It's possible there are other instances where this may still be
broken. Right now I only have the bandwidth to test and fix the paths
I use. If that's unacceptable, I'm happy to continue development in my
private fork and leave things as-is upstream.
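
For completeness, here is the ordering the patch establishes on the
unshare path, as a stubbed, compilable sketch (stand-in helpers, not
the literal Xen code): the page lock is dropped first, and the
reference held since entry is dropped last, so the old
get_page()/-EOVERFLOW fallback disappears.

#include <stdbool.h>

struct domain;
struct page_info { int locked; };

/* Stand-ins for the real Xen helpers. */
static void mem_sharing_page_unlock(struct page_info *pg) { pg->locked = 0; }
static void put_page_alloc_ref(struct page_info *pg) { (void)pg; }
static void put_page_and_type(struct page_info *pg) { (void)pg; }
static void put_gfn(struct domain *d, unsigned long gfn)
{
    (void)d; (void)gfn;
}

/* Tail of __mem_sharing_unshare_page() after the patch. */
static int unshare_tail(struct domain *d, unsigned long gfn,
                        struct page_info *page, bool last_gfn)
{
    mem_sharing_page_unlock(page);  /* release the page lock first */

    if ( last_gfn )
        put_page_alloc_ref(page);   /* free the backing allocation */

    /* Drop the reference we held across the whole function; since it
     * was never dropped early, no extra get_page()/put_page() pair is
     * needed for the last_gfn case. */
    put_page_and_type(page);
    put_gfn(d, gfn);

    return 0;
}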

Tamas
