
[Xen-devel] Re: [PATCH] mem_sharing: fix race condition of nominate and unshare


  • To: Tim Deegan <Tim.Deegan@xxxxxxxxxx>
  • From: Jui-Hao Chiang <juihaochiang@xxxxxxxxx>
  • Date: Wed, 12 Jan 2011 18:03:57 +0800
  • Cc: tinnycloud <tinnycloud@xxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 12 Jan 2011 02:05:37 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi, Tim:

On Mon, Jan 10, 2011 at 4:10 PM, Jui-Hao Chiang <juihaochiang@xxxxxxxxx> wrote:

>>
>> After this change, unshare() has a potential problem of deadlock for
>> shr_lock and p2m_lock with different locking order.
>> Assume two CPUs do the following
>> CPU1: hvm_hap_nested_page_fault() => unshare() => p2m_change_type()
>> (locking order: shr_lock, p2m_lock)
>> CPU2: p2m_teardown() => unshare() (locking order: p2m_lock, shr_lock)
>> When CPU1 grabs shr_lock and CPU2 grabs p2m_lock, they deadlock later.
>>
>> 1. mem_sharing_unshare_page() is called from gfn_to_mfn_unshare,
>> which is called by gnttab_transfer.
>>
>> Since there is no bug report on grant_table right now, I think this is
>> safe for now.
>>
>> Also, for p2m_teardown => mem_sharing_unshare_page(), the flag is
>> MEM_SHARING_DESTROY_GFN, so it won't have the chance to call
>> set_shared_p2m_entry().
>>
>>
>
> Of course, p2m_teardown won't call set_shared_p2m_entry. But this does
> not change my argument that p2m_teardown() holds p2m_lock while waiting on
> shr_lock. Actually, after looking at it for a while, I have convinced
> myself that the deadlock scenario cannot occur.
> When p2m_teardown is called, the domain is dying and in its last few steps
> (devices and IRQs are released), so there is no way for
> hvm_hap_nested_page_fault() to happen on the memory of the dying domain. If
> this case is eliminated, then my patch should not have a deadlock problem.
> Any comments?
>
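
To make the lock-ordering worry quoted above concrete, here is a small
sketch in C. It is not Xen code: the enum values, the struct and the
checker function are all hypothetical names, and each code path is
modelled only by the order in which it takes its locks. The checker
reports whether two paths take some pair of locks in opposite order,
which is the condition for the CPU1/CPU2 deadlock described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers standing in for Xen's shr_lock and p2m_lock. */
enum lock_id { SHR_LOCK, P2M_LOCK, NUM_LOCKS };

/* A code path, modelled only by the order in which it acquires locks. */
struct lock_path {
    const char *name;
    enum lock_id order[NUM_LOCKS];
    int depth;                      /* number of valid entries in order[] */
};

/* Return true if some pair of locks is taken in opposite order by a and b. */
bool paths_can_deadlock(const struct lock_path *a, const struct lock_path *b)
{
    assert(a->depth <= NUM_LOCKS && b->depth <= NUM_LOCKS);

    for (int i = 0; i < a->depth; i++)
        for (int j = i + 1; j < a->depth; j++)
            for (int k = 0; k < b->depth; k++)
                for (int l = k + 1; l < b->depth; l++)
                    /* a takes X then Y, b takes Y then X */
                    if (a->order[i] == b->order[l] &&
                        a->order[j] == b->order[k])
                        return true;
    return false;
}

/* CPU1: hvm_hap_nested_page_fault() => unshare() => p2m_change_type() */
const struct lock_path fault_path = {
    "hvm_hap_nested_page_fault", { SHR_LOCK, P2M_LOCK }, 2
};

/* CPU2: p2m_teardown() => unshare() */
const struct lock_path teardown_path = {
    "p2m_teardown", { P2M_LOCK, SHR_LOCK }, 2
};
```

Calling paths_can_deadlock(&fault_path, &teardown_path) flags the
inversion. The argument above is that the two paths can never run
concurrently on the same domain, so the inversion is harmless in
practice; the sketch does not model that part.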

After a discussion with tinnycloud, his test works after applying
the previous patch
http://lists.xensource.com/archives/html/xen-devel/2010-12/txteWc7Bs5Yap.txt
(set_shared_p2m_entry is not executed since it is inside an ASSERT).
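
As a rough illustration of why a call placed inside an assertion may
never run, here is a sketch in C. It assumes a build where the
assertion macro expands to nothing; SKETCH_ASSERT, SKETCH_DEBUG and
fake_set_entry() are hypothetical names and not Xen's actual ASSERT or
set_shared_p2m_entry.

```c
#include <stdlib.h>

/* Hypothetical debug-only assertion macro (not Xen's ASSERT). */
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(p) do { if (!(p)) abort(); } while (0)
#else
#define SKETCH_ASSERT(p) ((void)0)   /* the expression p is never evaluated */
#endif

/* Counts how many times the side-effecting call actually ran. */
int calls;

/* Hypothetical stand-in for a call with side effects. */
int fake_set_entry(void)
{
    calls++;
    return 1;
}

/* Returns the number of times fake_set_entry() ran on this path. */
int run_path(void)
{
    /* Without SKETCH_DEBUG this whole line compiles away. */
    SKETCH_ASSERT(fake_set_entry() != 0);
    return calls;
}
```

Built without SKETCH_DEBUG, run_path() returns 0 because the call inside
the assertion is dropped along with the check.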

And after some code tracing and testing, my own worry about the
deadlock between p2m_lock and shr_lock has gone away, as discussed
above. So I re-attach the patch here; it includes another fix to
restore the type count when nominate fails on a page (from our
previous discussions).
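
The idea of restoring the type count on a failed nominate can be
sketched as below. This is only an illustration of the rollback
pattern, not the attached patch: struct sketch_page, sketch_get_type(),
sketch_put_type() and sketch_nominate() are hypothetical names, and the
failure of the later step is simulated with a flag.

```c
#include <stdbool.h>

/* Hypothetical page descriptor holding only a type reference count. */
struct sketch_page {
    unsigned long type_count;
};

/* Take a type reference (always succeeds in this sketch). */
bool sketch_get_type(struct sketch_page *pg)
{
    pg->type_count++;
    return true;
}

/* Drop a type reference previously taken with sketch_get_type(). */
void sketch_put_type(struct sketch_page *pg)
{
    pg->type_count--;
}

/*
 * Nominate a page for sharing. Returns 0 on success, -1 on failure.
 * On failure the type count is left exactly as it was on entry.
 * type_change_ok simulates whether the later type-change step succeeds.
 */
int sketch_nominate(struct sketch_page *pg, bool type_change_ok)
{
    if (!sketch_get_type(pg))
        return -1;

    if (!type_change_ok) {
        /* Roll back the reference taken above before bailing out. */
        sketch_put_type(pg);
        return -1;
    }

    return 0;
}
```

The point is only that every failure exit after the reference is taken
has to drop it again, so a failed nominate leaves the page as it found it.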

Please let me know if anything looks wrong.

Best,
Jui-Hao

Attachment: mem_sharing_p2mt_race.patch
Description: Binary data
