[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Re: [PATCH] mem_sharing: fix race condition of nominate and unshare


  • To: tinnycloud <tinnycloud@xxxxxxxxxxx>
  • From: Jui-Hao Chiang <juihaochiang@xxxxxxxxx>
  • Date: Mon, 10 Jan 2011 16:10:03 +0800
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Tim Deegan <Tim.Deegan@xxxxxxxxxx>
  • Delivery-date: Mon, 10 Jan 2011 00:11:08 -0800
  • Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=sz2e/SapjNLlVORU+tyi11VyIXpdZS6E/cSaWDWIEydHv9FLfiItQV+Smo6VKzr4j2 YOehtQKkAdCcEIrCgn2StWoqVeywAcZ0GcMxdBUCKRfIupglWIryhR1xnrcSw2QXWfyP nGJ+XjCCne2PNQohOLni5qMbPOGcNNNzjf5V8=
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi, tinnycloud:

Thanks for your testing info.
I assume you have mem_sharing_unshare_page called with successful return value, otherwise the mod_l1_entry won't be called.
Could you call mem_sharing_debug_gfn() before unshare() return success?
In addition, are there multiple CPUs touching the same page? e.g. you can print out the cpu id inside unshare() and the mm.c:859.


After this change, unshare() has a potential problem of deadlock for shr_lock and p2m_lock with different locking order.
Assume two CPUs do the following
CPU1: hvm_hap_nested_page_fault() => unshare() => p2m_change_type() (locking order: shr_lock, p2m_lock)
CPU2: p2m_teardown() => unshare() (locking order: p2m_lock, shr_lock)
When CPU1 grabs shr_lock and CPU2 grabs p2m_lock, they deadlock later.

 1.       mem_sharing_unshare_page() has the routine  called from gfn_to_mfn_unshare, which is called by gnttab_transfer  

Since no bug report on grant_table right now, so I think this is safe for now

Also  p2m_tear_down è mem_sharing_unshare_page() , its flag is MEM_SHARING_DESTROY_GFN, and won’t has the chance to

call set_shared_p2m_entry()

 

Of course, the p2m_teardown won't call set_shared_p2m_entry. But this does not change my argument that p2m_teardown() hold p2m_lock to wait on shr_lock. Actaully, after looking for a while, I rebut myself that the scenario of deadlock won't exist.
When p2m_teardown is called, the domain is dying in its last few steps (device, irq are released), and there is no way for hvm_hap_nested_page_fault() to happen on the memory of the dying domain. If this case is eliminated, then my patch should not have deadlock problem. Any comments?

3.       set_shared_p2m_entry() which call set_p2m_entry() is not in p2m_lock, and I found in other code set_p2m_entry is called in p2m_lock,

so here I think it is a problem

 

So I think at least set_p2m_entry should be put into p2m_lock.

I’ll do more investigation base on this.

 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.