
[Xen-devel] re: [PATCH] mem_sharing: fix race condition of nominate and unshare



Hi Jui-Hao:

 

         Under what situation will hvm_hap_nested_page_fault() be called?

 

         Attached is the latest log, with CPU IDs.

 

         Before each call to unshare(), I print out the caller.

        

         From the log you will find the erroneous MFN is 2df895, at line 488:

       Line 37:(XEN) ===>mem_sharing_share_pages mfn 2df895 gfn 520798 p2md d did 2

This log line comes from mem_sharing_share_pages(), printed at line 632 below:

 

618     /* gfn lists always have at least one entry => save to call list_entry */
619     mem_sharing_gfn_account(gfn_get_info(&ce->gfns), 1);
620     mem_sharing_gfn_account(gfn_get_info(&se->gfns), 1);
621     list_for_each_safe(le, te, &ce->gfns)
622     {
623         gfn = list_entry(le, struct gfn_info, list);
624         /* Get the source page and type, this should never fail
625          * because we are under shr lock, and got non-null se */
626         BUG_ON(!get_page_and_type(spage, dom_cow, PGT_shared_page));
627         /* Move the gfn_info from ce list to se list */
628         list_del(&gfn->list);
629         d = get_domain_by_id(gfn->domain);
630         BUG_ON(!d);
631         gfn_to_mfn(d, gfn->gfn, &p2mt);
632         printk("===>mem_sharing_share_pages mfn %lx gfn %lu p2md %x did %d\n", se->mfn, gfn->gfn, p2mt, d->domain_id);
633         BUG_ON(set_shared_p2m_entry(d, gfn->gfn, se->mfn) == 0);
634         put_domain(d);
635         list_add(&gfn->list, &se->gfns);
636         put_page_and_type(cpage);
637     }

 

From: Jui-Hao Chiang [mailto:juihaochiang@xxxxxxxxx]
Sent: 2011-01-10 16:10
To: tinnycloud
Cc: Tim Deegan; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] mem_sharing: fix race condition of nominate and unshare

 

Hi, tinnycloud:

Thanks for your testing info.
I assume mem_sharing_unshare_page() was called and returned successfully; otherwise mod_l1_entry() would not have been called.
Could you call mem_sharing_debug_gfn() before unshare() returns success?
In addition, are there multiple CPUs touching the same page? For example, you could print out the CPU ID inside unshare() and at mm.c:859.
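
A minimal sketch of the kind of instrumentation being suggested (illustrative only: the message format and printed fields are arbitrary; smp_processor_id() and printk() are the usual Xen facilities):

    /* At the top of mem_sharing_unshare_page(), and similarly just before
     * the call site at mm.c:859. */
    printk("===>unshare entry: cpu %u dom %d gfn %lx\n",
           smp_processor_id(), d->domain_id, gfn);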


After this change, unshare() has a potential deadlock between shr_lock and p2m_lock, because the two locks are taken in different orders.
Assume two CPUs do the following:
CPU1: hvm_hap_nested_page_fault() => unshare() => p2m_change_type() (locking order: shr_lock, then p2m_lock)
CPU2: p2m_teardown() => unshare() (locking order: p2m_lock, then shr_lock)
When CPU1 grabs shr_lock and CPU2 grabs p2m_lock, they deadlock later.
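
To make the shape of the problem concrete, here is a standalone sketch of the same AB-BA pattern, with plain pthread mutexes standing in for shr_lock and p2m_lock (this is obviously not the Xen code, just an illustration; run it and it will usually hang, each thread holding the lock the other one needs):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t shr_lock = PTHREAD_MUTEX_INITIALIZER;  /* stand-in for shr_lock */
static pthread_mutex_t p2m_lock = PTHREAD_MUTEX_INITIALIZER;  /* stand-in for p2m_lock */

/* Models CPU1: hvm_hap_nested_page_fault() => unshare() => p2m_change_type() */
static void *cpu1_path(void *arg)
{
    pthread_mutex_lock(&shr_lock);   /* unshare() takes shr_lock first    */
    sleep(1);                        /* widen the race window             */
    pthread_mutex_lock(&p2m_lock);   /* p2m_change_type() wants p2m_lock  */
    puts("cpu1: got both locks");
    pthread_mutex_unlock(&p2m_lock);
    pthread_mutex_unlock(&shr_lock);
    return NULL;
}

/* Models CPU2: p2m_teardown() => unshare() */
static void *cpu2_path(void *arg)
{
    pthread_mutex_lock(&p2m_lock);   /* p2m_teardown() takes p2m_lock first */
    sleep(1);
    pthread_mutex_lock(&shr_lock);   /* unshare() then wants shr_lock       */
    puts("cpu2: got both locks");
    pthread_mutex_unlock(&shr_lock);
    pthread_mutex_unlock(&p2m_lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, cpu1_path, NULL);
    pthread_create(&t2, NULL, cpu2_path, NULL);
    pthread_join(t1, NULL);          /* never returns once both threads block */
    pthread_join(t2, NULL);
    return 0;
}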

1.       mem_sharing_unshare_page() is also called from gfn_to_mfn_unshare(), which is called by gnttab_transfer().

Since there is no bug report on grant_table right now, I think this path is safe for now.

Also, p2m_teardown() => mem_sharing_unshare_page() is called with the MEM_SHARING_DESTROY_GFN flag, so it never gets the chance to call set_shared_p2m_entry().

 

Of course, p2m_teardown() won't call set_shared_p2m_entry(). But this does not change my argument that p2m_teardown() holds p2m_lock while waiting on shr_lock. Actually, after looking at it for a while, I rebut myself: the deadlock scenario can't exist.
When p2m_teardown() is called, the domain is dying and in its last few steps (devices and IRQs are already released), so there is no way for hvm_hap_nested_page_fault() to happen on the memory of the dying domain. If this case is eliminated, then my patch should not have a deadlock problem. Any comments?

3.       set_shared_p2m_entry(), which calls set_p2m_entry(), does not hold p2m_lock, while elsewhere in the code set_p2m_entry() is called under p2m_lock, so I think this is a problem.

 

So I think at least the set_p2m_entry() call should be put under p2m_lock.

I'll do more investigation based on this.
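
A rough sketch of the direction proposed here, assuming the p2m_lock()/p2m_unlock() helpers used elsewhere in p2m.c and the argument lists quoted in this thread (not a tested patch: the field and argument names may need adjusting, and any caller that already holds p2m_lock would have to be audited first, or this just reintroduces the lock-ordering problem discussed above):

    /* Inside set_shared_p2m_entry(d, gfn, mfn): take the p2m lock around
     * the actual table update. */
    struct p2m_domain *p2m = d->arch.p2m;   /* field name per this tree */
    int rc;

    p2m_lock(p2m);
    rc = set_p2m_entry(d, gfn, mfn, 0, p2m_ram_shared);
    p2m_unlock(p2m);

    return rc;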

 

Attachment: log.txt
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

