
[Xen-devel] RE: [PATCH] mem_sharing: fix race condition of nominate and unshare



It was later found that the domain is dying; page allocation is prohibited for a dying domain.
 
 (XEN) ---domain is 1, max_pages 132096, total_pages 29736
 
Output log from line 914:
 
 909     old_page = page;
 910     page = mem_sharing_alloc_page(d, gfn, flags & MEM_SHARING_MUST_SUCCEED);
 911     if(!page)
 912     {
 913         mem_sharing_debug_gfn(d, gfn);
 914         printk("---domain is %d, max_pages %u, total_pages %u \n", d->is_dying, d->max_pages, d->tot_pages);
 915         BUG_ON(!d);

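The failure above can be modeled in isolation: the domain heap allocator refuses to assign pages to a dying domain (or one at its max_pages limit), so mem_sharing_alloc_page() returns NULL even when the heap statistics show plenty of free memory. A minimal, self-contained sketch of that gating logic (the struct fields mirror the ones printed above, but the function name and layout are simplified for illustration, not the actual Xen implementation):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct domain: only the fields discussed above. */
struct domain {
    bool is_dying;
    unsigned int max_pages;
    unsigned int tot_pages;
};

/* Models the gate in the domain heap allocator: a dying domain, or one
 * already at its max_pages limit, is refused a page regardless of how
 * much heap memory is actually free. */
static bool can_alloc_domheap_page(const struct domain *d)
{
    if (d->is_dying)
        return false;               /* the case hit in the log above */
    if (d->tot_pages >= d->max_pages)
        return false;               /* domain is over its allocation limit */
    return true;
}
```

With the values from the log (is_dying=1, max_pages=132096, tot_pages=29736) the domain is well under its limit, so the refusal comes from the is_dying check, which is why the heap dump still showed gigabytes free.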
 
--------------
Well, the logic is a bit complicated. My fix is to set the gfn's mfn to INVALID_MFN:
 
 
    ret = page_make_private(d, page);
    /* last_gfn should always be able to be made private */
    BUG_ON(last_gfn && ret);
    if (ret == 0) goto private_page_found;

    ld = rcu_lock_domain_by_id(d->domain_id);
    BUG_ON(!ld);
    if (ld->is_dying)
    {
        printk("d is dying %d %d\n", d->is_dying, d->domain_id);

        /* decrease page type count and destroy gfn */
        put_page_and_type(page);
        mem_sharing_gfn_destroy(gfn_info, !last_gfn);

        if (last_gfn)
            mem_sharing_hash_delete(handle);
        else
            /* Even though we don't allocate a private page, we have to account
             * for the MFN that originally backed this PFN. */
            atomic_dec(&nr_saved_mfns);

        /* set mfn invalid */
        BUG_ON(set_shared_p2m_entry_invalid(d, gfn) == 0);
        rcu_unlock_domain(ld);
        shr_unlock();
        return 0;
    }
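The dying-domain path above has to keep the nr_saved_mfns accounting consistent: every gfn beyond the first that maps a shared mfn counts as one "saved" page, so dropping a non-last gfn without allocating a replacement page must decrement the counter, while dropping the last gfn removes the hash entry instead. A toy model of just that invariant (struct and names invented here for illustration; the comments map each step to the real calls in the snippet):

```c
#include <assert.h>
#include <stdbool.h>

/* Pages "saved" by sharing: gfns beyond the first that map the same mfn. */
static int nr_saved_mfns;

/* Toy stand-in for one entry in the sharing hash. */
struct shared_entry {
    int refs;        /* number of gfns mapping this shared mfn */
    bool present;    /* entry still present in the sharing hash */
};

/* Models the accounting in the dying-domain branch of the fix. */
static void drop_dying_gfn(struct shared_entry *e)
{
    bool last_gfn = (e->refs == 1);

    e->refs--;
    if (last_gfn)
        e->present = false;      /* mem_sharing_hash_delete(handle) */
    else
        nr_saved_mfns--;         /* atomic_dec(&nr_saved_mfns) */
}
```

This is the invariant the stray semicolon in the original posting would have broken: with the decrement executed unconditionally, dropping the last gfn would also decrement nr_saved_mfns, and the counter would drift low over time.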

 
Any other suggestions?
 
 


> Date: Thu, 20 Jan 2011 09:19:34 +0000
> From: Tim.Deegan@xxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; juihaochiang@xxxxxxxxx
> Subject: Re: [PATCH] mem_sharing: fix race condition of nominate and unshare
>
> At 07:19 +0000 on 20 Jan (1295507976), MaoXiaoyun wrote:
> > Hi:
> >
> > The latest BUG in mem_sharing_alloc_page from mem_sharing_unshare_page.
> > I printed heap info, which shows plenty memory left.
> > Could domain be NULL during in unshare, or should it be locked by rcu_lock_domain_by_id ?
> >
>
> 'd' probably isn't NULL; more likely is that the domain is not allowed
> to have any more memory. You should look at the values of d->max_pages
> and d->tot_pages when the failure happens.
>
> Cheers.
>
> Tim.
>
> > -----------code------------
> > 422 extern void pagealloc_info(unsigned char key);
> > 423 static struct page_info* mem_sharing_alloc_page(struct domain *d,
> > 424 unsigned long gfn,
> > 425 int must_succeed)
> > 426 {
> > 427 struct page_info* page;
> > 428 struct vcpu *v = current;
> > 429 mem_event_request_t req;
> > 430
> > 431 page = alloc_domheap_page(d, 0);
> > 432 if(page != NULL) return page;
> > 433
> > 434 memset(&req, 0, sizeof(req));
> > 435 if(must_succeed)
> > 436 {
> > 437 /* We do not support 'must_succeed' any more. External operations such
> > 438 * as grant table mappings may fail with OOM condition!
> > 439 */
> > 440 pagealloc_info('m');
> > 441 BUG();
> > 442 }
> >
> > -------------serial output-------
> > (XEN) Physical memory information:
> > (XEN) Xen heap: 0kB free
> > (XEN) heap[14]: 64480kB free
> > (XEN) heap[15]: 131072kB free
> > (XEN) heap[16]: 262144kB free
> > (XEN) heap[17]: 524288kB free
> > (XEN) heap[18]: 1048576kB free
> > (XEN) heap[19]: 1037128kB free
> > (XEN) heap[20]: 3035744kB free
> > (XEN) heap[21]: 2610292kB free
> > (XEN) heap[22]: 2866212kB free
> > (XEN) Dom heap: 11579936kB free
> > (XEN) Xen BUG at mem_sharing.c:441
> > (XEN) ----[ Xen-4.0.0 x86_64 debug=n Not tainted ]----
> > (XEN) CPU: 0
> > (XEN) RIP: e008:[<ffff82c4801c0531>] mem_sharing_unshare_page+0x681/0x790
> > (XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000 rbx: ffff83040092d808 rcx: 0000000000000096
> > (XEN) rdx: 000000000000000a rsi: 000000000000000a rdi: ffff82c48021eac4
> > (XEN) rbp: 0000000000000000 rsp: ffff82c48035f5e8 r8: 0000000000000001
> > (XEN) r9: 0000000000000001 r10: 00000000fffffff5 r11: 0000000000000008
> > (XEN) r12: ffff8305c61f3980 r13: ffff83040eff0000 r14: 000000000001610f
> > (XEN) r15: ffff82c48035f628 cr0: 000000008005003b cr4: 00000000000026f0
> > (XEN) cr3: 000000052bc4f000 cr2: ffff880120126e88
> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c48035f5e8:
> > (XEN) ffff8305c61f3990 00018300bf2f0000 ffff82f604e6a4a0 000000002ab84078
> > (XEN) ffff83040092d7f0 00000000001b9c9c ffff8300bf2f0000 000000010eff0000
> > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN) 0000000000000000 0000000d0000010f ffff8305447ec000 000000000001610f
> > (XEN) 0000000000273525 ffff82c48035f724 ffff830502c705a0 ffff82f602c89a00
> > (XEN) ffff83040eff0000 ffff82c48010bfa9 ffff830572c5dbf0 000000000029e07f
> > (XEN) 0000000000000000 ffff830572c5dbf0 000000008035fbe8 ffff82c48035f6f8
> > (XEN) 0000000100000002 ffff830572c5dbf0 ffff83063fc30000 ffff830572c5dbf0
> > (XEN) 0000035900000000 ffff88010d14bbe0 ffff880159e09000 00003f7e00000002
> > (XEN) ffffffffffff0032 ffff88010d14bbb0 ffff830438dfa920 0000000d8010a650
> > (XEN) 0000000000000100 ffff83063fc30000 ffff8305f9203730 ffffffffffffffea
> > (XEN) ffff88010d14bb70 0000000000000000 ffff88010d14bc10 ffff88010d14bbc0
> > (XEN) 0000000000000002 ffff82c48010da9b 0000000000000202 ffff82c48035fec8
> > (XEN) ffff82c48035f7c8 00000000801880af ffff83063fc30010 0000000000000000
> > (XEN) ffff82c400000008 ffff82c48035ff28 0000000000000000 ffff88010d14bbc0
> > (XEN) ffff880159e08000 0000000000000000 0000000000000000 00020000000002d7
> > (XEN) 00000000003f2b38 ffff8305b1f4b6b8 ffff8305b30f0000 ffff880159e09000
> > (XEN) 0000000000000000 0000000000000000 000200000000008a 00000000003ed1f9
> > (XEN) ffff83063fc26450 ffff8305b30f0000 ffff880159e0a000 0000000000000000
> > (XEN) 0000000000000000 00020000000001fa 000000000029e2ba ffff83063fc26fd0
> > (XEN) Xen call trace:
> > (XEN) [<ffff82c4801c0531>] mem_sharing_unshare_page+0x681/0x790
> > (XEN) [<ffff82c48010bfa9>] gnttab_map_grant_ref+0xbf9/0xe30
> > (XEN) [<ffff82c48010da9b>] do_grant_table_op+0x14b/0x1080
> > (XEN) [<ffff82c48010fb44>] do_xen_version+0xb4/0x480
> > (XEN) [<ffff82c4801b8215>] set_p2m_entry+0x85/0xc0
> > (XEN) [<ffff82c4801bc92e>] set_shared_p2m_entry+0x1be/0x2f0
> > (XEN) [<ffff82c480121c4c>] xmem_pool_free+0x2c/0x310
> > (XEN) [<ffff82c4801bfaf8>] mem_sharing_share_pages+0xd8/0x3d0
> > (XEN) [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> > (XEN) [<ffff82c48011c519>] cpumask_raise_softirq+0x89/0xa0
> > (XEN) [<ffff82c480118351>] csched_vcpu_wake+0x101/0x1b0
> > (XEN) [<ffff82c48014717d>] vcpu_kick+0x1d/0x80
> > (XEN) [<ffff82c4801447da>] __find_next_bit+0x6a/0x70
> > (XEN) [<ffff82c48015a1d8>] get_page+0x28/0xf0
> > (XEN) [<ffff82c48015ed72>] do_update_descriptor+0x1d2/0x210
> > (XEN) [<ffff82c480113d7e>] do_multicall+0x14e/0x340
> > (XEN) [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Xen BUG at mem_sharing.c:441
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Manual reset required ('noreboot' specified)
> >
> > > Date: Mon, 17 Jan 2011 17:02:02 +0800
> > > Subject: Re: [PATCH] mem_sharing: fix race condition of nominate and unshare
> > > From: juihaochiang@xxxxxxxxx
> > > To: tinnycloud@xxxxxxxxxxx
> > > CC: xen-devel@xxxxxxxxxxxxxxxxxxx; tim.deegan@xxxxxxxxxx
> > >
> > > Hi, tinnycloud:
> > >
> > > Do you have xenpaging tools running properly?
> > > I haven't gone through that one, but it seems you have run out of memory.
> > > When this case happens, mem_sharing will request memory to the
> > > xenpaging daemon, which tends to page out and free some memory.
> > > Otherwise, the allocation would fail.
> > > Is this your scenario?
> > >
> > > Bests,
> > > Jui-Hao
> > >
> > > 2011/1/17 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> > > > Another failure on BUG() in mem_sharing_alloc_page()
> > > >
> > > > memset(&req, 0, sizeof(req));
> > > > if(must_succeed)
> > > > {
> > > > /* We do not support 'must_succeed' any more. External operations
> > > > such
> > > > * as grant table mappings may fail with OOM condition!
> > > > */
> > > > BUG();===================>bug here
> > > > }
> > > > else
> > > > {
> > > > /* All foreign attempts to unshare pages should be handled through
> > > > * 'must_succeed' case. */
> > > > ASSERT(v->domain->domain_id == d->domain_id);
> > > > vcpu_pause_nosync(v);
> > > > req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
> > > > }
> > > >
>
> --
> Tim Deegan <Tim.Deegan@xxxxxxxxxx>
> Principal Software Engineer, Xen Platform Team
> Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
