[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] RE: [PATCH] mem_sharing: fix race condition of nominate and unshare
Very likely. If you look in xen/arch/x86/mm/p2m.c, the two functions which check a page to see if it can be reclaimed are "p2m_pod_zero_check*()". A little ways into each function there's a giant "if()" which has all of the conditions for reclaiming a page, starting with p2m_is_ram(). The easiest way to fix it is to add p2m_is_shared() to that "if" statement. -George 2011/1/19 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>: > Hi George: > > I am working on the xen mem_sharing, I think the bug below is > related to POD. > (Test shows when POD is enable, it is easily hit the bug, when disabled, no > bug occurs). > > As I know when domU starts will POD, it gets memory from POD cache, and in > some > situation, POD cached will scan from Zero pages for reusing(link the page > into POD > cache page list), and from the page_info define, list and handle share same > posistion, > I think when reusing the page, POD doest't check page type, and if it is a > shared page > , it still can be put into POD cache, and thus handle is been overwritten. > > So maybe we need to check the page type before putting into cache, > What's your opinion? > thanks. > >>-------------------------------------------------------------------------------- >>From: tinnycloud@xxxxxxxxxxx >>To: juihaochiang@xxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx >>CC: tim.deegan@xxxxxxxxxx >>Subject: RE: [PATCH] mem_sharing: fix race condition of nominate and >> unshare >>Date: Tue, 18 Jan 2011 20:05:16 +0800 >> >>Hi: >> >> It is later found that caused by below patch code and I am using the >> blktap2. >>The handle retruned from here will later become ch in >> mem_sharing_share_pages, and then >>in mem_sharing_share_pages will have ch = sh, thus caused the problem. >> >>+ /* Return the handle if the page is already shared */ >>+ page = mfn_to_page(mfn); >>+ if (p2m_is_shared(p2mt)) { >>+ *phandle = page->shr_handle; >>+ ret = 0; >>+ goto out; >>+ } >>+ >> >>But. after I removed the code, tests still failed, and this handle's value >> is not make sence. >> >> >>(XEN) ===>total handles 17834 total gfns 255853 >>(XEN) handle 13856642536914634 >>(XEN) Debug for domain=1, gfn=19fed, Debug page: MFN=349c0a is >> ci=8000000000000008, ti=8400000000000007, owner_id=32755 >>(XEN) ----[ Xen-4.0.0 x86_64 debug=n Not tainted ]---- >>(XEN) CPU: 15 >>(XEN) RIP: e008:[<ffff82c4801bff4b>] >> mem_sharing_unshare_page+0x19b/0x720 >>(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor >>(XEN) rax: 0000000000000000 rbx: ffff83063fc67f28  ; rcx: >> 0000000000000092 >>(XEN) rdx: 000000000000000a rsi: 000000000000000a rdi: ffff82c48021e9c4 >>(XEN) rbp: ffff830440000000 rsp: ffff83063fc67c48 r8: 0000000000000001 >>(XEN) r9: 0000000000000000 r10: 00000000fffffff8 r11: 0000000000000005 >>(XEN) r12: 0000000000019fed r13: 0000000000000000 r14: 0000000000000000 >>(XEN) r15: ffff82f606938140 cr0: 000000008005003b cr4: 00000000000026f0 >>(XEN) cr3: 000000055513c000 cr2: 0000000000000018 >>(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 >>(XEN) Xen stack trace from rsp=ffff83063fc67c48: >>(XEN) 02c5f6c8b70fed66 39ef64058b487674 ffff82c4801a6082 >> 0000000000000000 >>(XEN) 00313a8b00313eca 0000000000000001 0000000000000009 ff >> ff830440000000 >>(XEN) ffff83063fc67cb8 ffff82c4801df6f9 0000000000000040 >> ffff83063fc67d04 >>(XEN) 0000000000019fed 0000000d000001ed ffff83055458d000 >> ffff83063fc67f28 >>(XEN) 0000000000019fed 0000000000349c0a 0000000000000030 >> ffff83063fc67f28 >>(XEN) 0000000000000030 ffff82c48019baa6 ffff82c4802519c0 >> 0000000d8016838e >>(XEN) 0000000000000000 00000000000001aa ffff8300bf554000 >> ffff82c4801b3864 >>(XEN) ffff830440000348 ffff8300bf554000 ffff8300bf5557f0 >> ffff8300bf5557e8 >>(XEN) 00000032027b81f2 ffff82c48026f080 ffff82c4801a9337 >> ffff8300bf448000 >>(XEN) ffff8300bf554000 ffff830000000000 0000000019fed000 >> ffff8300bf2f2000 >>(XEN) ffff82c48019985d 0000000000000080 ffff8300bf554000 >> 0000000000019fed >>(XEN) ffff82c4801b08ba 000000000001e000 ffff82c48014931f ff >> ff8305570c6d50 >>(XEN) ffff82c480251080 00000032027b81f2 ffff8305570c6d50 >> ffff83052f3e2200 >>(XEN) 0000000f027b7de0 ffff82c48011e07a 000000000000000f >> ffff82c48026f0a0 >>(XEN) 0000000000000082 0000000000000000 0000000000000000 >> 0000000000009e44 >>(XEN) ffff8300bf554000 ffff8300bf2f2000 ffff82c48011e07a >> 000000000000000f >>(XEN) ffff8300bf555760 0000000000000292 ffff82c48011afca >> 00000032028a8fc0 >>(XEN) 0000000000000292 ffff82c4801a93c3 00000000000000ef >> ffff8300bf554000 >>(XEN) ffff8300bf554000 ffff8300bf5557e8 ffff82c4801a6082 >> ffff8300bf554000 >>(XEN) 0000000000000000 ffff82c4801a0cc8 ffff8300bf554000 >> ffff8300bf554000 >>(XEN) Xen call trace: >>(XEN) [<ffff82c4801bff4b>] mem_sharing_unshare_page+0x19b/0x720 >>(XEN) [<ffff82c4801a6082>] v lapic_has_pending_irq+0x42/0x70 >>(XEN) [<ffff82c4801df6f9>] ept_get_entry+0xa9/0x1c0 >>(XEN) [<ffff82c48019baa6>] hvm_hap_nested_page_fault+0xd6/0x190 >>(XEN) [<ffff82c4801b3864>] vmx_vmexit_handler+0x304/0x1a90 >>(XEN) [<ffff82c4801a9337>] pt_restore_timer+0x57/0xb0 >>(XEN) [<ffff82c48019985d>] hvm_do_resume+0x1d/0x130 >>(XEN) [<ffff82c4801b08ba>] vmx_do_resume+0x11a/0x1c0 >>(XEN) [<ffff82c48014931f>] context_switch+0x76f/0xf00 >>(XEN) [<ffff82c48011e07a>] add_entry+0x3a/0xb0 >>(XEN) [<ffff82c48011e07a>] add_entry+0x3a/0xb0 >>(XEN) [<ffff82c48011afca>] schedule+0x1ea/0x500 >>(XEN) [<ffff82c4801a93c3>] pt_update_irq+0x33/0x1e0 >>(XEN) [< ;ffff82c4801a6082>] vlapic_has_pending_irq+0x42/0x70 >>(XEN) [<ffff82c4801a0cc8>] hvm_vcpu_has_pending_irq+0x88/0xa0 >>(XEN) [<ffff82c4801b267b>] vmx_vmenter_helper+0x5b/0x150 >>(XEN) [<ffff82c4801adaa3>] vmx_asm_do_vmentry+0x0/0xdd >>(XEN) >>(XEN) Pagetable walk from 0000000000000018: >>(XEN) L4[0x000] = 0000000000000000 ffffffffffffffff >>(XEN) >>(XEN) **************************************** >>(XEN) Panic on CPU 15: >>(XEN) FATAL PAGE FAULT >>(XEN) [error_code=0000] >>(XEN) Faulting linear address: 0000000000000018 >>(XEN) **************************************** >>(XEN) >>(XEN) Manual reset required ('noreboot' specified) >> >> >> >> >> >> --------------------------------------------------------------------------------------------------- >>>From: tinnycloud@xxxxxxxxxxx >>>To: juihaochiang@xxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx >>>CC: tim.deegan@xxxxxxxxxx >>>Subject: RE: [PATCH] mem_sharing: fix race condition of nominate and >>> unshare >>>Date: Tue, 18 Jan 2011 17:42:32 +0800 >> >>>Hi Tim & Jui-Hao: >> >> > When I use Linux HVM instead of Windows HVM, more bug shows up. >> >>> I only start on VM, and when I destroy it , xen crashed on >>> mem_sharing_unshare_page() >>>which in line709, hash_entry is NULL. Later I found the handle has been >>> removed in >>>mem_sharing_share_pages(), please refer logs below. >> > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxxxxxxxx > http://lists.xensource.com/xen-devel > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |