[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [memory sharing] bug on get_page_and_type
At 15:43 +0000 on 02 Feb (1296661396), MaoXiaoyun wrote: > Hi Tim: > > Thanks for both your advice and quick reply. I will follow. > > So at last we should replace shr_lock with p2m_lock. > But more complicate, it seems both the > *check action* code and *nominate page* code need to be locked > ,right? > If so, quite a lot of *check action* codes need to be locked. Yes, I think you're right about that. Unfortunately there are some very long TOCTOU windows in those kind of p2m lookups, which will get more important as the p2m gets more dynamic. I don't want to have the callers of p2m code touching the p2m lock directly so we may need a new p2m interface to address it. Tim. > Looking forward to your suggestion on the fix, I think Juihao and I > can help to do analyse and test. > > > Date: Tue, 1 Feb 2011 14:28:16 +0000 > > From: Tim.Deegan@xxxxxxxxxx > > To: George.Dunlap@xxxxxxxxxxxxx > > CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx; > > juihaochiang@xxxxxxxxx > > Subject: Re: [Xen-devel] [memory sharing] bug on get_page_and_type > > > > At 12:47 +0000 on 31 Jan (1296478031), George Dunlap wrote: > > > At first glance, it does look like a race, of the type that would > > > normally be done by grabbing the p2m lock. I'd have to take a closer > > > look to be sure, but I won't be able to for at least another day or > > > two. Maybe Tim can comment. > > > > Yes, this is a race; it would be fixed by getting rid of the shr_lock > > and using the p2m lock to cover page-sahring operations, which I think > > is a good idea anyway. > > > > I think I'll have time to look at the locking properly some time this > > month. Of course, I've been wrong before. :) > > < BR>> Tim. > > > > > 2011/1/31 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>: > > > > Hi Georage: > > > > > > > > Thanks for your help, I think I really need to learn hg a bit. > > > > Well, when looking into memory sharing code, a confusion > > > > really brother me a lot. > > > > I think maybe you can enlighten me. > > > > > > > > There has been *check action code* do like below steps > > > > 1) Use gfn_to_mfn to get p2m_type_t p2mt > > > > 2) check the type of p2mt, > > > > 3) do some stuffes base if check on step 2 is failed > > > > > > > > While when *nominate page* to be shared, it does these follow steps > > > > a) check page whether can be shared > > > > b) call page_make_sharable to make sharable > > > > c) if 2) is success, call p2m_change_type to update page type > > > > p2m_ram_shared > > > > > > > > My confusion is if *nominate page* code reach step b, and before it go > > > > to step c > > > > *check action code* reach its step2, it will find page type is > > > > not p2m_ram_shared > > > > and go to step3. And later "nominate page" code go to step (c) > > > > > > > > Take xen/common/memory.c for example, > > > > line 179 is much like step 2, it will goto 196 to clear the > > > > _PGC_allocated > > > > flag > > > > if check failed, so it might indicates, *nominate page* code change the > > > > page > > > > type > > > > into p2m_ram_shared, but the page actually has alreay been freed. > > > > > > > > Am I right? > > > > > > > > > > > > 155 int guest_remove_page(struct domain *d, unsigned long gmfn) > > > > 156 { > > > > 157 struct page_info *page; > > > > 158 #ifdef C ONFIG_X86 > > > > 159 p2m_type_t p2mt; > > > > 160 #endif > > > > 161 unsigned long mfn; > > > > 162 > > > > 163 #ifdef CONFIG_X86 > > > > 164 mfn = mfn_x(gfn_to_mfn(d, gmfn, &p2mt)); > > > > 165 #else > > > > 166 mfn = gmfn_to_mfn(d, gmfn); > > > > 167 #endif > > > > 168 if ( unlikely(!mfn_valid(mfn)) ) > > > > 169 { > > > > 170 gdprintk(XENLOG_INFO, "Domain %u page number %lx invalid\n", > > > > 171 d->domain_id, gmfn); > > > > 172 return 0; > > > > 173 } > > > > 174 > > > > 175  ; page = mfn_to_page(mfn); > > > > 176 #ifdef CONFIG_X86 > > > > 177 /* If gmfn is shared, just drop the guest reference (which may or > > > > may not > > > > 178 * free the page) */ > > > > 179 if(p2m_is_shared(p2mt)) > > > > 180 { > > > > 181 put _page_and_type(page); > > > > 182 guest_physmap_remove_page(d, gmfn, mfn, 0); > > > > 183 return 1; > > > > 184 } > > > > 185 > > > > 186 #endif /* CONFIG_X86 */ > > > > 187 if ( unlikely(!get_page(page, d)) ) > > > > 188 { > > > > 189 gdprintk(XENLOG_INFO, "Bad page free for domain %u\n", > > > > d->domain_id); > > > > 190 return 0; > > > > 189 gdprintk(XENLOG_INFO, "Bad page free for domain %u\n", > > > > d->domain_id); > > > > 190 return 0; > > > > 191 } > > > > 192 > > > > 193 if ( test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) ) > > > > 194 put_page_and_type(page); > > > > 195 > > > > 196 if ( test_and_clear_bit(_PGC_allocated, &page->count_info) ) > > > > 197 put_page(page); > > > > 198 > > > > 199 guest_physmap_remove_page (d, gmfn, mfn, 0); > > > > 200 > > > > 201 put_page(page); > > > > 202 > > > > 203 return 1; > > > > 204 } > > > > > > > >> Date: Mon, 31 Jan 2011 10:49:37 +0000 > > > >> Subject: Re: [Xen-devel] [memory sharing] bug on get_page_and_type > > > >> From: George.Dunlap@xxxxxxxxxxxxx > > > >> To: tinnycloud@xxxxxxxxxxx > > > >> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; tim.deegan@xxxxxxxxxx; > > > >> juihaochiang@xxxxxxxxx > > > >> > > > >> Xiaoyun, > > > >> > > > >> Thanks for all of your work getting page sharing working. When > > > >> submitting patches, please break them down into individual chunks, > > > >> each of which does one thing. Each patch should also include a > > > >> comment saying what the patch does and why, and a Signed-off-by line > > > ; >> indicating that you certify that the copyright holder (possibly you) > > > >> is placing the code under the GPL. > > > >> > > > >> Using mercurial queues: > > > >> http://mercurial.selenic.com/wiki/MqExtension > > > >> and the mercurial patchbomb extension: > > > >> http://mercurial.selenic.com/wiki/PatchbombExtension > > > >> are particularly handy for this process. > > > > &g t; > > > >> Quick comment: The ept-locking part of the patch is going to be > > > >> NACK-ed, as it will cause circular locking dependencies with the > > > >> hap_lock. c/s 22526:7a5ee380 has an explanation of the circular > > > >> dependency, and a fix which doesn't introduce a circular dependency. > > > >> (It may need to be adapted a bit to apply to 4.0-testing) > > > >> > > > >> -George > > > >> > > > >> On Mon, Jan 31, 2011 at 10:13 AM, tinnycloud <tinnycloud@xxxxxxxxxxx> > > > >> wrote: > > > >> > Hi: > > > >> > > > > >> > > > > >> > Attached is the whole patch suit for latest xen-4.0-testing,changeset > > > >> > 21443, > > > >> > > > > >> > It comes from George, Tim and JuiHao, also, has some extra debug > > > >> > info. > > > >> > > > > >> > > > > >> > > > > >> > On most occasion, memory sharing works fine, but still exists a bug. > > > >> > > > > >> > I?ve been tracing this problem for a while. > > > >> > > > > >> > &nbs p; There is a bug on get_page_and_type() in > > > >> > mem_sharing_share_pages() > > > >> > > > > >> > > > > >&g t; > > > > >> > > > > >> > --------------------------------------------mem_sharing_share_pages()----------------------- > > > >> > > > > >> > 789 if(!get_page_and_type(spage, dom_cow, PGT_shared_page)){ > > > >> > > > > >> > 790 mem_sharing_debug_gfn(cd, gfn->gfn); > > > >> > > > > >> > 791 mem_sharing_debug_gfn(sd, sgfn->gfn); > > > >> > > > > >> > 792 printk("c->dying %d s->dying %d spage %p se->mfn %lx\n", > > > >> > cd->is_dying, sd->is_dying, spage, se->mfn); > > > >> > > > > >> > 793 printk("Debug page: MFN=%lx is ci=%lx, ti=%lx, > > > >> > owner_id=%d\n", > > > >> > > > > >> > 794 mfn_x(page_to_mfn(spage)), > > > >> > > > > >> > 795 spage->count_info, > &g t; > >> > > > > >> > 796 spage->u.inuse.type_info, > > > >> > > > > >> > 797 page_get_owner(spage)->domain_id); > > > >> > > > > >> > 798 BUG(); > > > >> > > > > >> > 799 } > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > Below painc log contains the debug info from line 790-798. > > > >> > > > > >> > We saw that 180000000000000 which is PGC_state_free, > > > >> > > > > >> > So it looks like a shared page has been freed unexpectly. > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > (XEN) teardown 64 > > > >> > > > > >> > (XEN) teardow n 66 > > > >> > > > > >> > blktap_sysfs_destroy > > > >> > > > > >> > blktap_sysfs_create: adding attributes for dev ffff880109d0c200 > > > >> > > > > >> > blktap_sysfs_destroy > > > >> > > > > >> > __ratelimit: 1 callbacks suppressed > > > >> > > > > >> > blktap_sysfs_destroy > > > >> > > > > >> > (XEN) Debug for domain=83, gfn=1e8dc, Debug page: MFN=1306dc is > > > >> > ci=8000000000000005, ti=8400000000000001, owner_id=32755 > > > >> > > > > >> > (XEN) Debug for domain=79, gfn=1dc95, Invalid MFN=ffffffffffffffff > > > >> > > > > >> > (XEN) c->dying 0 s->dying 0 spage ffff82f60a0f12a0 se->mfn 507895 > > > >> > > > > >> > (XEN) Debug page: MFN=507895 is ci= 1800000000 00000, > > > >> > ti=8400000000000001, > > > >> > owner_id=32755 > > > >> > > > > >> > (XEN) Xen BUG at mem_sharing.c:798 > > > >> > > > > >> > (XEN) ----[ Xen-4.0.2-rc2-pre x86_64 debug=n Not tainted ]---- > > > >> > > > > >> > (XEN) CPU: 0 > > > >> > > > > >> > (XEN) RIP: e008:[<ffff82c4801c3760>] > > > >> > mem_sharing_share_pages+0x5b0/0x5d0 > > > >> > > > > >> > (XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor > > > >> > > > > >> > (XEN) rax: 0000000000000000 rbx: ffff83044ef76000 rcx: > > > >> > 0000000000000092 > > > >> > > > > >> > (XEN) rdx: 000000000000000a rsi: 000000000000000a rdi: > > > >> > ffff82c4802237c4 > > > >> > > > > >> > (XEN) rbp: ffff83030d99e310 rsp: ffff82c48035fc48 r8: > > > >> > 0000000000000001 > > > >> > > > > >> > (XEN) r9: 0000000000000001 r10: 00000000fffffff8 r11: > > > >> > 0000000000000005 > > > >> > > > > >> > (XEN) r12: ffff83050b558000 r13: ffff830626ec5740 r14: > > > >> > 0000000000347967 > > > >> > > > > >> > (XEN) r15: ffff83030d99e068 cr0: 0000000080050033 cr4: > > > >> > 00000000000026f0 > > > >> > > > > >> > (XEN) cr3: 000000010e642000 cr2: 00002abdc4809000 > > > >> > > > > >> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 > > > >> > > > > >> > (XEN) Xen stack trace from rsp=ffff82c48035fc48: > > > >> > > > > >> > (XEN) 000000000001e8dc ffff83030d99e068 00000000005078 95 > > > >> > ffff83030d99e050 > > > >> > > > > >> > (XEN) ffff82f60a0f12a0 ffff82f60260db80 ffff83030d99e300 > > > >> > ffff830626afbbd0 > > > >> > > > > >> > (XEN)&nb sp; 0000000080372980 ffff82c48035fe38 ffff83023febe000 > > > >> > 00000000008f7000 > > > >> > > > > >> > (XEN) 0000000000305000 0000000000000006 0000000000000006 > > > >> > ffff82c4801c44a4 > > > >> > > > > >> > (XEN) ffff82c480258188 ffff82c48011cb89 000000000001e8dc > > > >> > fffffffffffffff3 > > > >> > > > > >> > (XEN) ffff82c48035fe28 ffff82c480148503 000003d932971281 > > > >> > 0000000000000080 > > > >> > > > > >> > (XEN) 000003d9329611d6 ffff82c48018aedf ffff82c4803903a8 > > > >> > ffff82c48035fd38 > > > >> > > > > >> > (XEN) ffff82c480390380 ffff82c48035fe28 0000000000000002 > > > >> > 0000000000000000 > > > >> > > > > >> > (XEN) 0000000000000002 0000000000000000 ffff83023febe000 > > > >> > ffff83023ff80080 > > > >> > > > > >> > (XEN) ffff82c48035fe58 0000000000000000 ffff82c48022ca80 > > > >> > 0000000000000000 > > > >> > > > > >> > (XEN) 0000000000000002 ffff82c48018a 452 0000000000000000 > > > >> > ffff82c48016f6bb > > > >> > > > > >> > (XEN) ffff82c48035fe58 ffff82c4801538ea 000000023ab79067 > > > >> > fffffffffffffff3 > > > >> > > > > >> > (XEN) ffff82c48035fe28 00000000008f7000 0000000000305000 > > > >> > 0000000000000006 > > > >> > > > > >> > (XEN) 0 000000000000006 ffff82c4801042b3 0000000000000000 > > > >> > ffff82c48010d2a5 > > > >> > > > > >> > (XEN) ffff830477fb51e8 0000000000000000 00007ff000000200 > > > >> > ffff8300bf76e000 > > > >> > > > > >> > (XEN) 0000000600000039 0000000000000000 00007fc09d744003 > > > >> > 0000000000325258 > > > >> > > > > >> > (XEN) 0000000000347967 ffffffffff600429 000000004d463b17 > > > >> > 0000000000062fe2 > > > >> > > > > >> > (XEN) 0000000000000000 00007fc09d744070 00007fc09d744000 > > > >> > 00007fffcfec0d20 > > > >> > > > > >> > (XEN) 00007fc09d744078 0000000000430fd8 00007fffcfec0d88 > > > >> > 00000000019f1ca8 > > > >> > > > > >> > (XEN) 0000f05c00000000 00007fc09f4c2718 0000000000000000 > > &g t; >> > 0000000000000246 > > > >> > > > > >> > (XEN) Xen call trace: > > > >> > > > > >> > (XEN) [<ffff82c4801c3760>] mem_sharing_share_pages+0x5b0/0x5d0 > > > >> > > > > >> > (XEN) [<ffff82c4801c44a4>] mem_sharing_domctl+0xe4/0x130 > > > >> > > > > >> > (XEN) [<ffff82c48011cb89>] cpumask_raise_softirq+0x89/0xa0 > > > >> > > > > >> > (XEN) [<ffff82c480148503>] arch_do_domctl+0x14f3/0x22d0 > > > >> > > > > >> > (XEN) [<ffff82c48018aedf>] handle_hpet_broadcast+0x16f/0x1d0 > > > >> > > > > >> > (XEN) [<ffff82c48018a452>] hpet_legacy_irq_tick+0x42/0x50 > > > >> > > > > >> > (XEN) [<ffff82c48016f6bb>] timer_interrupt+0xb/0x130 > > > >> > > > > >> > (XEN) [<ffff82c4801538ea> ] ack_edge_ioapic_irq+0x2a/0x70 > > > >> > > > > >> > (XEN) [<ffff82c4801042b3>] do_domctl+0x163/0xfe0 > > > >> > > > > >> > (XEN) [<ffff82c48010d2a5>] do_grant_table_op+0x75/0x1ad0 > > > >> > > > > >> > (XEN) [<ffff82c4801e7169>] syscall_enter+0xa9/0xae > > > >> > > > > >> > (XEN) > > > >> > > > > >> > (XEN) > > > >> > > > > >> > (XEN) **************************************** > > > >> > > > > >> > (XEN) Panic on CPU 0: > > > >> > > > > >> > (XEN) Xen BUG at mem_sharing.c:798 > > > >> > > > > >> > (XEN) **************************************** > > > >> > > > > >> > (XEN) > > > >> > > > > & gt;> > (XEN) Manual reset required ('noreboot' specified) > > > >> > > > > >> > > > > >> > > > > >> > _______________________________________________ > > > >> > Xen-devel mailing list > > > >> > Xen-devel@xxxxxxxxxxxxxxxxxxx > > > >> > http://lists.xensource.com/xen- devel > > > >> > > > > >> > > > > > > > > > _______________________________________________ > > > > Xen-devel mailing list > > > > Xen-devel@xxxxxxxxxxxxxxxxxxx > > > > http://lists.xensource.com/xen-devel > > > > > > > > > > > > -- > > Tim Deegan <Tim.Deegan@xxxxxxxxxx> > > Principal Software Engineer, Xen Platform Team > > Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxxxxxxxx > http://lists.xensource.com/xen-devel -- Tim Deegan <Tim.Deegan@xxxxxxxxxx> Principal Software Engineer, Xen Platform Team Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |