
Re: [Xen-devel] [memory sharing] bug on get_page_and_type


  • To: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
  • From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • Date: Mon, 31 Jan 2011 12:47:11 +0000
  • Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, tim.deegan@xxxxxxxxxx, juihaochiang@xxxxxxxxx
  • Delivery-date: Mon, 31 Jan 2011 04:48:01 -0800
  • Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=r1FPLXJiG/xT7ijiHtneIV1dgYVJg3NSjRdRoEdF5g5IfentT2fQY/WE2mV7aSODYN UTqQlkhPuth6ky2KTm6xqizSaeV0EJgcnlc7TrnvTK0bcD8k9hF04X3li9ltVC3JKabB jTT3A1LsCCn89rYAtdLCgqRu43DKkeLVtrYKI=
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

At first glance, it does look like a race, of the kind that would
normally be prevented by grabbing the p2m lock.  I'd have to take a closer
look to be sure, but I won't be able to for at least another day or
two.  Maybe Tim can comment.

 -George

2011/1/31 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> Hi George:
>
>       Thanks for your help; I think I really need to learn hg a bit.
>       Well, while looking into the memory sharing code, one point of
> confusion has really been bothering me.
>       I think maybe you can enlighten me.
>
>       There is *check action code* that does the following steps:
>       1) Use gfn_to_mfn to get the p2m_type_t p2mt
>       2) Check the type of p2mt
>       3) Do some cleanup if the check in step 2 fails
>
>       Meanwhile, when *nominating a page* to be shared, the code does these steps:
>       a) Check whether the page can be shared
>       b) Call page_make_sharable to make it sharable
>       c) If b) succeeds, call p2m_change_type to update the page type to
> p2m_ram_shared
>
>       My confusion is: if the *nominate page* code reaches step b, and before it
> gets to step c the *check action code* reaches its step 2, it will find that the
> page type is not p2m_ram_shared and go to step 3. And later the *nominate page*
> code goes on to step c.
>
> Take xen/common/memory.c for example:
> line 179 is much like step 2; if the check fails, control falls through to
> line 196, which clears the _PGC_allocated flag. So it might indicate that the
> *nominate page* code changes the page type to p2m_ram_shared while the page
> has actually already been freed.
>
> Am I right?
>
>
> 155 int guest_remove_page(struct domain *d, unsigned long gmfn)
> 156 {
> 157     struct page_info *page;
> 158 #ifdef CONFIG_X86
> 159     p2m_type_t p2mt;
> 160 #endif
> 161     unsigned long mfn;
> 162
> 163 #ifdef CONFIG_X86
> 164     mfn = mfn_x(gfn_to_mfn(d, gmfn, &p2mt));
> 165 #else
> 166     mfn = gmfn_to_mfn(d, gmfn);
> 167 #endif
> 168     if ( unlikely(!mfn_valid(mfn)) )
> 169     {
> 170         gdprintk(XENLOG_INFO, "Domain %u page number %lx invalid\n",
> 171                 d->domain_id, gmfn);
> 172         return 0;
> 173     }
> 174
> 175     page = mfn_to_page(mfn);
> 176 #ifdef CONFIG_X86
> 177     /* If gmfn is shared, just drop the guest reference (which may or
> may not
> 178      * free the page) */
> 179     if(p2m_is_shared(p2mt))
> 180     {
> 181         put_page_and_type(page);
> 182         guest_physmap_remove_page(d, gmfn, mfn, 0);
> 183         return 1;
> 184     }
> 185
> 186 #endif /* CONFIG_X86 */
> 187     if ( unlikely(!get_page(page, d)) )
> 188     {
> 189         gdprintk(XENLOG_INFO, "Bad page free for domain %u\n",
> d->domain_id);
> 190         return 0;
> 191     }
> 192
> 193     if ( test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
> 194         put_page_and_type(page);
> 195
> 196     if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
> 197          put_page(page);
> 198
> 199     guest_physmap_remove_page(d, gmfn, mfn, 0);
> 200
> 201     put_page(page);
> 202
> 203     return 1;
> 204 }
>
>> Date: Mon, 31 Jan 2011 10:49:37 +0000
>> Subject: Re: [Xen-devel] [memory sharing] bug on get_page_and_type
>> From: George.Dunlap@xxxxxxxxxxxxx
>> To: tinnycloud@xxxxxxxxxxx
>> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; tim.deegan@xxxxxxxxxx;
>> juihaochiang@xxxxxxxxx
>>
>> Xiaoyun,
>>
>> Thanks for all of your work getting page sharing working. When
>> submitting patches, please break them down into individual chunks,
>> each of which does one thing. Each patch should also include a
>> comment saying what the patch does and why, and a Signed-off-by line
>> indicating that you certify that the copyright holder (possibly you)
>> is placing the code under the GPL.
>>
>> Using mercurial queues:
>> http://mercurial.selenic.com/wiki/MqExtension
>> and the mercurial patchbomb extension:
>> http://mercurial.selenic.com/wiki/PatchbombExtension
>> are particularly handy for this process.
>>
>> Quick comment: The ept-locking part of the patch is going to be
>> NACK-ed, as it will cause circular locking dependencies with the
>> hap_lock. c/s 22526:7a5ee380 has an explanation of the circular
>> dependency, and a fix which doesn't introduce a circular dependency.
>> (It may need to be adapted a bit to apply to 4.0-testing)
>>
>> -George
>>
>> On Mon, Jan 31, 2011 at 10:13 AM, tinnycloud <tinnycloud@xxxxxxxxxxx>
>> wrote:
>> > Hi:
>> >
>> >
>> > Attached is the whole patch suite for the latest xen-4.0-testing, changeset
>> > 21443.
>> >
>> > It comes from George, Tim and JuiHao, also, has some extra debug info.
>> >
>> >
>> >
>> > On most occasions, memory sharing works fine, but a bug still exists.
>> >
>> >        I’ve been tracing this problem for a while.
>> >
>> >        There is a bug on get_page_and_type() in
>> > mem_sharing_share_pages()
>> >
>> >
>> >
>> >
>> > --------------------------------------------mem_sharing_share_pages()-----------------------
>> >
>> > 789         if(!get_page_and_type(spage, dom_cow, PGT_shared_page)){
>> >
>> > 790             mem_sharing_debug_gfn(cd, gfn->gfn);
>> >
>> > 791             mem_sharing_debug_gfn(sd, sgfn->gfn);
>> >
>> > 792             printk("c->dying %d s->dying %d spage %p se->mfn %lx\n",
>> > cd->is_dying, sd->is_dying, spage, se->mfn);
>> >
>> > 793             printk("Debug page: MFN=%lx is ci=%lx, ti=%lx,
>> > owner_id=%d\n",
>> >
>> > 794                     mfn_x(page_to_mfn(spage)),
>> >
>> > 795                     spage->count_info,
>> >
>> > 796                     spage->u.inuse.type_info,
>> >
>> > 797                     page_get_owner(spage)->domain_id);
>> >
>> > 798             BUG();
>> >
>> > 799         }
>> >
>> >
>> >
>> >
>> >
>> > Below painc log contains the debug info from line 790-798.
>> >
>> > We saw ci=180000000000000, which is PGC_state_free,
>> >
>> > so it looks like a shared page has been freed unexpectedly.
>> >
>> >
>> >
>> >
>> >
>> > (XEN) teardown 64
>> >
>> > (XEN) teardown 66
>> >
>> > blktap_sysfs_destroy
>> >
>> > blktap_sysfs_create: adding attributes for dev ffff880109d0c200
>> >
>> > blktap_sysfs_destroy
>> >
>> > __ratelimit: 1 callbacks suppressed
>> >
>> > blktap_sysfs_destroy
>> >
>> > (XEN) Debug for domain=83, gfn=1e8dc, Debug page: MFN=1306dc is
>> > ci=8000000000000005, ti=8400000000000001, owner_id=32755
>> >
>> > (XEN) Debug for domain=79, gfn=1dc95, Invalid MFN=ffffffffffffffff
>> >
>> > (XEN) c->dying 0 s->dying 0 spage ffff82f60a0f12a0 se->mfn 507895
>> >
>> > (XEN) Debug page: MFN=507895 is ci=180000000000000, ti=8400000000000001,
>> > owner_id=32755
>> >
>> > (XEN) Xen BUG at mem_sharing.c:798
>> >
>> > (XEN) ----[ Xen-4.0.2-rc2-pre  x86_64  debug=n  Not tainted ]----
>> >
>> > (XEN) CPU:    0
>> >
>> > (XEN) RIP:    e008:[<ffff82c4801c3760>]
>> > mem_sharing_share_pages+0x5b0/0x5d0
>> >
>> > (XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
>> >
>> > (XEN) rax: 0000000000000000   rbx: ffff83044ef76000   rcx:
>> > 0000000000000092
>> >
>> > (XEN) rdx: 000000000000000a   rsi: 000000000000000a   rdi:
>> > ffff82c4802237c4
>> >
>> > (XEN) rbp: ffff83030d99e310   rsp: ffff82c48035fc48    r8:
>> > 0000000000000001
>> >
>> > (XEN) r9:  0000000000000001   r10: 00000000fffffff8   r11:
>> > 0000000000000005
>> >
>> > (XEN) r12: ffff83050b558000   r13: ffff830626ec5740   r14:
>> > 0000000000347967
>> >
>> > (XEN) r15: ffff83030d99e068   cr0: 0000000080050033   cr4:
>> > 00000000000026f0
>> >
>> > (XEN) cr3: 000000010e642000   cr2: 00002abdc4809000
>> >
>> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> >
>> > (XEN) Xen stack trace from rsp=ffff82c48035fc48:
>> >
>> > (XEN)    000000000001e8dc ffff83030d99e068 0000000000507895
>> > ffff83030d99e050
>> >
>> > (XEN)    ffff82f60a0f12a0 ffff82f60260db80 ffff83030d99e300
>> > ffff830626afbbd0
>> >
>> > (XEN)    0000000080372980 ffff82c48035fe38 ffff83023febe000
>> > 00000000008f7000
>> >
>> > (XEN)    0000000000305000 0000000000000006 0000000000000006
>> > ffff82c4801c44a4
>> >
>> > (XEN)    ffff82c480258188 ffff82c48011cb89 000000000001e8dc
>> > fffffffffffffff3
>> >
>> > (XEN)    ffff82c48035fe28 ffff82c480148503 000003d932971281
>> > 0000000000000080
>> >
>> > (XEN)    000003d9329611d6 ffff82c48018aedf ffff82c4803903a8
>> > ffff82c48035fd38
>> >
>> > (XEN)    ffff82c480390380 ffff82c48035fe28 0000000000000002
>> > 0000000000000000
>> >
>> > (XEN)    0000000000000002 0000000000000000 ffff83023febe000
>> > ffff83023ff80080
>> >
>> > (XEN)    ffff82c48035fe58 0000000000000000 ffff82c48022ca80
>> > 0000000000000000
>> >
>> > (XEN)    0000000000000002 ffff82c48018a452 0000000000000000
>> > ffff82c48016f6bb
>> >
>> > (XEN)    ffff82c48035fe58 ffff82c4801538ea 000000023ab79067
>> > fffffffffffffff3
>> >
>> > (XEN)    ffff82c48035fe28 00000000008f7000 0000000000305000
>> > 0000000000000006
>> >
>> > (XEN)    0000000000000006 ffff82c4801042b3 0000000000000000
>> > ffff82c48010d2a5
>> >
>> > (XEN)    ffff830477fb51e8 0000000000000000 00007ff000000200
>> > ffff8300bf76e000
>> >
>> > (XEN)    0000000600000039 0000000000000000 00007fc09d744003
>> > 0000000000325258
>> >
>> > (XEN)    0000000000347967 ffffffffff600429 000000004d463b17
>> > 0000000000062fe2
>> >
>> > (XEN)    0000000000000000 00007fc09d744070 00007fc09d744000
>> > 00007fffcfec0d20
>> >
>> > (XEN)    00007fc09d744078 0000000000430fd8 00007fffcfec0d88
>> > 00000000019f1ca8
>> >
>> > (XEN)    0000f05c00000000 00007fc09f4c2718 0000000000000000
>> > 0000000000000246
>> >
>> > (XEN) Xen call trace:
>> >
>> > (XEN)    [<ffff82c4801c3760>] mem_sharing_share_pages+0x5b0/0x5d0
>> >
>> > (XEN)    [<ffff82c4801c44a4>] mem_sharing_domctl+0xe4/0x130
>> >
>> > (XEN)    [<ffff82c48011cb89>] cpumask_raise_softirq+0x89/0xa0
>> >
>> > (XEN)    [<ffff82c480148503>] arch_do_domctl+0x14f3/0x22d0
>> >
>> > (XEN)    [<ffff82c48018aedf>] handle_hpet_broadcast+0x16f/0x1d0
>> >
>> > (XEN)    [<ffff82c48018a452>] hpet_legacy_irq_tick+0x42/0x50
>> >
>> > (XEN)    [<ffff82c48016f6bb>] timer_interrupt+0xb/0x130
>> >
>> > (XEN)    [<ffff82c4801538ea>] ack_edge_ioapic_irq+0x2a/0x70
>> >
>> > (XEN)    [<ffff82c4801042b3>] do_domctl+0x163/0xfe0
>> >
>> > (XEN)    [<ffff82c48010d2a5>] do_grant_table_op+0x75/0x1ad0
>> >
>> > (XEN)    [<ffff82c4801e7169>] syscall_enter+0xa9/0xae
>> >
>> > (XEN)
>> >
>> > (XEN)
>> >
>> > (XEN) ****************************************
>> >
>> > (XEN) Panic on CPU 0:
>> >
>> > (XEN) Xen BUG at mem_sharing.c:798
>> >
>> > (XEN) ****************************************
>> >
>> > (XEN)
>> >
>> > (XEN) Manual reset required ('noreboot' specified)
>> >
>> >
>> >
>> > _______________________________________________
>> > Xen-devel mailing list
>> > Xen-devel@xxxxxxxxxxxxxxxxxxx
>> > http://lists.xensource.com/xen-devel
>> >
>> >
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

