Re: [Xen-devel] RE: mem_sharing: summarized problems when domain is dying


  • To: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
  • From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • Date: Mon, 24 Jan 2011 14:08:01 +0000
  • Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, tim.deegan@xxxxxxxxxx, juihaochiang@xxxxxxxxx
  • Delivery-date: Mon, 24 Jan 2011 06:08:41 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

I think it would be best if every separate issue you're facing is a
separate thread.  This looks like a Linux crash -- please include the
kernel version you're using, and whatever other information might be
appropriate.

 -George

2011/1/24 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> Hi:
>
>        I found another bug while testing memory sharing.
>        In this test I start 24 Linux HVM guests, each of which is rebooted
> through "xm reboot" every 30 minutes.
>        After several hours, some of the HVM guests crash. All of the crashed
> guests stopped during boot.
>        The bug still exists even when I disable page sharing by making
> xc_memshr_nominate_gref() report failure to tapdisk.
>
>        No special log messages were found.
>
>        I was able to dump the crash stack.
>        What could be happening?
>        Thanks.
>
> PID: 2307   TASK: ffff810014166100  CPU: 0   COMMAND: "setfont"
>  #0 [ffff8100123cd900] xen_panic_event at ffffffff88001d28
>  #1 [ffff8100123cd920] notifier_call_chain at ffffffff80066eaa
>  #2 [ffff8100123cd940] panic at ffffffff8009094a
>  #3 [ffff8100123cda30] oops_end at ffffffff80064fca
>  #4 [ffff8100123cda40] do_page_fault at ffffffff80066dc0
>  #5 [ffff8100123cdb30] error_exit at ffffffff8005dde9
>     [exception RIP: vgacon_do_font_op+363]
>     RIP: ffffffff800515e5  RSP: ffff8100123cdbe8  RFLAGS: 00010203
>     RAX: 0000000000000000  RBX: ffffffff804b3740  RCX: ffff8100000a03fc
>     RDX: 00000000000003fd  RSI: ffff810011cec000  RDI: ffffffff803244c4
>     RBP: ffff810011cec000   R8: d0d6999996000000   R9: 0000009090b0b0ff
>     R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000004
>     R13: 0000000000000001  R14: 0000000000000001  R15: 000000000000000e
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #6 [ffff8100123cdc20] vgacon_font_set at ffffffff8016bec5
>  #7 [ffff8100123cdc60] con_font_op at ffffffff801aa86b
>  #8 [ffff8100123cdcd0] vt_ioctl at ffffffff801a5af4
>  #9 [ffff8100123cdd70] tty_ioctl at ffffffff80038a2c
> #10 [ffff8100123cdeb0] do_ioctl at ffffffff800420d9
> #11 [ffff8100123cded0] vfs_ioctl at ffffffff800302ce
> #12 [ffff8100123cdf40] sys_ioctl at ffffffff8004c766
> #13 [ffff8100123cdf80] tracesys at ffffffff8005d28d (via system_call)
>     RIP: 00000039294cc557  RSP: 00007fff54c4aec8  RFLAGS: 00000246
>     RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
>     RDX: 00007fff54c4aee0  RSI: 0000000000004b72  RDI: 0000000000000003
>     RBP: 000000001d747ab0   R8: 0000000000000010   R9: 0000000000800000
>     R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000010
>     R13: 0000000000000200  R14: 0000000000000008  R15: 0000000000000008
>     ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
>
>> Date: Fri, 21 Jan 2011 14:45:14 -0500
>> Subject: Re: mem_sharing: summarized problems when domain is dying
>> From: juihaochiang@xxxxxxxxx
>> To: Tim.Deegan@xxxxxxxxxx
>> CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>>
>> Hi
>>
>> On Fri, Jan 21, 2011 at 11:19 AM, Jui-Hao Chiang <juihaochiang@xxxxxxxxx>
>> wrote:
>> > Hi, Tim:
>> >
>> > From tinnycloud's results, here is a summary of the current problems
>> > and findings in mem_sharing when a domain is dying.
>> > (1) When a domain is dying, alloc_domheap_page() and
>> > set_shared_p2m_entry() simply fail. So the shr_lock is not enough
>> > to ensure that the domain won't die in the middle of the mem_sharing
>> > code. As tinnycloud's code shows, would it be better to call
>> > rcu_lock_domain_by_id before calling the above two functions?
>> >
>>
>> There seems to be no good locking to protect a domain from changing
>> its is_dying state. So the unshare function could fail partway through
>> at several points, e.g., in alloc_domheap_page and set_shared_p2m_entry.
>> If that's the case, we need to add some checks, and probably revert
>> what we have already done when is_dying changes partway through.
>>
>> Any comments?
>>
>> Jui-Hao
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>
>
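
The check-and-revert idea in Jui-Hao's quoted message above can be
sketched in C roughly as follows. This is only an illustration under
stated assumptions: struct mock_domain and every mock_* function are
hypothetical stand-ins invented for this example. They are not the Xen
structures, nor the real alloc_domheap_page() or set_shared_p2m_entry(),
and no real locking is modelled. The "between_steps" hook exists only to
simulate the domain starting to die in the middle of the operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a domain; NOT the Xen struct domain. */
struct mock_domain {
    bool is_dying;        /* may flip to true at any point (teardown) */
    int  shared_refs;     /* pages still backed by a shared frame */
    int  private_pages;   /* pages mapped privately after unsharing */
    int  pages_allocated; /* pages handed out by the mock allocator */
};

/* Mock allocator: fails once the domain is dying. */
bool mock_alloc_page(struct mock_domain *d)
{
    if (d->is_dying)
        return false;
    d->pages_allocated++;
    return true;
}

/* Gives a page back to the mock allocator. */
void mock_free_page(struct mock_domain *d)
{
    d->pages_allocated--;
}

/* Mock mapping update: also fails once the domain is dying. */
int mock_set_entry(struct mock_domain *d)
{
    if (d->is_dying)
        return -1;
    d->private_pages++;
    return 0;
}

/* Test hook: simulates the domain being killed mid-operation. */
void mock_mark_dying(struct mock_domain *d)
{
    d->is_dying = true;
}

/*
 * Unshare one page. Returns 0 on success. On failure returns -1 and
 * leaves all counters exactly as they were before the call.
 */
int mock_unshare_page(struct mock_domain *d,
                      void (*between_steps)(struct mock_domain *))
{
    assert(d != NULL);

    if (d->is_dying || d->shared_refs == 0)
        return -1;                /* nothing done, nothing to undo */

    if (!mock_alloc_page(d))
        return -1;                /* allocation failed, nothing to undo */

    d->shared_refs--;             /* drop our reference to the shared frame */

    if (between_steps != NULL)
        between_steps(d);         /* the domain may start dying here */

    if (mock_set_entry(d) != 0) {
        /* Revert the earlier steps in reverse order. */
        d->shared_refs++;
        mock_free_page(d);
        return -1;
    }
    return 0;
}
```

Calling mock_unshare_page(&d, mock_mark_dying) exercises the failure
path: the call returns -1 and the counters are unchanged. Whether the
real unshare path can be reverted this simply depends on state that this
sketch does not model.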
