
[Xen-devel] Xen Crashes when releasing gnttab mappings - of a crashed domain.


  • To: "xen-devel" <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Moffie, Micha" <micha.moffie@xxxxxx>
  • Date: Tue, 21 Nov 2006 14:09:35 -0000
  • Delivery-date: Tue, 21 Nov 2006 06:09:56 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AccNc31su++dBHlmEduJYgAX8io7RQAAhe0A
  • Thread-topic: Xen Crashes when releasing gnttab mappings - of a crashed domain.

Observation:
------------
When two miniOs domains are connected (using a shared ring), Xen itself
(not a domain) crashes when the miniOs domains exit.

Xen crashes and produces the following: 
(XEN) Xen call trace:
(XEN)    [<ff11d20d>] __bug+0x29/0x45
(XEN)    [<ff107cb3>] gnttab_release_mappings+0xcb/0x2e5
(XEN)    [<ff1046dd>] domain_kill+0x29/0x62
(XEN)    [<ff10349a>] do_domctl+0x6d6/0xfbc
(XEN)    [<ff165755>] hypercall+0x95/0xb5
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) BUG at grant_table.c:1122
(XEN) ****************************************


The cause:
----------
Xen tries to release the grant table mappings by accessing the remote
domain's grant table.
But the remote domain no longer seems to exist, and consequently Xen
fails:
find_domain_by_id (in gnttab_release_mappings) returns NULL.
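
The failure can be modelled outside Xen with a toy sketch. Everything
below (toy_domain, toy_find_domain_by_id, toy_release_mapping) is a
hypothetical stand-in, not Xen's code; it only shows how a lookup that
returns NULL for a vanished remote domain leaves the release path with
nothing to work on.

```c
#include <stddef.h>

/* Toy stand-in for a domain; NOT Xen's struct domain. */
struct toy_domain {
    int id;
    int present;    /* 0 once the domain has been torn down */
    int pin_count;  /* mappings that other domains hold on its grants */
};

static struct toy_domain toy_domains[] = {
    { 1, 1, 1 },    /* still present, one mapping outstanding */
    { 2, 0, 1 },    /* already gone, mapping never released */
};

/* Models find_domain_by_id(): NULL when the domain no longer exists. */
static struct toy_domain *toy_find_domain_by_id(int id)
{
    size_t i;

    for (i = 0; i < sizeof(toy_domains) / sizeof(toy_domains[0]); i++)
        if (toy_domains[i].id == id && toy_domains[i].present)
            return &toy_domains[i];
    return NULL;
}

/* Returns 0 on success, -1 where the reported trace ends in BUG(). */
static int toy_release_mapping(int remote_id)
{
    struct toy_domain *rd = toy_find_domain_by_id(remote_id);

    if (rd == NULL)
        return -1;  /* remote domain vanished while still mapped */
    if (rd->pin_count > 0)
        rd->pin_count--;
    return 0;
}
```

Returning an error here is only one possible policy for the toy model;
it is not a claim about how grant_table.c should be changed.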


Analysis:
---------
The situation described above should never happen: if I understand
correctly, a domain should not be completely destroyed until there are
no more references to it.
See put_domain(d) in sched.h, which is defined as follows:
if ( atomic_dec_and_test(&(_d)->refcnt) ) domain_destroy(_d)

It does, however, happen when a domain crashes.
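
As an illustration of the reference-counting rule, here is a minimal
sketch using C11 atomics. The names ref_domain, ref_get_domain,
ref_put_domain and ref_domain_destroy are hypothetical, and
atomic_fetch_sub is swapped in for Xen's atomic_dec_and_test; the point
is only that destruction runs when the last reference is dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a refcounted domain; NOT Xen's struct domain. */
struct ref_domain {
    atomic_int refcnt;
    bool destroyed;
};

static void ref_domain_destroy(struct ref_domain *d)
{
    d->destroyed = true;
}

/* Take a reference; fails once the count has already reached zero. */
static bool ref_get_domain(struct ref_domain *d)
{
    int old = atomic_load(&d->refcnt);

    while (old != 0)
        if (atomic_compare_exchange_weak(&d->refcnt, &old, old + 1))
            return true;    /* on failure, old is reloaded and rechecked */
    return false;
}

/* Drop a reference; the caller that drops the last one destroys. */
static void ref_put_domain(struct ref_domain *d)
{
    /* atomic_fetch_sub returns the previous value, so 1 means "last". */
    if (atomic_fetch_sub(&d->refcnt, 1) == 1)
        ref_domain_destroy(d);
}
```

Under this scheme, a holder of a mapping into a remote domain would keep
a reference for as long as the mapping exists, so a later lookup could
not come back empty.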

Note that there are two ways to "finish" with a domain (domain.c):
1.      domain_kill (which calls domain_destroy) - releases all
        resources in a graceful manner.
2.      __domain_crash (which calls domain_shutdown) - seems to kill
        the domain without properly releasing the resources that
        reference it (this function is called in extreme cases).


Our scenario:
-------------

We are running two miniOs with the same profile:
Open a ring (share a page with a grant ref and map a page from a remote
domain)
Write
Read
Close the ring (dealloc, unmap*)
do_exit()



Timeline ->
MiniOs 1:  calls do_exit() ->
               .. domain_kill() ->
                   .. gnttab_release_mappings() ->
                       .. BUG()

MiniOs 2:  crashes**

                 
*When we unmap, we use Xen's hypercall to unmap a grant reference,
passing it a gnttab_unmap_grant_ref structure.
Note that we have a bug: we do NOT set unmap_op.dev_bus_addr to 0 as we
should.
Xen's API (in public/grant_table.h) explicitly states that it must be 0;
otherwise the grant reference is treated as a valid device mapping.
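
A minimal sketch of preparing the unmap request so that dev_bus_addr is
always zeroed. The struct is a local mirror written for illustration,
and its field layout is an assumption; real code should include
public/grant_table.h instead. sketch_prepare_unmap is a hypothetical
helper, and the hypercall itself is left out.

```c
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of the unmap request, for illustration only.
 * The layout is assumed; use Xen's public/grant_table.h in real code.
 */
struct sketch_unmap_grant_ref {
    uint64_t host_addr;     /* IN: address the grant was mapped at */
    uint64_t dev_bus_addr;  /* IN: 0 when there is no device mapping */
    uint32_t handle;        /* IN: handle returned by the map operation */
    int16_t  status;        /* OUT: filled in by Xen */
};

/* Fill in an unmap request without leaving stale stack data behind. */
static void sketch_prepare_unmap(struct sketch_unmap_grant_ref *op,
                                 uint64_t host_addr, uint32_t handle)
{
    memset(op, 0, sizeof(*op));  /* clears every field, padding included */
    op->host_addr = host_addr;
    op->handle = handle;
    op->dev_bus_addr = 0;        /* explicit: we hold no device mapping */
}
```

Zeroing the whole structure first means a field that is forgotten later
defaults to 0 rather than to whatever was on the stack.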

** Because of the bug described in *, we cause the domain to crash.
We observe:
(XEN) grant_table.c:394: Bad frame number doesn't match gntref
(XEN) mm.c:760: Attempt to implicitly unmap a granted PTE 
(XEN) domain_crash called from mm.c:761 



Summary:
-----------

1. Setting unmap_op.dev_bus_addr to 0 removes the BUG and all is well.
2. But crashing Xen, even in response to our error, doesn't seem to be
a healthy choice.



:) 
Micha.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

