
Re: [Xen-devel] Xen 4.7 crash



On 6/7/2016 9:40 AM, Aaron Cornelius wrote:
On 6/7/2016 5:53 AM, Ian Jackson wrote:
Aaron Cornelius writes ("Re: [Xen-devel] Xen 4.7 crash"):
We realized that we had forgotten to remove the domain from the
permissions list when the domain is deleted (which would cause the error
we saw).  The application was updated to remove the domain from the
permissions list:
1. retrieve the permissions with xs_get_permissions()
2. find the domain ID that is being deleted
3. memmove() the remaining domains down by 1 to "delete" the old domain
from the permissions list
4. update the permissions with xs_set_permissions()
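
The four steps above can be sketched roughly as follows. This is only a minimal illustration, not the application's actual code: `struct perm_entry` is a local stand-in for libxenstore's `struct xs_permissions` so that the fragment compiles without the Xen headers, and `remove_domain_perm()` is a hypothetical helper name. It also assumes that entry 0 of the list is the node owner and should be left alone.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for libxenstore's struct xs_permissions (assumption:
 * only the domain ID and the permission bits matter for this sketch). */
struct perm_entry {
    unsigned int id;     /* domain ID the entry applies to */
    unsigned int perms;  /* permission bits */
};

/* Hypothetical helper: drop every entry for `domid` from perms[0..num-1]
 * by shifting the later entries down with memmove(). Entry 0 is assumed
 * to be the owner entry and is never removed. Returns the new count. */
size_t remove_domain_perm(struct perm_entry *perms, size_t num,
                          unsigned int domid)
{
    size_t i = 1;

    while (i < num) {
        if (perms[i].id == domid) {
            /* Move the tail of the array down by one slot. */
            memmove(&perms[i], &perms[i + 1],
                    (num - i - 1) * sizeof(perms[0]));
            num--;       /* re-check slot i, it now holds the next entry */
        } else {
            i++;
        }
    }
    return num;
}
```

In a real program the array would come from xs_get_permissions() and the shortened array would be written back with xs_set_permissions(), with the returned count passed as the number of entries.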

After we made that change, a load test over the weekend confirmed that
the Xen crash no longer happens.  We checked this morning first thing
and confirmed that without this change the crash reliably occurs.

This is rather odd behaviour.  I don't think xenstored should hang
onto the domain's xs ring page just because the domain is still
mentioned in a permission list.

But it may do.  I haven't checked the code.  Are you using the
ocaml xenstored (oxenstored) or the C one?

I don't remember specifying anything special when building the Xen
tools, but I did run into trouble where the ocaml tools appeared to
conflict with the opam-installed mirage packages and libraries. Running
"sudo make dist-install" installs the ocaml libraries as root,
which made using opam difficult.  So I disabled the ocaml tools
during my build.

I double checked and confirmed that the C version of xenstored was
built.  We will try to test the failure scenario with oxenstored to see
if it behaves any differently.

I am not that familiar with the xenstored code, but as far as I can tell xenstored holds the grant mapping until one of two things happens: either xs_release_domain() is called (libxl does not call it, and neither does my software, although I might start calling it just to be safe), or the last reference to the domain is released and the destructor registered with talloc_set_destructor() (destroy_domain) runs.
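
To illustrate the lifetime rule I mean, here is a small self-contained sketch of "resource is held until the last reference is dropped, then a destructor runs". It does not use talloc and is not xenstored code: `struct domain_ref`, `unmap_ring()` and the `domain_ref_*()` functions are hypothetical names, and the `ring_mapped` flag merely stands in for the grant-mapped ring page. Whether a stale permission entry really counts as such a reference in xenstored is not something I have confirmed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-domain record; not the real xenstored structure. */
struct domain_ref {
    unsigned int refs;                        /* outstanding references */
    bool ring_mapped;                         /* stands in for the grant mapping */
    void (*destructor)(struct domain_ref *);  /* runs when refs reaches 0 */
};

/* Stand-in for the cleanup that would unmap the ring page. */
static void unmap_ring(struct domain_ref *d)
{
    d->ring_mapped = false;
}

/* Create the record with one reference and the mapping in place. */
void domain_ref_init(struct domain_ref *d)
{
    d->refs = 1;
    d->ring_mapped = true;
    d->destructor = unmap_ring;
}

/* Take an additional reference. */
void domain_ref_get(struct domain_ref *d)
{
    d->refs++;
}

/* Drop one reference. Returns true if this was the last one and the
 * destructor ran; until then the mapping stays in place. */
bool domain_ref_put(struct domain_ref *d)
{
    if (d->refs == 0)
        return false;            /* already released, nothing to do */
    if (--d->refs > 0)
        return false;            /* someone else still holds a reference */
    if (d->destructor != NULL)
        d->destructor(d);
    return true;
}
```

With this model, a domain that is destroyed while something else still holds a reference keeps its mapping until that last reference goes away.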

I tried to follow the oxenstored code, but I certainly don't consider myself an expert at OCaml. The oxenstored code does not appear to allocate grant mappings at all, which makes me think I am probably misunderstanding the code :)

- Aaron

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

