
Re: [Xen-devel] Xen 4.7 crash



On 6/14/2016 9:26 AM, Aaron Cornelius wrote:
On 6/14/2016 9:15 AM, Wei Liu wrote:
On Tue, Jun 14, 2016 at 09:11:47AM -0400, Aaron Cornelius wrote:
On 6/9/2016 7:14 AM, Ian Jackson wrote:
Aaron Cornelius writes ("Re: [Xen-devel] Xen 4.7 crash"):
I am not that familiar with the xenstored code, but as far as I can tell
the grant mapping will be held by xenstored until either the xs_release()
function is called (libxl does not call it, and I do not explicitly call
it in my software, although I might now just to be safe), or the last
reference to a domain is released and the destructor registered with
talloc_set_destructor() (destroy_domain) is called.

I'm not sure I follow.  Or maybe I disagree.  ISTM that:

The grant mapping is released by destroy_domain, which is called via
the talloc destructor as a result of talloc_free(domain->conn) in
domain_cleanup.  I don't see other references to domain->conn.

domain_cleanup calls talloc_free on domain->conn when it sees that the
domain has been marked as dying.

So I still think that your acl reference ought not to keep the grant
mapping alive.

It took a while to complete the testing, but we've finished trying to
reproduce the error using oxenstored instead of the C xenstored.  Under
the same conditions that caused the crash with the C xenstored (on
4.7.0-rc4/8478c9409a2c6726208e8dbc9f3e455b76725a33), the crash does not
occur with oxenstored.

So for whatever reason, it would appear that the C xenstored does keep the
grant allocations open, but oxenstored does not.


Can you provide some easy to follow steps to reproduce this issue?

AFAICT your environment is very specialised, but we should be able to
trigger the issue with plain xenstore-* utilities?

I am not sure if the plain xenstore-* utilities will work, but here are
the steps to follow:

1. Create a non-standard xenstore path: /tool/test
2. Create a domU (mini-os/mirage/something small)
3. Add the new domU to the /tool/test permissions list (I'm not 100%
sure how to do this with the xenstore-* utilities)
    a. call xs_get_permissions()
    b. realloc() the permissions block to add the new domain
    c. call xs_set_permissions()
4. Delete the domU from step 2
5. Repeat steps 2-4

Eventually the xs_set_permissions() function will return an E2BIG error
because the list of domains has grown too large.  Some time after that,
the crash occurs with the C xenstored and the 4.7.0-rc4 version of Xen.
It usually takes around 1200 iterations for the crash to occur.

After writing up those steps, I realized that my test itself probably has a bug that caused the crash in the first place. Once xs_set_permissions() started returning errors, I was not properly cleaning up the created domains. So I think this was simply a case of VMID exhaustion caused by having more than 255 domUs in existence at the same time.

In that case, this is completely unrelated to xenstore holding on to grant allocations, and the C xenstored most likely behaves correctly.

- Aaron Cornelius


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
