
Re: [Xen-devel] Xen crashing when killing a domain with no VCPUs allocated



On 07/21/2014 11:42 AM, Andrew Cooper wrote:
On 21/07/14 11:33, George Dunlap wrote:
On 07/18/2014 09:26 PM, Julien Grall wrote:
On 18/07/14 17:39, Ian Campbell wrote:
On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
Hi all,

I've been playing with the function alloc_vcpu on ARM, and I hit one case
where this function can fail.

During domain creation, the toolstack will call DOMCTL_max_vcpus which may
fail, for instance because alloc_vcpu didn't succeed. In this case, the
toolstack will call DOMCTL_domaindestroy. And I got the below stack trace.

It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by returning
an error in vcpu_initialize.

I'm not sure how to correctly fix it.

I think a simple check at the head of the function would be ok.

Alternatively perhaps in sched_move_domain, which could either detect
this or could detect a domain in pool0 being moved to pool0 and short
circuit.

I was thinking about the small fix below. If it's fine for everyone, I can
send a patch next week.

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e9eb0bc..c44d047 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     }
     /* Do we have vcpus already? If not, no need to update node-affinity */
-    if ( d->vcpu )
+    if ( d->vcpu && d->vcpu[0] != NULL )
         domain_update_node_affinity(d);
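
For readers following the patch, the difference between the two conditions can be sketched in standalone C. This is only an illustration: the struct definitions are simplified stand-ins rather than Xen's real ones, and the helper names has_vcpu_array, has_first_vcpu and make_domain_without_vcpus are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for illustration; not Xen's real structures. */
struct vcpu { int vcpu_id; };
struct domain {
    struct vcpu **vcpu;       /* array of max_vcpus pointers, or NULL */
    unsigned int max_vcpus;
};

/* Old condition: true as soon as the pointer array exists. */
static int has_vcpu_array(const struct domain *d)
{
    return d->vcpu != NULL;
}

/* Proposed condition: the array exists and slot 0 is populated. */
static int has_first_vcpu(const struct domain *d)
{
    return d->vcpu != NULL && d->vcpu[0] != NULL;
}

/*
 * Model a domain whose pointer array was allocated (zero-filled) but
 * which never received a vcpu. The caller frees d.vcpu.
 */
static struct domain make_domain_without_vcpus(unsigned int max_vcpus)
{
    struct domain d = { calloc(max_vcpus, sizeof(struct vcpu *)), max_vcpus };
    return d;
}
```

For a domain built with make_domain_without_vcpus(), the old condition holds while the proposed one does not, which is the case the patch is meant to skip.
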

So is the problem that we're allocating the vcpu array area, but not
putting any vcpus in it?

The problem (as I recall) was that domain_create() got midway through
and alloc_vcpu(0) failed with -ENOMEM.  Following that failure, the
toolstack called domain_destroy().

Having d->vcpu properly allocated and containing fully NULL pointers is
a valid position to be in, especially in error or teardown paths.
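
The state described above (array allocated, every slot NULL) can be modelled in a few lines of standalone C. This is a sketch under assumptions, not Xen code: model_set_max_vcpus and model_destroy are hypothetical names, the inject_failure flag stands in for alloc_vcpu(0) returning -ENOMEM, and the structs are simplified.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins; field and function names are illustrative only. */
struct vcpu { unsigned int id; };
struct domain {
    struct vcpu **vcpu;
    unsigned int max_vcpus;
};

/* Hypothetical creation step: allocate the array, then populate it. */
static int model_set_max_vcpus(struct domain *d, unsigned int max,
                               int inject_failure)
{
    d->vcpu = calloc(max, sizeof(*d->vcpu));
    if ( d->vcpu == NULL )
        return -ENOMEM;
    d->max_vcpus = max;

    for ( unsigned int i = 0; i < max; i++ )
    {
        if ( inject_failure )
            return -ENOMEM;   /* array stays allocated, all slots NULL */
        d->vcpu[i] = malloc(sizeof(struct vcpu));
        if ( d->vcpu[i] == NULL )
            return -ENOMEM;
        d->vcpu[i]->id = i;
    }
    return 0;
}

/* Teardown that tolerates NULL slots; returns how many vcpus it freed. */
static unsigned int model_destroy(struct domain *d)
{
    unsigned int freed = 0;

    if ( d->vcpu == NULL )
        return 0;
    for ( unsigned int i = 0; i < d->max_vcpus; i++ )
    {
        if ( d->vcpu[i] != NULL )
        {
            free(d->vcpu[i]);
            freed++;
        }
    }
    free(d->vcpu);
    d->vcpu = NULL;
    d->max_vcpus = 0;
    return freed;
}
```

The point of the model is that the destroy path has to cope with a non-NULL d->vcpu whose entries are all NULL, so any code it reaches must not assume slot 0 is populated.
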

Overall it seems like those checks for the existence of cpus should be
moved into domain_update_node_affinity().  The ASSERT() there I think
is just a sanity check to make sure we're not getting a ridiculous
result out of our calculation; but of course if there actually are no
vcpus, it's not ridiculous at all.

One solution might be to change the ASSERT to
ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]).  Then
we could probably even remove the d->vcpu conditional when calling it.

If you were going along this line, the pointer checks are substantially
less expensive than cpumask_empty(), so the ||'s should be reordered.
However, I am not convinced that it is necessarily the best solution,
given my previous observation.
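
The reordering suggested above relies on C's short-circuit evaluation of ||: once a cheap pointer test is true, the mask is never scanned. A minimal standalone sketch follows, assuming simplified stand-in types; model_cpumask, model_cpumask_empty and affinity_mask_ok are hypothetical names, and the call counter exists only to make the evaluation order observable.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; Xen's real cpumask_t and struct domain differ. */
struct vcpu { int id; };
struct domain { struct vcpu **vcpu; };
struct model_cpumask { unsigned long bits; };

static unsigned int empty_calls;   /* counts how often the mask is scanned */

/* Stand-in for cpumask_empty(); the counter makes evaluation order visible. */
static int model_cpumask_empty(const struct model_cpumask *m)
{
    empty_calls++;
    return m->bits == 0;
}

/* Reordered condition: cheap pointer tests first, mask scan last. */
static int affinity_mask_ok(const struct domain *d,
                            const struct model_cpumask *m)
{
    return !d->vcpu || !d->vcpu[0] || !model_cpumask_empty(m);
}
```

With this ordering, a domain that has no vcpus passes the check without the mask being examined at all, while a domain with a vcpu in slot 0 still needs a non-empty mask.
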

Er, I was with you until the last part. What's wrong with changing the assert from "Make sure I have *something* in there" to "Make sure I have *something* in there *if I have any vcpus*"? That seems to be accepting that having d->vcpu allocated but full of null pointers is a valid condition.

 -George



 

