
Re: [Xen-users] Zombie VMs cannot be destroyed




On 1-Dec-06, at 1:25 PM, Tim Post wrote:

On Fri, 2006-12-01 at 12:57 -0500, Michael Froh wrote:
This sounds like a B rate horror movie.  Has anyone else seen Zombie
VMs?

Yes. I get them quite often due to the fact that I use mostly Tyan
boards with on board SODIMMs for disk caching. Guests like to hang on
shutdown due to that. Only seems to be on Tyan boards.

I am using a Dell PowerEdge 2900. Don't know the motherboard used by Dell.


I had a number of VMs running and used the following script to
destroy them:

for vm in `xm list | awk '{print $1}' | grep -v Name | grep -v Domain-0`; do xm destroy $vm; done


I hope those aren't ext3 file systems. 'shutdown' would be preferable.
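
A gentler variant of that loop can be sketched as follows. This is only an illustration: `guest_names` is a hypothetical helper name, and it assumes the guest name is the first column of the `xm list` output, as in the listing in this thread. It swaps `xm destroy` for `xm shutdown`, as suggested.

```shell
# Hypothetical helper: read "xm list" output on stdin and print guest names,
# skipping the header row and Domain-0 in a single awk pass.
guest_names() {
    awk 'NR > 1 && $1 != "Domain-0" { print $1 }'
}

# Typical use on a Xen host (not run here, since it needs the xm tool):
#   xm list | guest_names | while read -r vm; do xm shutdown "$vm"; done
```

Matching on the first field exactly avoids the two `grep -v` calls, which would also drop any guest whose name happens to contain "Name" or "Domain-0".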

Understood. Right now I'm just playing with Xen so there is no data to be lost.

The domains which were correctly destroyed were centos domainU with
ext3 fs mounted.  These are snapshots of a pristine ext3 fs so will just
recreate the snapshots in my xen sandbox.

As noted in the list below, the remaining were dsl & knoppix domainU which only had their respective .iso images mounted ro so inode flushing should
not be a problem here.

This destroyed all of the para-virtualized domains running (4 of
them) but turned all the HVM VMs into Zombies as shown here:


Destroyed is the word. You may want to fsck prior to booting them again,
it would be faster.

[root@vm0 ~]# xm list
Name                                      ID Mem(MiB) VCPUs State    Time(s)
Domain-0                                   0     5074     4 r-----    4912.8
Zombie-dsl0                               25      256     1 -b---d     552.1
Zombie-dsl1                               26      256     1 -b---d     552.2
Zombie-dsl2                               27      256     1 -b---d     550.0
Zombie-dsl3                               28      256     1 -b---d     554.5
Zombie-knoppix0                           17      256     1 -b---d    4459.9
Zombie-knoppix1                           18      256     1 -----d    4425.9
Zombie-knoppix2                           19      256     1 -b---d    4530.9
Zombie-knoppix3                           20      256     1 -b---d    4493.7


Subsequent attempts to destroy the VMs using "xm destroy 25" or "xm
destroy Zombie-dsl0" don't do anything.
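
To pick out the stuck domains from a listing like the one above, something along these lines may help. It is a rough sketch: `dying_domains` is a hypothetical name, and it assumes the state flags are the fifth column and that a `d` there marks a domain as dying, as in the output shown.

```shell
# Hypothetical helper: read "xm list" output on stdin and print the name and ID
# of every domain whose state flags (5th column) include "d" (dying).
dying_domains() {
    awk 'NR > 1 && $5 ~ /d/ { print $1, $2 }'
}

# Typical use on a Xen host (not run here, since it needs the xm tool):
#   xm list | dying_domains
```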


Zombie VMs are just like zombie processes: they're waiting for
something to happen before they exit. In this case they're waiting for
disks to sync on a VBD that's no longer connected. In effect, you pulled
out the hard drives before the VMs could write out what they had in the
inode cache, then yanked the power cord and plugged it back in
really quickly.

Bad idea.

It's curious that the VMs are shown as booting and being destroyed
(-b---d).


What's being destroyed are your file systems.

The para-virtualized VMs were named centos[0-3] so it might be a
timing issue where only 4 destroys were properly handled and the para-
virtualized VMs happened to be the first 4 domains in xm list.

I will play around a bit to see if I can recreate it consistently and if
there are options to really destroy the domains.  This is not an issue
for me since my VM environment is a lab, but in production this might
be very problematic.


Amen. Try "xm shutdown". If your script has to ensure a domU has exited,
try something like:

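domname="$1"    # name of the guest to stop
counter=0

# Keep asking the guest to shut down until it leaves "xm list" or we give up.
while xm list | awk '{print $1}' | grep -qx -- "$domname" && [ "$counter" -lt 20 ]; do
        xm shutdown "$domname"
        sleep 5
        counter=$((counter + 1))
done

# Gave up after 20 attempts: ask the guest to sync its disks, then destroy it.
if [ "$counter" -ge 20 ]; then
        xm pause "$domname"
        xm sysrq "$domname" S
        sleep 5
        xm destroy "$domname"
fi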

Depending on the I/O usage of the guests, you may want to toss in an
"xm sysrq 0 S" too.

Note, "xm shutdown [domname]" is almost always going to exit 0. The only reason it will not is if [domname] doesn't exist. It is a little tricky
to use in a script.
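
Because the exit status of the shutdown command says so little, a script may be better off checking the domain list itself. The sketch below is one way to do that, not a tested recipe: `domain_exists` and the `LIST_CMD` variable are hypothetical names, and `LIST_CMD` exists only so the check can be tried without a Xen host.

```shell
# Command used to list domains; defaults to "xm list".
# LIST_CMD is an assumption made for illustration and testing.
LIST_CMD=${LIST_CMD:-xm list}

# Hypothetical helper: succeed (exit 0) only if a domain named exactly $1
# appears in the first column of the listing, ignoring the header row.
domain_exists() {
    $LIST_CMD | awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Typical use on a Xen host (not run here, since it needs the xm tool):
#   if domain_exists centos0; then xm shutdown centos0; fi
```

An exact match on the name column means a lingering "Zombie-dsl0" entry is not mistaken for a live "dsl0".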

The above is completely off the top of my head and meant for
illustration only.

ext3 (or any other journaling file system) gets *very* grumpy if it
can't flush its inodes prior to shutting down. Save yourself a few
hassles :)

xm destroy = pull out the power cord.

You may try using "xendomains" instead.

Thanks for the draft script. I haven't gotten around to playing with xendomains
yet, but will.

I have since tried to recreate the problem but have been unable to after a system reboot. I have tried 24 running domains and all "destroy" properly,
so it seems it wasn't a timing issue.

Tim, when you do get zombie domains, how do you eventually purge them,
since they seem to be using memory but no CPU?

Mike.


Hope this helps
-Tim

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users





 

