[Xen-devel] Xen zombie when starting on a "Secondary" DRBD device

Hi all,

I am implementing an HA infrastructure composed of 2 physical servers and
8 virtual servers with Xen.

Xen 3.0.2
DRBD 0.7 (Debian)

Data redundancy and replication are handled by DRBD.

I used this howto:

I am stacking several technologies:
Xen over DRBD over LVM over RAID1 (software)

High availability is handled by Heartbeat.
I found a very bad race condition that, when it occurs, requires a hard
reboot of the machine (a standard reboot doesn't work and hangs).

Basically, if a Xen domain is started while its DRBD device is in the
Secondary state, it becomes a zombie that cannot be removed.
It is not even possible to reboot the server, because "xendomains stop"
hangs the reboot process.
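Until the root cause is found, one way to stay clear of the race is to refuse to start a domain unless its backing DRBD device is Primary on the local node. Below is a minimal Python sketch of such a check. It assumes the DRBD 0.7 /proc/drbd format, where each device line carries an "st:Local/Peer" field (DRBD 8.x calls it "ro:"). The function names are hypothetical, and the check only avoids the trigger. It does not fix the hang itself.

```python
import re
from typing import Optional

# Matches a DRBD device line in /proc/drbd, e.g.
#   " 0: cs:Connected st:Secondary/Primary ld:Consistent"
# Assumption: DRBD 0.7 reports roles as "st:Local/Peer"; DRBD 8.x uses "ro:".
_DEVICE_LINE = re.compile(r"^\s*(\d+):.*?\b(?:st|ro):(\w+)/(\w+)")


def drbd_local_role(proc_text: str, minor: int) -> Optional[str]:
    """Return the local role ("Primary", "Secondary", ...) of /dev/drbd<minor>,
    or None if the device is not listed with a role."""
    for line in proc_text.splitlines():
        m = _DEVICE_LINE.match(line)
        if m and int(m.group(1)) == minor:
            return m.group(2)
    return None


def safe_to_start(proc_text: str, minor: int) -> bool:
    """Hypothetical pre-start guard: allow the domain to boot only when the
    local side of its DRBD device is Primary."""
    return drbd_local_role(proc_text, minor) == "Primary"


def read_proc_drbd(path: str = "/proc/drbd") -> str:
    """Read the DRBD status file (only meaningful on a host running DRBD)."""
    with open(path) as f:
        return f.read()
```

A domain start script (or a Heartbeat resource script) could pass read_proc_drbd() and the device minor to safe_to_start(), and skip "xm create" when it returns False.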

Running "xm list" shows this:
Zombie-admin-server0           15       32     2 ----cd     0.4
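Monitoring scripts can detect such a domain automatically. The sketch below parses one data line of "xm list". It assumes the usual column order (Name, ID, Mem(MiB), VCPUs, State, Time(s)) and the six positional state flags r, b, p, s, c, d (running, blocked, paused, shutdown, crashed, dying). Under that reading, the "----cd" in the line above means the domain is both crashed and dying. Treating a dying domain as a zombie is an assumption of this sketch, and the helper names are hypothetical.

```python
from typing import NamedTuple, Set

# Positional state flags shown by "xm list":
# running, blocked, paused, shutdown, crashed, dying.
STATE_FLAGS = "rbpscd"


class Domain(NamedTuple):
    name: str
    domid: int
    mem_mib: int
    vcpus: int
    state: str
    cpu_time: float


def parse_xm_list_line(line: str) -> Domain:
    """Parse one data line of "xm list" (not the header).
    Assumes the column order Name, ID, Mem(MiB), VCPUs, State, Time(s)."""
    name, domid, mem, vcpus, state, cpu = line.strip().rsplit(None, 5)
    return Domain(name, int(domid), int(mem), int(vcpus), state, float(cpu))


def state_flags(state: str) -> Set[str]:
    """Decode a state string such as "----cd" into the set of flags present."""
    return {flag for flag, ch in zip(STATE_FLAGS, state) if ch == flag}


def is_zombie(dom: Domain) -> bool:
    # Assumption for this sketch: a domain renamed "Zombie-..." or stuck in
    # the dying state is treated as a zombie.
    return dom.name.startswith("Zombie-") or "d" in state_flags(dom.state)
```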

Zombie Xen domains are a serious problem in this kind of infrastructure.

Is this a problem in Xen, in DRBD, or in both?

