
[Xen-users] Xen intermittently fails to release HVM domU VBDs, preventing Heartbeat node fail-over


  • To: <xen-users@xxxxxxxxxxxxxxxxxxx>
  • From: "Steigerwald Erich" <steigerwald@xxxxxxxxxxxxxxxxxx>
  • Date: Mon, 30 Jun 2008 15:05:12 +0200
  • Delivery-date: Mon, 30 Jun 2008 06:05:50 -0700
  • List-id: Xen user discussion <xen-users.lists.xensource.com>
  • Thread-index: Acjase6LhLanwcy9SnSwTXRlyucZ5w==
  • Thread-topic: Xen intermittently fails to release HVM domU VBDs, preventing Heartbeat node fail-over

Hi,

Intermittently, upon domU shutdown, Xen appears to fail to release the
domU's VBD handles. Consequently, the backing LVs in the VG remain
open, and the VG cannot be deactivated. This effectively prevents
manual fail-over.
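
For reference, here is roughly how the stuck state can be verified
once "xm shutdown" reports the domU gone. A minimal sketch; the VG
name "xendomains01" is an example inferred from our Heartbeat
resource name, not necessarily your setup:

  # lv_attr shows "o" in its 6th field while an LV is still open:
  lvs -o lv_name,lv_attr xendomains01

  # the device-mapper open count tells the same story:
  dmsetup info -c -o name,open

  # with a handle still open, deactivation fails with "LV ... in use":
  vgchange -a n xendomains01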

We suspect a bug in Xen or its qemu-dm device model.

Regards,
Erich

System environment information:
- SLES 10 SP2 (x86_64)
- Kernel 2.6.16.60-0.21-xen
- Xen 3.2.0_16718_14-0.4
- LVM 2.02.17-7.19
- Heartbeat 2.1.3-0.9

Configuration details:
- Xen HVM domU
- Xen VBD backed by LVM LV in dom0
- Xen resources and LVM VG managed by Heartbeat
- On node fail-over, Heartbeat stops the domU, deactivates the LVM VG,
activates the VG on the peer, and starts the domU on the peer (roughly
the sequence sketched below)
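
In plain commands, the fail-over sequence that Heartbeat drives
corresponds roughly to the following sketch (the domain name is taken
from the log below; the VG name and config path are placeholders, and
this is not our actual resource agent code):

  # on the failing node
  xm shutdown -w hostemplate          # wait until the domU powers off
  vgchange -a n xendomains01          # deactivate the shared VG

  # on the peer node
  vgchange -a y xendomains01          # activate the VG
  xm create /etc/xen/vm/hostemplate   # start the domU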

Relevant log excerpts captured while the issue occurs:

Xend log (note the error during domain_destroy()):
[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:1965) XendDomainInfo.destroyDomain(21)
[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:1965) XendDomainInfo.destroyDomain(24)
[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:1588) Removing vif/0
[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:590) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:1588) Removing vbd/51712
[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:590) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51712
[2008-06-20 13:55:33 16615] ERROR (XendDomainInfo:1977) XendDomainInfo.destroy: xc.domain_destroy failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1972, in destroyDomain
    xc.domain_destroy(self.domid)
Error: (3, 'No such process')
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1588) Removing vbd/51728
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:590) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51728
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1575) Destroying device model
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1588) Removing vkbd/0
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:590) XendDomainInfo.destroyDevice: deviceClass = vkbd, device = vkbd/0
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1588) Removing vfb/0
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:590) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2008-06-20 13:55:33 16615] INFO (XendDomainInfo:1295) Domain has shutdown: name=hostemplate id=23 reason=poweroff.
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1582) Releasing devices
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1588) Removing console/0

Xend debug log:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/Hald.py", line 55, in shutdown
    os.kill(self.pid, signal.SIGINT)
OSError: [Errno 3] No such process

Heartbeat debug log (note that the domain terminates successfully from
Heartbeat's point of view, but the subsequent VG deactivation fails):
Jun 20 13:55:34 xenha01 lrmd: [13229]: info: RA output: (res_xen_xen-ad:stop:stdout) Domain xen-ad terminated All domains terminated
- and subsequently:
Jun 20 13:55:38 xenha01 tengine: [10035]: info: send_rsc_command: Initiating action 17: res_lvm_xendomains01_stop_0 on xenha01
Jun 20 13:55:38 xenha01 crmd: [13232]: info: do_lrm_rsc_op: Performing op=res_lvm_xendomains01_stop_0 key=17:13:78f873ed-af08-4add-bb36-3798cd1a4a22)
Jun 20 13:55:38 xenha01 lrmd: [13229]: info: rsc:res_lvm_xendomains01: stop
Jun 20 13:55:38 xenha01 crmd: [13232]: info: process_lrm_event: LRM operation res_lvm_xendomains01_stop_0 (call=190, rc=1) complete
Jun 20 13:55:38 xenha01 tengine: [10035]: WARN: update_failcount: Updating failcount for res_lvm_xendomains01 on 519aa0b0-a947-47e9-ace9-d52030ef98a9 after failed stop: rc=1
Jun 20 13:55:38 xenha01 tengine: [10035]: info: match_graph_event: Action res_lvm_xendomains01_stop_0 (17) confirmed on xenha01 (rc=4)
Jun 20 13:55:38 xenha01 pengine: [10036]: ERROR: unpack_rsc_op: Remapping res_lvm_xendomains01_stop_0 (rc=1) on xenha01 to an ERROR
Jun 20 13:55:38 xenha01 pengine: [10036]: WARN: unpack_rsc_op: Processing failed op res_lvm_xendomains01_stop_0 on xenha01: Error
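
After such a failed stop, the holder of the LV can be hunted with
commands along these lines (the mapper path below is an example, built
from the placeholder VG and domain names used above):

  # which process still has the mapped device open?
  fuser -v /dev/mapper/xendomains01-hostemplate

  # any leftover qemu-dm device model for the destroyed domain?
  ps ax | grep qemu-dm

  # stale VBD backend entries left in xenstore?
  xenstore-ls /local/domain/0/backend/vbd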

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users