
Re: [Xen-API] XCP 1.5 lv cleanup not happening


  • To: xen-api@xxxxxxxxxxxxx
  • From: George Shuklin <george.shuklin@xxxxxxxxx>
  • Date: Thu, 15 Nov 2012 02:41:19 +0400
  • Delivery-date: Wed, 14 Nov 2012 22:41:14 +0000
  • List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>

Yep, XCP 1.1 requires all hosts to be online before it will purge VDIs from an SR (LVM or NFS, it does not matter).

Strangely, XCP 0.5 had no such restriction.
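
If you want to check whether this is what is blocking cleanup, ask xapi which members it still thinks are alive; a rough sketch with the stock xe CLI (host-metrics-live is the field name as I remember it, worth confirming on your version):

        xe host-list params=uuid,name-label,host-metrics-live

Any member reported as not live will keep the purge from running until it comes back or is forgotten.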


On 15.11.2012 02:08, Ryan Farrington wrote:

A special thanks goes out to felipef for all the help today.


History:

        (4) host pool - one in a failed state due to hardware failure

        (1) 3.2T data lun - SR-UUID = aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a

The issue:

The 3.2T data LUN was presenting as 91% utilized but only 33% virtually allocated.


Work log:


Results were confirmed via the XC GUI and via the command line, as shown below:

        xe sr-list params=all uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
                physical-utilisation ( RO): 3170843492352
                physical-size ( RO): 3457918435328
                virtual size: 1316940152832
                type ( RO): lvmohba
                sm-config (MRO): allocation: thick; use_vhd: true
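
(Sanity check on those numbers: physical-utilisation divided by physical-size is 3170843492352 / 3457918435328, roughly 0.92, which lines up with the ~91% utilisation reported above, while the virtual figure is only about a third of the LUN.)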


Further digging found that summing all of the VDIs on the SR reproduced the virtual allocation number.

        Commands + results:

        xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=physical-utilisation --minimal | sed 's/,/ + /g' | bc -l
                physical utilization: 1,210,564,214,784

        xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=virtual-size --minimal | sed 's/,/ + /g' | bc -l
                virtual size: 1,316,940,152,832


At this point we started looking at the VG to see if there were LVs taking up space that were not known to xapi.

        Command + result:

        vgs

                VG                                                 #PV #LV #SN Attr   VSize  VFree
                VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a   1  33   0 wz--n-  3.14T 267.36G


(lvs --units B | grep aa15042e | while read vg lv flags size; do echo -n "$size +" | sed 's/B//g'; done; echo 0)| bc -l

                3170843492352


So at this point we have confirmed that there are in fact LVs not accounted for by xapi: the LV total matches the SR's physical-utilisation (3170843492352 bytes), while the VDIs xapi knows about only add up to roughly 1.21T. So we went looking for the extras:

lvs | grep aa15042e | grep VHD | cut -c7-42 | while read uuid; do [ "$(xe vdi-list uuid=$uuid --minimal)" == "" ] && echo $uuid ; done

        This returned a long list of UUIDs that did not have a matching entry in xapi.
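
Before touching any of them by hand it is worth checking that a suspect UUID is not referenced as a VHD parent by a VDI xapi does know about; a rough check, assuming the LVHD backend records the parent in the vhd-parent key of sm-config as it usually does (<suspect-uuid> is a placeholder for one of the UUIDs from the list above):

        xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=uuid,sm-config | grep <suspect-uuid>

No output means nothing xapi knows about points at that LV, which fits the orphan theory.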


Grabbing one of the UUIDs at random and searching back through xensource.log, we found something strange:

        [20121113T09:05:32.654Z|debug|xcp-nc-bc1b8|1563388 inet-RPC|SR.scan R:b7ff8ccc6566|dispatcher] Server_helpers.exec exception_handler: Got exception SR_BACKEND_FAILURE_181: [ ; Error in Metadata volume operation for SR. [opterr=VDI delete operation failed for parameters: /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT, c866d910-f52f-4b16-91be-f7c646c621a5. Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error];  ]


A little googling around eventually turned up a thread on the Citrix forums (http://forums.citrix.com/thread.jspa?threadID=299275) that pointed me at a process to rebuild the metadata (the MGT volume) for that specific SR without having to blow away the SR and start fresh.

        Commands:

lvrename /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/OLDMGT

xe sr-scan uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
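
If the rescan manages to rebuild the metadata, a new MGT volume should appear alongside the renamed copy; a quick check (nothing assumed here beyond the VG name already used above):

        lvs VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a | grep MGT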


This got rid of the SR_BACKEND errors, but the LVs continued to persist. Looking in SMlog, we started seeing lines that pointed at the pool not being ready and the cleanup exiting:

        <25168> 2012-11-14 12:27:24.195463      Pool is not ready, exiting
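
To watch the cleanup attempts live while kicking off another scan, something along these lines works (the grep pattern is just the keywords I would look for, not an official filter):

        tail -f /var/log/SMlog | grep -Ei 'coalesce|gc|pool is not ready' &
        xe sr-scan uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a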


At this point I manually forced the offline node out of the pool, and SMlog then reported success in the purge process.

        xe host-forget uuid=<down host>




_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api