Xen project Mailing List

A special thanks goes out to felipef for all the help today.

History:

(4) host pool – one in a failed state due to hardware failure

(1) 3.2T data lun – SR-UUID = aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a

The issue:

The 3.2T data LUN was showing as 91% physically utilized but only 33% virtually allocated.

Work log:

Results were confirmed via the XC GUI and via the command line, as shown below:

xe sr-list params=all uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a

physical-utilisation ( RO): 3170843492352

physical-size ( RO): 3457918435328

virtual size: 1316940152832

type ( RO): lvmohba

sm-config (MRO): allocation: thick; use_vhd: true

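Further digging found that summing all the VDIs on the SR matched the virtual allocation number, but their combined physical utilisation came nowhere near the 3170843492352 bytes reported for the SR.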

Commands + results:

xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=physical-utilisation --minimal | sed 's/,/ + /g' | bc -l

physical utilization: 1,210,564,214,784

xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=virtual-size --minimal | sed 's/,/ + /g' | bc -l

virtual size: 1,316,940,152,832
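For anyone unfamiliar with the trick used in the two pipelines above: `--minimal` prints the values as one comma-separated line, and the sed step turns that line into an addition expression. A minimal sketch of the same transform on made-up sample values, evaluated with shell arithmetic instead of bc so it has no external dependency (the `sample` values are placeholders, not data from this SR):

```shell
#!/bin/sh
# Stand-in for the single comma-separated line that `xe ... --minimal` prints.
# These three numbers are made up for illustration.
sample="1024,2048,4096"

# Same sed step as in the pipelines above: "1024,2048,4096" -> "1024 + 2048 + 4096".
expr_text=$(printf '%s\n' "$sample" | sed 's/,/ + /g')

# Evaluate the expression; the original commands pipe it into `bc -l` instead.
total=$(( $expr_text ))
echo "$expr_text = $total"
```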

At this point we started looking at the VG to see whether some LVs were taking up space without xapi knowing about them.

Command + result:

vgs

VG #PV #LV #SN Attr VSize VFree

VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a 1 33 0 wz--n- 3.14T 267.36G

(lvs --units B | grep aa15042e | while read vg lv flags size; do echo -n "$size +" | sed 's/B//g'; done; echo 0)| bc -l

3170843492352
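To put a number on the gap, the figures above can be compared directly. This is only a sketch of the arithmetic, using the byte counts already shown in this post; the variable names are mine:

```shell
#!/bin/sh
# Byte counts copied from the command output earlier in this post.
vg_used=3170843492352     # sum of LV sizes in the volume group (lvs)
xapi_used=1210564214784   # sum of physical-utilisation over the VDIs xapi knows about
sr_size=3457918435328     # physical-size reported by xe sr-list

# Space held by LVs that xapi does not account for.
unaccounted=$((vg_used - xapi_used))
echo "unaccounted bytes: $unaccounted"

# Integer (truncated) percentages of the SR's physical size.
pct_used=$((vg_used * 100 / sr_size))
pct_unaccounted=$((unaccounted * 100 / sr_size))
echo "VG used: ${pct_used}%  unaccounted: ${pct_unaccounted}%"
```

That works out to 1960279277568 bytes, a bit more than half of the SR, sitting in LVs that xapi has no record of.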

So at this point we have confirmed that there are in fact LVs not accounted for by xapi, so we went looking for them:

lvs | grep aa15042e | grep VHD | cut -c7-42 | while read uuid; do [ "$(xe vdi-list uuid=$uuid --minimal)" == "" ] && echo $uuid ; done

This returned a long list of UUIDs that did not have a matching entry in xapi.
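The one-liner above calls xe once per LV, which is slow on a big SR. A possible alternative, sketched here under assumptions, is to dump both UUID lists once and take the set difference. The helper name `find_orphans` and the file names `lv_uuids.txt` and `xapi_uuids.txt` are hypothetical, and the commented commands for producing the two lists have not been run against this pool:

```shell
#!/bin/sh
# Hypothetical helper: print every UUID listed in file $1 that is absent from file $2.
# Both files are expected to hold one UUID per line.
find_orphans() {
    lv_list=$1
    xapi_list=$2
    # -v invert the match, -x match whole lines, -F fixed strings, -f read patterns from file
    grep -vxF -f "$xapi_list" "$lv_list"
}

# Assumed way of building the two lists (substitute the real SR UUID):
#   lvs --noheadings -o lv_name VG_XenStorage-<sr-uuid> \
#       | sed -n 's/^ *VHD-\([0-9a-f-]*\).*/\1/p' > lv_uuids.txt
#   xe vdi-list sr-uuid=<sr-uuid> params=uuid --minimal | tr ',' '\n' > xapi_uuids.txt
#   find_orphans lv_uuids.txt xapi_uuids.txt
```

Note that grep exits non-zero when it prints nothing, so an empty result here means there are no orphans rather than that something failed.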

Grabbing one of the UUIDs at random and searching back through xensource.log, we found something strange:

[20121113T09:05:32.654Z|debug|xcp-nc-bc1b8|1563388 inet-RPC|SR.scan R:b7ff8ccc6566|dispatcher] Server_helpers.exec exception_handler: Got exception SR_BACKEND_FAILURE_181: [ ; Error in Metadata volume operation for SR. [opterr=VDI delete operation failed for parameters: /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT, c866d910-f52f-4b16-91be-f7c646c621a5. Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error]; ]

After a little googling around, I finally found a thread on the Citrix forums (http://forums.citrix.com/thread.jspa?threadID=299275) that pointed me at a process to rebuild the metadata for that specific SR without having to blow away the SR and start fresh.

Commands:

lvrename /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/OLDMGT

xe sr-scan uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a

This got rid of the SR_BACKEND_FAILURE errors, but the orphaned LVs persisted. Looking in the SMlog, I started seeing lines saying that the pool was not ready and that the process was exiting:

<25168> 2012-11-14 12:27:24.195463 Pool is not ready, exiting
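A quick way to see how often that message turns up is to count the matching lines in the log. This is only a sketch; the helper name `count_pool_not_ready` is hypothetical, and the log path in the comment is an assumption about where SMlog lives on the host:

```shell
#!/bin/sh
# Hypothetical helper: count the "Pool is not ready" lines in the given log file.
# Prints 0 when there are none (grep then exits non-zero, which is harmless here).
count_pool_not_ready() {
    grep -c 'Pool is not ready' "$1"
}

# Example call (the path is an assumption, adjust it for your host):
#   count_pool_not_ready /var/log/SMlog
```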

At this point I manually forced the offline node out of the pool with the command below, and the SMlog then reported a success in the purge process.

xe host-forget uuid=<down host>

[Xen-API] XCP 1.5 lv cleanup not happening