
[Xen-API] XCP 1.5 lv cleanup not happening


  • To: "xen-api@xxxxxxxxxxxxx" <xen-api@xxxxxxxxxxxxx>
  • From: Ryan Farrington <rfarrington@xxxxxxxxxxxxx>
  • Date: Wed, 14 Nov 2012 16:08:11 -0600
  • Accept-language: en-US
  • Delivery-date: Wed, 14 Nov 2012 22:08:32 +0000
  • List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>
  • Thread-index: Ac3CsMgSDhTd0c+dTsmkVx0YOVt+lg==
  • Thread-topic: XCP 1.5 lv cleanup not happening

A special thanks goes out to felipef for all the help today.

 

History:

                4-host pool, with one host in a failed state due to a hardware failure

                1 data LUN of 3.2 TB; SR UUID = aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a

 

The issue:

The 3.2 TB data LUN was showing 91% physical utilisation but only 33% virtual allocation.


Work log:

 

Results were confirmed via the XenCenter GUI and via the command line as identified below:

                xe sr-list params=all uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a

                                physical-utilisation ( RO): 3170843492352

                                physical-size ( RO): 3457918435328

                                virtual-allocation ( RO): 1316940152832

                                type ( RO): lvmohba

                                sm-config (MRO): allocation: thick; use_vhd: true

 

Further digging found that summing all the VDIs on the SR reproduced the virtual-allocation number, but not the physical utilisation.

                Commands + results:

                                xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=physical-utilisation --minimal | sed 's/,/ + /g' | bc -l

                                                physical utilisation: 1,210,564,214,784

                                xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=virtual-size --minimal | sed 's/,/ + /g' | bc -l

                                                virtual size: 1,316,940,152,832

 

At this point we started looking at the VG to see if there were LVs consuming space that were not known to xapi.

                Command + result:

                                vgs

                                                VG                                                 #PV #LV #SN Attr   VSize VFree
                                                VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a   1  33   0 wz--n- 3.14T 267.36G

                                (lvs --units B | grep aa15042e | while read vg lv flags size; do echo -n "$size +" | sed 's/B//g'; done; echo 0) | bc -l

                                                3170843492352

 

At this point we had confirmed that there were in fact LVs not accounted for by xapi, so we went looking for them:

lvs | grep aa15042e | grep VHD | cut -c7-42 | while read uuid; do [ "$(xe vdi-list uuid=$uuid --minimal)" == "" ] && echo $uuid ; done

                This returned a long list of UUIDs that did not have a matching entry in xapi.
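The membership test at the heart of that loop can be sketched without live xe calls; `orphans` is a hypothetical helper, with literal UUID lists standing in for the LV scan and for xapi's records:

```shell
# Print every UUID from the first list that has no match in the second,
# mirroring the "LV exists but xe vdi-list returns nothing" check above.
orphans() {
    # $1: UUIDs found on the VG; $2: UUIDs known to xapi
    for u in $1; do
        case " $2 " in
            *" $u "*) ;;        # known to xapi: skip
            *) echo "$u" ;;     # no matching VDI record: report it
        esac
    done
}

orphans 'aaaa bbbb cccc' 'aaaa cccc'   # -> bbbb
```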

 

Grabbing one of the UUIDs at random and searching back through xensource.log, we found something strange:

                [20121113T09:05:32.654Z|debug|xcp-nc-bc1b8|1563388 inet-RPC|SR.scan R:b7ff8ccc6566|dispatcher] Server_helpers.exec exception_handler: Got exception SR_BACKEND_FAILURE_181: [ ; Error in Metadata volume operation for SR. [opterr=VDI delete operation failed for parameters: /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT, c866d910-f52f-4b16-91be-f7c646c621a5. Error: Failed to read file with params [3, 0, 512, 512]. Error: Input/output error];  ]

 

A little googling eventually turned up a thread on the Citrix forums (http://forums.citrix.com/thread.jspa?threadID=299275) describing a process to rebuild the metadata for that specific SR without having to blow away the SR and start fresh.

                Commands

lvrename /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/OLDMGT

xe sr-scan uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
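Wrapped up with a basic guard, the rebuild step might look like the sketch below. The `vg_path` and `rebuild_mgt` helpers are mine, assuming the standard `VG_XenStorage-<sr-uuid>` volume-group naming seen in the log above:

```shell
# Rename the MGT metadata LV aside and rescan, so the storage backend
# recreates the metadata volume from xapi's view of the SR.
vg_path() { echo "/dev/VG_XenStorage-$1"; }

rebuild_mgt() {
    vg=$(vg_path "$1")
    if [ ! -e "$vg/MGT" ]; then
        echo "no MGT volume for SR $1" >&2
        return 1
    fi
    lvrename "$vg/MGT" "$vg/OLDMGT"   # keep the old metadata as a backup
    xe sr-scan uuid="$1"              # rescan triggers recreation of MGT
}
```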

 

This got rid of the SR backend errors, but the orphaned LVs persisted. Looking in SMlog, we started seeing lines indicating that the pool was not ready and that the cleanup process was exiting:

                <25168> 2012-11-14 12:27:24.195463      Pool is not ready, exiting

 

At this point I manually forced the offline node out of the pool, after which SMlog reported that the purge process succeeded.

                xe host-forget uuid=<down host>

 

_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api

 

