
Re: [Xen-users] Re: domU corrupt after server crash, help needed trying to recover domU


  • To: Rudi Ahlers <rudiahlers@xxxxxxxxx>
  • From: Ciro Iriarte <cyruspy@xxxxxxxxx>
  • Date: Sun, 10 May 2009 15:08:27 -0400
  • Cc: "Fajar A. Nugraha" <fajar@xxxxxxxxx>, xen-users <xen-users@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Sun, 10 May 2009 12:09:19 -0700
  • List-id: Xen user discussion <xen-users.lists.xensource.com>

2009/5/10 Rudi Ahlers <rudiahlers@xxxxxxxxx>:
>
>
> On Sun, May 10, 2009 at 9:38 AM, Rudi Ahlers <rudiahlers@xxxxxxxxx> wrote:
>>
>>
>> On Sat, May 9, 2009 at 5:19 AM, Fajar A. Nugraha <fajar@xxxxxxxxx> wrote:
>>>
>>> On Sat, May 9, 2009 at 5:42 AM, Rudi Ahlers <rudiahlers@xxxxxxxxx> wrote:
>>> > Hi Fajar,
>>> >
>>> > I got the commands via a Google search, so I didn't know that losetup was
>>> > only meant for file-backed storage.
>>>
>>> If it's a block device (LVM, partition, etc.) you can skip losetup and
>>> go directly to
>>> kpartx -av /dev/data/hfserver2
>>
>> Really? Cool, now I've learned something :)
>>
>>>
>>> > Unfortunately there's no backups :(
>>>
>>> Ouch. Sorry to hear that.
>>> So that makes it what ... your second corruption?
>>> In my environment, FS corruption is USUALLY caused by one of these:
>>> - human error (like the admin mounting the same block device twice on
>>> different servers). This usually happens on shared-storage systems
>>> (SAN, NAS, etc).
>>> - SAN error (like when it got temporarily disconnected, and then
>>> reconnected again)
>>> - server hardware error (bad memory, bad disk controller, etc.)
>>
>> Yes, but on a different server, different client, different reason. The
>> only thing that's the same is the IDC and the server setup. Both have
>> CentOS on the host node and run cPanel on the domU VPSes. I'd love to
>> set up a shared NAS and have 2 servers share the data from there, but funds
>> are a bit limited :(
>>
>>>
>>> I suggest you check all three to make sure corruption doesn't happen
>>> again. If both corruptions are on the same hardware, then most likely
>>> the server hardware is bad.
>>
>> The problem is due to the RAM. The ECC (unbuffered) Kingston memory
>> modules don't work as expected on the Dell PE860 platform. Strangely, when I
>> put normal desktop RAM into the server, it worked fine. So, I'm taking the
>> RAM back to the supplier on Monday.
>>
>>>
>>> You could PROBABLY still salvage some data from the broken domU. Try
>>> shutting it down and mounting it again on dom0. Sometimes fsck will put
>>> recovered inodes in /lost+found, so perhaps some of your data is still
>>> there.
>>
>> Yes, I'm going to try this and see how far I can get.
>>
>>>
>>> BTW, VolGroup00 IS the name of domU's VG right? It's not dom0's VG?
>>> Cause if it were dom0's you might have more problems ahead.
>>
>> no, the hostnode's LVM has been renamed to /dev/data/root, /dev/home/swap
>> & /dev/data/home for this very reason
>>
>>>
>>> Regards,
>>>
>>> Fajar
>>
>
>
> Just as a matter of interest, the number of recovered files in
> /mnt/cpanel/lost+found/ (this is the mounted VolGroup00 partition) is 32536
>
> And the files all look like this:
>
> -rw-r----- 1 root       32046   94816 Feb 12 10:28 #1865704
> -rw-r--r-- 1 root     root         91 Feb 12 10:28 #1865705
> -rw-r----- 1 root       32052   94816 Feb 12 10:28 #1865707
> -rw-r----- 1 root       32022     901 Feb 12 10:28 #1865709
> -rw-r----- 1 root       32052   94816 Feb 12 10:28 #1865710
> -rw-r----- 1 root       32013   94816 Feb 12 10:28 #1865711
> -rw-r----- 1 root       32013   94816 Feb 12 10:28 #1865713
> -rw-r--r-- 1 root     root         91 Feb 12 10:28 #1865714
> -rw-r----- 1 root       32037   94816 Feb 12 10:28 #1865715
> -rw-r----- 1 root         506   94816 Feb 12 10:44 #1865716
> -rw-r----- 1 root       32037    6202 Feb 12 10:28 #1865717
> -rw-r--r-- 1 root     root         92 Feb 12 10:44 #1865718
> -rw-r--r-- 1 root     root        105 Feb 12 10:44 #1865719
> -rw-r----- 1 root       32029   94816 Feb 12 10:44 #1865721
> -rw-r----- 1 root       32029   94816 Feb 12 10:44 #1865722
> -rw-r----- 1 root       32029   94816 Feb 12 10:44 #1865723
> -rw-r-0m#1865794
> -rw-r----- 1 root       32029   94816 Feb 12 16:47 #1865795
> [root@xen cpanel]# ll lost+found/ | wc -l
>
>
>
> What can I do with these files, apart from deleting them?
>
> --
> Kind Regards
> Rudi Ahlers
> CEO, SoftDux Hosting
> Web: http://www.SoftDux.com
> Office: 087 805 9573
> Cell: 082 554 7532
>

Those are from an fsck run. Your only hope is running the "file"
command on each of them to try to guess what the files are; if they
are not corrupt, you can copy them back to where they belong, but
there's no automatic procedure for this...
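A first pass can at least be automated. Below is a minimal Python sketch, assuming the lost+found path from the listing above (/mnt/cpanel/lost+found) and a hypothetical destination directory. Note that it does not call file(1): as a rough stand-in it checks only a handful of well-known magic signatures, so anything it labels "unknown" is still worth running through "file" by hand. It copies rather than moves, so the fsck output stays untouched, and it skips any #NNNN directories fsck may also have reconnected.

```python
import shutil
import sys
from collections import Counter
from pathlib import Path

# Tiny, illustrative subset of the signatures file(1) knows about.
SIGNATURES = [
    (b"\x7fELF", "elf"),
    (b"\x1f\x8b", "gzip"),
    (b"%PDF", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF8", "gif"),
    (b"PK\x03\x04", "zip"),
    (b"#!", "script"),
]

# Bytes accepted as "probably text": printable ASCII, common whitespace and
# control characters, and any high byte (so UTF-8/Latin-1 text passes).
TEXT_BYTES = (
    set(range(0x20, 0x7F))
    | {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}
    | set(range(0x80, 0x100))
)


def guess_type(path: Path) -> str:
    """Return a coarse type label from the first 512 bytes of the file."""
    with path.open("rb") as fh:
        head = fh.read(512)
    if not head:
        return "empty"
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    if all(byte in TEXT_BYTES for byte in head):
        return "text"
    return "unknown"


def triage(lost_found: Path, dest: Path) -> Counter:
    """Copy each regular file in lost_found into dest/<label>/ and count labels.

    Directories and symlinks are skipped; the originals are never modified.
    """
    counts: Counter = Counter()
    for entry in sorted(lost_found.iterdir()):
        if entry.is_symlink() or not entry.is_file():
            continue
        label = guess_type(entry)
        target_dir = dest / label
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 keeps mode bits and timestamps, but not owner/group.
        shutil.copy2(entry, target_dir / entry.name)
        counts[label] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 3:
    summary = triage(Path(sys.argv[1]), Path(sys.argv[2]))
    for label, n in summary.most_common():
        print(f"{label}\t{n}")
```

Run it as e.g. "python3 triage_lost_found.py /mnt/cpanel/lost+found /root/lf-sorted" (both the script name and the destination are made up). The numeric group column in the listing (32046, 32052, ...) is presumably a GID from the domU's own /etc/group, which dom0 cannot resolve, so it may help map the sorted files back to the right cPanel account.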

Regards,


-- 
Ciro Iriarte
http://cyruspy.wordpress.com
--

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users

