
Re: [Xen-users] Scary!!! Lost domU!!!


  • From: Jamon Camisso <jamonation@xxxxxxxxx>
  • Date: Sun, 03 Jan 2010 21:15:28 -0500
  • Cc: xen-users@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Sun, 03 Jan 2010 18:16:11 -0800
  • List-id: Xen user discussion <xen-users.lists.xensource.com>

James Pifer wrote:
On Fri, 2010-01-01 at 14:07 -0500, Jamon Camisso wrote:
Is there more than just the SLES server using both volumes? If not, have you considered using another filesystem? Personally I've had nothing but trouble with ocfs2 in Debian and CentOS -- clusters would just randomly fall apart. I've also found that unless filesystem throughput is very good, ocfs2 would end up losing writes by getting ahead of itself somehow. All depends on the storage backend I suppose.

I think I know what happened in this case. After a lot of thought, I
believe the blunder was mine. I remember working with this specific domU
in early December. I was moving it from my dev machine with local
storage to the cluster. I did not realize how much space it was actually
using, so after copying I decided it would be best to leave it on local
storage since it was not a super critical system.
Here's where I'm speculating. Somewhere along the way I think I screwed
up and did bring the domU up on the ocfs2 cluster or I had already
modified the config. I then started it back up before deleting the one I
just copied. I then tried to delete the copy on ocfs2 while it was
running. Not sure why I may have stopped here when it did not delete,
maybe sidetracked, don't know. In any case I'm thinking they were
marked for deletion.
Then after Christmas I had to reboot the server for a different problem.
When I stopped the domU, or during reboot, the file deletion actually
took place. Thankfully I still had a copy of it. Wouldn't have been the
end of the world except for work rebuilding it.
I'm not sure if that is even possible but that's what I'm thinking.
Other than that my ocfs2 cluster has been solid on sles. Been using it
for quite some time, well over a year I think.

That sounds plausible. I could see doing the same thing pretty easily. I use xm migrate (live) to make sure that there's only ever one copy of a domU running anywhere. That way I can definitively check from the dom0 which filesystem is being used too -- it must get messy with different storage pools, lvm volumes, raw tap:aio files etc.

The one doubt I have is the timeline involved. I suppose it is possible that the domU continued merrily along with a filesystem that was losing writes for the rest of the month (a couple of weeks?). It's too bad there isn't a copy of the filesystem around where you could see the logs to confirm it!
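
For what it's worth, the "marked for deletion, actually removed on shutdown" behaviour is at least consistent with ordinary POSIX unlink semantics: removing a file that a process still holds open only drops the name, and the data stays usable until the last descriptor is closed. Below is a minimal sketch of that on a local filesystem. The file name disk0.img and the function name are made up for illustration, and it says nothing about how ocfs2 handles this across cluster nodes, so treat it as an illustration of the general mechanism rather than proof of what happened here.

```python
import os
import tempfile


def demo_deferred_delete():
    """Unlink a file that is still open and show the holder can keep using it."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "disk0.img")  # hypothetical image name

    # Stand-in for the process that keeps the disk image open.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.write(fd, b"before unlink")

    # The "rm" while the file is in use: the directory entry disappears...
    os.unlink(path)
    name_gone = not os.path.exists(path)

    # ...but the open descriptor still reads and writes normally.
    os.write(fd, b" / after unlink")
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 64)

    # Only this close (the "shutdown") actually frees the data.
    os.close(fd)
    os.rmdir(workdir)
    return name_gone, data


if __name__ == "__main__":
    gone, contents = demo_deferred_delete()
    print("name removed while open:", gone)
    print("data still readable via open fd:", contents)
```

If that is what happened, the holder would have seen no errors for as long as it kept the file open, which would fit a gap of a few weeks between the rm and the reboot.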

Good to hear you've got a backup and that you haven't had problems since the reboot :)

Jamon

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

