
Re: [Xen-users] How (not) to destroy a PostgreSQL db in domU on powerfail



On Wednesday 04 March 2009 Matthieu Patou wrote:
> Mike,
> It's quite strange, I have been running Xen with XFS and LVM for
> quite a while and have had some server crashes (no power cable
> failures, but still).

A crash is different from a power failure, of course, as the disks
don't lose power suddenly.

> It's well known that LVM does not honor barriers

I thought that was a bug and had already been fixed?

Anyway, I've even mounted XFS with "nobarrier", as the XFS FAQ 
recommends:
http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F
(BTW: I edited that FAQ entry after talking to the XFS devs, so I'm
pretty sure the info is correct).

> (like ext3 by default, until last year:
> http://lkml.org/lkml/2008/5/16/390)
> so it means that you cannot be completely sure that metadata is
> written before the real data is modified; it can still be in the
> cache, and if that vanishes (due to a power outage on a controller
> without battery backup) then you're on your own.

But exactly *where* would the data be lost?
1) XFS
2) LVM
3) XEN
4) RAID controller
5) Linux cache
6) Hard disks
7) ???
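
One way to narrow this down might be a small write-and-fsync test, run
in turn at each layer (bare dom0, on top of LVM, inside the domU). The
following is only a sketch: the record format and the function names
append_durably and last_intact are my own assumptions, not an existing
tool. It appends numbered records, calls fsync after each one, and
later checks how many records are intact in the file.

```python
import os
import struct

# Assumed record format: one 8-byte little-endian sequence number.
RECORD = struct.Struct("<Q")


def append_durably(path, start, count):
    """Append `count` sequence numbers beginning at `start`.

    fsync is called after every record, so each returned number was
    acknowledged as durable by the layers below. Returns the last
    acknowledged sequence number, or None if count is 0.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    last_acked = None
    try:
        for seq in range(start, start + count):
            os.write(fd, RECORD.pack(seq))
            os.fsync(fd)  # ask the kernel to push the data to stable storage
            last_acked = seq
    finally:
        os.close(fd)
    return last_acked


def last_intact(path):
    """Return the last sequence number of the gap-free prefix of the file.

    Reading stops at the first torn (short) record or the first number
    that does not follow its predecessor. Returns None for an empty file.
    """
    last = None
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file or torn record
            (seq,) = RECORD.unpack(chunk)
            if last is not None and seq != last + 1:
                break  # gap or garbage
            last = seq
    return last
```

For a real test, the last acknowledged number would have to be logged
on another machine before pulling the plug, since anything stored
locally is at risk too. If last_intact() after reboot reports less than
what was acknowledged, some layer below has confirmed writes it had not
made durable.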

And how can I arrive at a safe solution? Should I use reiserfs again?
I used it for years without a problem, though not with XEN.

> But if you have a battery backed cache then it should be ok, that is
> to my understanding (I can be wrong).

But you must turn the hard disk write cache off, which I have done.

> It can be that your battery is dead or not working correctly (or your
> controller not using it ...), 

It's working - the host itself has had no problem whatsoever.

> the other option is that some metadata
> hasn't left the OS cache (read: the domU has written the
> information to the disk, but either dom0 or the hypervisor is
> caching data for the backend device), and in this case it's
> normal to face problems.

Could be - but then there should be a workaround.

> It would be very interesting to have more information from xfs team.

I posted there as well; no solution so far.

I wonder why there's no documentation about this problem. There are
people using XEN on production machines - are they not scared by the
current behaviour? Even if I have UPSes and whatever, a crash can
always occur. I have a customer who wants to use XEN to replace 10
small servers with a single one, but currently I'm reluctant to
recommend XEN because I worry about the data. Imagine you have 10
servers not coming up after a problem - it could take hours to get
every single server up and running again.

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4

