
Re: [Xen-users] [XCP] ext3 crashes and slowdowns



On Wed, Jan 05, 2011 at 11:37:03AM +0100, Christian Fischer wrote:
> On Wednesday 05 January 2011 11:03:37 Pasi Kärkkäinen wrote:
> > On Tue, Jan 04, 2011 at 03:37:36PM +0100, Christian Fischer wrote:
> > > Hi Folks.
> > > 
> > > I have two Intel boxes (Intel server S5520UR, 2x E5520, 32GB RAM, SATA
> > > HW-RAID, BBU) running as an XCP-0.5 pool, both running an OpenFiler-2.3
> > > domU, clustered, active/passive. Data storage is provided to OpenFiler
> > > as a SCSISR (without LVM layer, like an HBASR). Shared storage is
> > > provided as an iSCSI target by OpenFiler via a cluster IP (storage
> > > frontend network), replication is done by DRBD (storage backend
> > > network), and HA is done by heartbeat (heartbeat network). All networks
> > > are built on top of redundant HP gigabit switches: 2 pairs of Intel
> > > gigabit NICs, each pair bonded and plugged into the same switch, both
> > > bonds multipathed (active/passive multipathing, patched
> > > OpenVSwitch-1.1.2p1) via the two switches, which are linked together
> > > with 2 ports each.
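
For reference, the path state on the XCP hosts can be checked with the
standard open-iscsi and dm-multipath tools; a minimal sketch, assuming
dm-multipath is what handles the active/passive iSCSI paths here:

# multipath -ll               # multipath maps and per-path status
# iscsiadm -m session -P 3    # session/connection state, attached SCSI devices
# iscsiadm -m node            # discovered target portals
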
> > 
> > Hello,
> > 
> > Did you try XCP 1.0 beta?
> 
> Hi Pasi,
> 
> No, not yet. But I'll try it. Is it more beta than 0.5, or less? Can it be
> used as a production system?
>

I *think* it should be better than 0.5 :) Also I *think* there's XCP 1.0 beta2
coming up soon(ish).

> Is it upgradable if 1.0 final comes out?

Not sure.

> 
> There are two possible ways to solve this: trying the 1.0 beta, or using
> dedicated storage server hardware. The storage works perfectly if I run the
> guest systems on a third machine.
> 
> What I don't understand is what badness happens when the active filer and
> the guest are running on top of the same hardware. I think the setup should
> work. I've seen these fs crashes on top of glusterfs as well, which I tried
> before, with the difference that there both servers were affected. That was
> an active/active filer setup.
> 
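
One knob that often matters when the target goes away briefly during a filer
switchover is the open-iscsi replacement timeout; a sketch of the relevant
settings (typically /etc/iscsi/iscsid.conf in dom0; the values below are
examples only, not a recommendation for this setup):

# /etc/iscsi/iscsid.conf (example values only)
# fail queued I/O after 15s instead of blocking until the session returns:
node.session.timeo.replacement_timeout = 15
# ping the target every 5s and give up on the connection after another 5s:
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
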

-- Pasi


> Christian
> 
> > 
> > -- Pasi
> > 
> > > XCP pool works, iSCSI works, replication works, HA works.
> > > 
> > > If filer 1 (running on server 1) is active I can install and run domUs
> > > on server 2 without problems, but I cannot install or run domUs on
> > > server 1.
> > > 
> > > If I switch to filer 2 (on server 2) as the active one, the running but
> > > stalled domUs on server 1 come back to life, and the running domUs on
> > > filer 2 lose theirs.
> > > # dd if=/dev/zero of=/tmp/test bs=512M count=1 oflag=direct
> > > shows a rate of 0.8 - 1.2 MB/sec.
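
It may be worth checking whether the collapse is specific to large direct
writes; a minimal sketch of a few variations to run in the affected domU
(iostat needs the sysstat package):

# dd if=/dev/zero of=/tmp/test bs=1M count=512 oflag=direct    # 1M direct writes
# dd if=/dev/zero of=/tmp/test bs=4k count=10000 oflag=direct  # small direct writes
# iostat -x 5                                                  # extended per-device stats every 5s
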
> > > 
> > > The kernel shows traces like
> > > 
> > > INFO: task syslogd:1081 blocked for more than 120 seconds.
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > syslogd       D ffff880001003460     0  1081      1          1084  1073
> > > (NOTLB)
> > > 
> > >  ffff8800367edd88  0000000000000286  ffff8800367edd98  ffffffff80262dd3
> > >  0000000000000009  ffff88003fb007a0  ffffffff804f4b80  0000000000000d5b
> > >  ffff88003fb00988  0000000000006d06
> > > 
> > > Call Trace:
> > >  [<ffffffff80262dd3>] thread_return+0x6c/0x113
> > >  [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
> > >  [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
> > >  [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
> > >  [<ffffffff8023138e>] __writeback_single_inode+0x1e9/0x328
> > >  [<ffffffff802d2ff1>] do_readv_writev+0x26e/0x291
> > >  [<ffffffff802e555b>] sync_inode+0x24/0x33
> > >  [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
> > >  [<ffffffff80252276>] do_fsync+0x52/0xa4
> > >  [<ffffffff802d37f5>] __do_fsync+0x23/0x36
> > >  [<ffffffff802602f9>] tracesys+0xab/0xb6
> > > 
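
That trace is the fsync path waiting in jbd's log_wait_commit, i.e. the
journal commit never completes, which looks more like the underlying virtual
disk stalling than an ext3 problem by itself. If it happens again, dumping
all blocked tasks may show where the rest of the I/O stack is stuck; a
minimal sketch (needs sysrq, run inside the domU):

# echo 1 > /proc/sys/kernel/sysrq    # make sure sysrq is enabled
# echo w > /proc/sysrq-trigger       # dump all blocked (uninterruptible) tasks to dmesg
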
> > > Iscsiadm shows no errors.
> > > 
> > > # iscsiadm -m session -r 1 -s
> > > Stats for session [sid: 1, target:
> > > iqn.2006-01.com.openfiler:tsn.26336ef50fe0:storage1_osimages, portal:
> > > 172.16.0.2,3260]
> > > 
> > > iSCSI SNMP:
> > >         txdata_octets: 486181549212
> > >         rxdata_octets: 2622687792
> > >         noptx_pdus: 0
> > >         scsicmd_pdus: 15184105
> > >         tmfcmd_pdus: 0
> > >         login_pdus: 0
> > >         text_pdus: 0
> > >         dataout_pdus: 195910
> > >         logout_pdus: 0
> > >         snack_pdus: 0
> > >         noprx_pdus: 0
> > >         scsirsp_pdus: 15184088
> > >         tmfrsp_pdus: 0
> > >         textrsp_pdus: 0
> > >         datain_pdus: 87898
> > >         logoutrsp_pdus: 0
> > >         r2t_pdus: 151200
> > >         async_pdus: 0
> > >         rjt_pdus: 0
> > >         digest_err: 0
> > >         timeout_err: 0
> > > 
> > > iSCSI Extended:
> > >         tx_sendpage_failures: 0
> > >         rx_discontiguous_hdr: 0
> > >         eh_abort_cnt: 0
> > > 
> > > If I reboot the domU after it has come back to life, in most cases the
> > > ext3 journal is corrupt, and the kernel panics after one more reboot.
> > > 
> > > If I try to install a PV domain (CentOS-5.5), the installer asks whether
> > > I wish to initialize the disk xvda, but when the disk partitioning and
> > > layout questions appear the disk is missing from the list. There's
> > > nothing more than a question mark.
> > > Sometimes the disk is in the list; if so I can install the OS and all
> > > seems fine, but after the second reboot the ext3 journal is missing, and
> > > after the third reboot the kernel panics because the rootfs is gone.
> > > 
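
When a domU ends up in that state, the usual recovery attempt is to check the
filesystem from a rescue environment and, if the journal itself is damaged,
recreate it; a minimal sketch, assuming the root fs is /dev/dm-0 as in the
logs further down (only on an unmounted filesystem):

# e2fsck -f /dev/dm-0                 # full check, replays/repairs the journal where possible
# tune2fs -O ^has_journal /dev/dm-0   # if the journal is beyond repair: remove it ...
# tune2fs -j /dev/dm-0                # ... and create a fresh one, then fsck again
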
> > > 
> > > Are there any ideas? I'm out of them.
> > > 
> > > Thanks
> > > Christian
> > > 
> > > Some kernel logging from the domU; there is nothing in the dom0 log.
> > > 
> > > EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> > > block 743295
> > > Aborting journal on device dm-0.
> > > ext3_abort called.
> > > EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
> > > Remounting filesystem read-only
> > > EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> > > block 743296
> > > EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> > > block 743297
> > > EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> > > block 743298
> > > EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> > > block 743299
> > > EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> > > block 743300
> > > EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> > > block 743301
> > > EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> > > block 743302
> > > EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> > > block 743303
> > > EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> > > block 743304
> > > EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> > > block 743305
> > > EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> > > EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted
> > > EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> > > EXT3-fs error (device dm-0) in ext3_orphan_del: Journal has aborted
> > > EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> > > __journal_remove_journal_head: freeing b_committed_data
> > > __journal_remove_journal_head: freeing b_committed_data
> > > __journal_remove_journal_head: freeing b_committed_data
> > > 
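
Once the journal has aborted like this, the filesystem stays read-only until
it is checked. If the underlying cause is writes being lost or reordered
somewhere in the virtual disk path, enabling write barriers on the ext3
mounts in the domU is one thing worth trying; a sketch of an fstab entry
(device, mountpoint and options are examples only):

/dev/dm-0   /   ext3   defaults,barrier=1,errors=remount-ro   1 1
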

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
