Xen project Mailing List

[Xen-users] [XCP] ext3 crashes and slowdowns

From: Christian Fischer <christian.fischer@xxxxxxxxxxxxxxxxxxx>

Date: Tue, 4 Jan 2011 15:37:36 +0100

Delivery-date: Tue, 04 Jan 2011 06:38:56 -0800

List-id: Xen user discussion <xen-users.lists.xensource.com>

Hi folks,

I have two Intel boxes (Intel Server S5520UR, 2x E5520, 32 GB RAM, SATA HW RAID, BBU) running as an XCP-0.5 pool. Each runs an OpenFiler-2.3 domU; the two filers are clustered, active/passive. Data storage is provided to OpenFiler as SCSISR (without an LVM layer, like an HBASR). Shared storage is provided by OpenFiler as an iSCSI target via a cluster IP (storage frontend network), replication is done by DRBD (storage backend network), and HA is done by heartbeat (heartbeat network).

All networks are built on top of redundant HP gigabit switches: 2 pairs of Intel gigabit NICs, each pair bonded and plugged into the same switch, both bonds multipathed (active/passive multipathing, patched OpenVSwitch-1.1.2p1) via the two switches, which are linked together with 2 ports each.

The XCP pool works, iSCSI works, replication works, HA works.

If filer 1 (running on server 1) is active, I can install and run domUs on server 2 without problems, but I cannot install or run domUs on server 1. If I switch to filer 2 (on server 2) as the active one, the running but stalled domUs on server 1 come back to life, and the running domUs on filer 2 stall.

    # dd if=/dev/zero of=/tmp/test bs=512M count=1 oflag=direct

shows a rate of 0.8 - 1.2 MB/sec.

The kernel shows traces like

    INFO: task syslogd:1081 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syslogd D ffff880001003460 0 1081 1 1084 1073 (NOTLB)
     ffff8800367edd88 0000000000000286 ffff8800367edd98 ffffffff80262dd3
     0000000000000009 ffff88003fb007a0 ffffffff804f4b80 0000000000000d5b
     ffff88003fb00988 0000000000006d06
    Call Trace:
     [<ffffffff80262dd3>] thread_return+0x6c/0x113
     [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
     [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
     [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
     [<ffffffff8023138e>] __writeback_single_inode+0x1e9/0x328
     [<ffffffff802d2ff1>] do_readv_writev+0x26e/0x291
     [<ffffffff802e555b>] sync_inode+0x24/0x33
     [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
     [<ffffffff80252276>] do_fsync+0x52/0xa4
     [<ffffffff802d37f5>] __do_fsync+0x23/0x36
     [<ffffffff802602f9>] tracesys+0xab/0xb6

iscsiadm shows no errors.

    # iscsiadm -m session -r 1 -s
    Stats for session [sid: 1, target: iqn.2006-01.com.openfiler:tsn.26336ef50fe0:storage1_osimages, portal: 172.16.0.2,3260]
    iSCSI SNMP:
        txdata_octets: 486181549212
        rxdata_octets: 2622687792
        noptx_pdus: 0
        scsicmd_pdus: 15184105
        tmfcmd_pdus: 0
        login_pdus: 0
        text_pdus: 0
        dataout_pdus: 195910
        logout_pdus: 0
        snack_pdus: 0
        noprx_pdus: 0
        scsirsp_pdus: 15184088
        tmfrsp_pdus: 0
        textrsp_pdus: 0
        datain_pdus: 87898
        logoutrsp_pdus: 0
        r2t_pdus: 151200
        async_pdus: 0
        rjt_pdus: 0
        digest_err: 0
        timeout_err: 0
    iSCSI Extended:
        tx_sendpage_failures: 0
        rx_discontiguous_hdr: 0
        eh_abort_cnt: 0

If I reboot the domU after it has come back to life, in most cases the ext3 journal is corrupt, and the kernel panics after one more reboot.

If I try to install a PV domain (CentOS-5.5), the installer asks whether I wish to initialize the disk xvda, but when the disk partitioning and layout questions appear, the disk is missing from the list. There is nothing more than a question mark. Sometimes the disk is in the list; if so, I can install the OS and all seems fine, but after the second reboot the ext3 journal is missing, and after the third reboot the kernel panics: the rootfs is gone.

Are there any ideas?
I'm out of ideas.

Thanks
Christian

Some kernel logging from the domU; there is nothing in the dom0 log.

    EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743295
    Aborting journal on device dm-0.
    ext3_abort called.
    EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
    Remounting filesystem read-only
    EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743296
    EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743297
    EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743298
    EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743299
    EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743300
    EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743301
    EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743302
    EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743303
    EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743304
    EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743305
    EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
    EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted
    EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
    EXT3-fs error (device dm-0) in ext3_orphan_del: Journal has aborted
    EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
    __journal_remove_journal_head: freeing b_committed_data
    __journal_remove_journal_head: freeing b_committed_data
    __journal_remove_journal_head: freeing b_committed_data

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
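The "bit already cleared" errors in the domU log above cover blocks 743295 to 743305 on dm-0 without a gap. For longer logs, a summary like that can be produced mechanically. The following is only a minimal Python sketch: the function name `cleared_block_ranges` and the regular expression are assumptions made for illustration, matched against the line format shown in this message, and are not part of any Xen, XCP or ext3 tool.

```python
import re

# Matches the "bit already cleared" lines in the form they have in the log above.
CLEARED_RE = re.compile(
    r"EXT3-fs error \(device (?P<dev>[^)]+)\): "
    r"ext3_free_blocks_sb: bit already cleared for block (?P<block>\d+)"
)


def cleared_block_ranges(log_text):
    """Return {device: [(first, last), ...]}, merging consecutive block numbers."""
    blocks = {}
    for match in CLEARED_RE.finditer(log_text):
        blocks.setdefault(match.group("dev"), set()).add(int(match.group("block")))

    ranges = {}
    for dev, numbers in blocks.items():
        merged = []
        for n in sorted(numbers):
            if merged and n == merged[-1][1] + 1:
                # n directly follows the current range, so extend it
                merged[-1] = (merged[-1][0], n)
            else:
                # gap (or first block): start a new range
                merged.append((n, n))
        ranges[dev] = merged
    return ranges


# The eleven "bit already cleared" lines from the log above, rebuilt from their block numbers.
sample = "\n".join(
    "EXT3-fs error (device dm-0): ext3_free_blocks_sb: "
    f"bit already cleared for block {block}"
    for block in range(743295, 743306)
)

if __name__ == "__main__":
    print(cleared_block_ranges(sample))
```

In practice the input would be the saved domU kernel log (for example the output of `dmesg` written to a file) rather than the rebuilt `sample` string. Lines that do not match the pattern, such as the "Journal has aborted" messages, are ignored.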
