Re: [Xen-users] [XCP] ext3 crashes and slowdowns
On Tue, Jan 04, 2011 at 03:37:36PM +0100, Christian Fischer wrote:
> Hi Folks.
>
> I've two Intel boxes (Intel server S5520UR, 2x E5520, 32GB RAM, SATA
> HW-RAID, BBU) running as XCP-0.5 pool, both running an OpenFiler-2.3 domU,
> clustered, active/passive. Data Storage is provided as SCSISR (without LVM
> layer, like a HBASR) to OpenFiler. Shared storage is provided as iSCSI
> target by OpenFiler via clusterIP (storage frontend network), replication
> is done by drbd (storage backend network), HA is done by heartbeat
> (heartbeat network). All networks are built on top of redundant HP gigabit
> switches, 2 pairs of Intel gigabit NICs, each bonded and plugged into the
> same switch, both bonds multipathed (active/passive multipathing, patched
> OpenVSwitch-1.1.2p1) via the two switches, which are linked together with
> 2 ports each.
>

Hello,

Did you try XCP 1.0 beta?

-- Pasi

> XCP pool works, iSCSI works, replication works, HA works.
>
> If filer 1 (running on server1) is active I can install and run domUs on
> server 2 without problems, but I cannot install or run domUs on server 1.
>
> If I switch to filer 2 (on server 2) as the active one, the running but
> stalled domUs on server 1 come back to life, and the running domUs on
> filer2 stall.
>
> # dd if=/dev/zero of=/tmp/test bs=512M count=1 oflag=direct
> shows a rate of 0.8 - 1.2 MB/sec.
>
> The kernel shows traces like
>
> INFO: task syslogd:1081 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syslogd D ffff880001003460 0 1081 1 1084 1073 (NOTLB)
>  ffff8800367edd88 0000000000000286 ffff8800367edd98 ffffffff80262dd3
>  0000000000000009 ffff88003fb007a0 ffffffff804f4b80 0000000000000d5b
>  ffff88003fb00988 0000000000006d06
> Call Trace:
>  [<ffffffff80262dd3>] thread_return+0x6c/0x113
>  [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
>  [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
>  [<ffffffff8023138e>] __writeback_single_inode+0x1e9/0x328
>  [<ffffffff802d2ff1>] do_readv_writev+0x26e/0x291
>  [<ffffffff802e555b>] sync_inode+0x24/0x33
>  [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
>  [<ffffffff80252276>] do_fsync+0x52/0xa4
>  [<ffffffff802d37f5>] __do_fsync+0x23/0x36
>  [<ffffffff802602f9>] tracesys+0xab/0xb6
>
>
> Iscsiadm shows no errors.
>
> # iscsiadm -m session -r 1 -s
> Stats for session [sid: 1, target:
> iqn.2006-01.com.openfiler:tsn.26336ef50fe0:storage1_osimages, portal:
> 172.16.0.2,3260]
> iSCSI SNMP:
>         txdata_octets: 486181549212
>         rxdata_octets: 2622687792
>         noptx_pdus: 0
>         scsicmd_pdus: 15184105
>         tmfcmd_pdus: 0
>         login_pdus: 0
>         text_pdus: 0
>         dataout_pdus: 195910
>         logout_pdus: 0
>         snack_pdus: 0
>         noprx_pdus: 0
>         scsirsp_pdus: 15184088
>         tmfrsp_pdus: 0
>         textrsp_pdus: 0
>         datain_pdus: 87898
>         logoutrsp_pdus: 0
>         r2t_pdus: 151200
>         async_pdus: 0
>         rjt_pdus: 0
>         digest_err: 0
>         timeout_err: 0
> iSCSI Extended:
>         tx_sendpage_failures: 0
>         rx_discontiguous_hdr: 0
>         eh_abort_cnt: 0
>
> If I reboot the domU after it has come back to life, the ext3 journal is
> corrupt in most cases, and the kernel panics after one more reboot.
>
> If I try to install a PV domain (CentOS-5.5) the installer asks if I wish
> to initialize the disk xvda, but when the disk partitioning and layout
> questions appear, the disk is missing from the list. There's nothing more
> than a question mark.
> Sometimes the disk is in the list. In that case I can install the OS and
> all seems fine, but after the second reboot the ext3 journal is missing,
> and the kernel panics after the third reboot; the rootfs is gone.
>
>
> Does anyone have any ideas? I'm out of them.
>
> Thanks
> Christian
>
> Some kernel logging from the domU; there is nothing in the dom0 log.
>
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743295
> Aborting journal on device dm-0.
> ext3_abort called.
> EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743296
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743297
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743298
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743299
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743300
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743301
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743302
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743303
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743304
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743305
> EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_orphan_del: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> __journal_remove_journal_head: freeing b_committed_data
> __journal_remove_journal_head: freeing b_committed_data
> __journal_remove_journal_head: freeing b_committed_data
>
>
>
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-users