RE: [Xen-devel] network hang again
I tracked the glitch back to the 2.4.27 domain-1 (unprivileged; it uses EVMS
block devices from dom0 and serves them out as iSCSI targets via file-io),
with this error message being the trigger point of the collapse:

Sep 15 00:16:55 localhost kernel: fileio_make_request(85) Bad things happened 4096, -5

kernel/file-io.c, lines 76 to 85, seems to be the error point:

    if (rw == READ)
        ret = generic_file_read(filp, buf, count, &ppos);
    else
        ret = generic_file_write(filp, buf, count, &ppos);

    if (ret != count)
        printk("%s(%d) Bad things happened %lld, %d\n",
               __FUNCTION__, __LINE__, count, ret);

-5 is -EIO, per linux-2.4.27/include/asm-i386/errno.h:8:

    #define EIO 5 /* I/O error */

I do NOT get any errors from domain0, so I can't trace through to dom0 right
now. 8-(

This error coincides perfectly in time with the linux-iscsi initiator errors
I got earlier this week, so I believe it is what's triggering the
iscsi-initiator error.

Any advice on how to figure out what is causing the I/O error would be
greatly appreciated. Right now it is the ONLY thing holding me back from
using the IET iSCSI target.

Thanks!
Brian Wolfe

On Tue, 2004-09-14 at 21:50, James Harper wrote:
> When I explained about the patch on the iet list, I was asked if I was
> getting frequent disconnections :)
>
> It sounds like the network issues I'm seeing in Xen are probably
> triggering the crash in iscsi.
>
> I'm running iet 0.3.3 + 2.6 patch + my additional 2.6 patch on dom0, and
> linux-iscsi 4.0.1.8 on dom1.
>
> James
>
> > -----Original Message-----
> > From: Brian Wolfe [mailto:ahzz@xxxxxxxxxxx]
> > Sent: Wednesday, 15 September 2004 02:22
> > To: James Harper
> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] network hang again
> >
> > I have been running IET 0.3.3 on 2.4.27 on one machine, and Cisco's
> > linux-iscsi on 2.6.8.1 on a second physical machine for a couple of
> > days now. So far the only thing that I have run into is a dump message
> > concerning OOM on the linux-iscsi machine.
> >
> >
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: 4.0.1 ( 9-Feb-2004) built for Linux 2.6.8-tbc-vhost-Xen0
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: will translate deferred sense to current sense on disk command responses
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: control device major number 254
> > Sep 13 00:20:11 vhost1 kernel: scsi_proc_hostdir_add: proc_mkdir failed for <NULL>
> > Sep 13 00:20:11 vhost1 kernel: scsi17 : Cisco iSCSI driver
> > Sep 13 00:20:11 vhost1 kernel: iSCSI:detected HBA host #17
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 = iqn.2001-04.dmz.iscsi1:wnhttp
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 portal 0 = address 10.11.7.1 port 3260 group 1
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: starting timer thread at 21835751
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 trying to establish session to portal 0, address 10.11.7.1 port 3260 group 1
> > Sep 13 00:20:12 vhost1 kernel: iSCSI: session c1478000 authenticated by target iqn.2001-04.dmz.iscsi1:wnhttp
> > Sep 13 00:20:12 vhost1 kernel: iSCSI: bus 0 target 0 established session #1, portal 0, address 10.11.7.1 port 3260 group 1
> > Sep 13 00:20:12 vhost1 kernel: Vendor: LINUX  Model: ISCSI  Rev: 0
> > Sep 13 00:20:12 vhost1 kernel: Type: Direct-Access  ANSI SCSI revision: 03
> > Sep 13 00:20:12 vhost1 kernel: SCSI device sda: 16777212 512-byte hdwr sectors (8590 MB)
> > Sep 13 00:20:12 vhost1 kernel: SCSI device sda: drive cache: write back
> > Sep 13 00:20:12 vhost1 kernel: sda: unknown partition table
> > Sep 13 00:20:12 vhost1 kernel: Attached scsi disk sda at scsi17, channel 0, id 0, lun 0
> > Sep 13 00:20:12 vhost1 kernel: Vendor: LINUX  Model: ISCSI  Rev: 0
> > Sep 13 00:20:12 vhost1 kernel: Type: Direct-Access  ANSI SCSI revision: 03
> > Sep 13 00:20:12 vhost1 kernel: SCSI device sdb: 65536 512-byte hdwr sectors (34 MB)
> > Sep 13 00:20:12 vhost1 kernel: SCSI device sdb: drive cache: write back
> > Sep 13 00:20:12 vhost1 kernel: sdb: unknown partition table
> > Sep 13 00:20:12 vhost1 kernel: Attached scsi disk sdb at scsi17, channel 0, id 0, lun 1
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: found reiserfs format "3.6" with standard journal
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: using ordered data mode
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: journal params: device sda, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: checking transaction log (sda)
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: replayed 1 transactions in 0 seconds
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: Using r5 hash to sort names
> > Sep 13 00:28:51 vhost1 kernel: iscsi-tx: page allocation failure. order:1, mode:0x20
> > Sep 13 00:28:51 vhost1 kernel: [__alloc_pages+728/848] __alloc_pages+0x2d8/0x350
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [__get_free_pages+31/64] __get_free_pages+0x1f/0x40
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [kmem_getpages+30/224] kmem_getpages+0x1e/0xe0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [cache_grow+159/336] cache_grow+0x9f/0x150
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [cache_alloc_refill+318/512] cache_alloc_refill+0x13e/0x200
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [__kmalloc+139/160] __kmalloc+0x8b/0xa0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [alloc_skb+71/224] alloc_skb+0x47/0xe0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [pg0+38296326/1002676224] rhine_rx+0x156/0x460 [via_rhine]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [pg0+38295340/1002676224] rhine_interrupt+0x1ac/0x1d0 [via_rhine]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [handle_IRQ_event+73/144] handle_IRQ_event+0x49/0x90
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [do_IRQ+109/240] do_IRQ+0x6d/0xf0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [evtchn_do_upcall+156/256] evtchn_do_upcall+0x9c/0x100
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [hypervisor_callback+51/73] hypervisor_callback+0x33/0x49
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [csum_partial_copy_generic+63/248] csum_partial_copy_generic+0x3f/0xf8
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [tcp_sendmsg+578/4176] tcp_sendmsg+0x242/0x1050
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [inet_sendmsg+77/96] inet_sendmsg+0x4d/0x60
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [sock_sendmsg+165/192] sock_sendmsg+0xa5/0xc0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [__do_softirq+149/160] __do_softirq+0x95/0xa0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [do_softirq+69/80] do_softirq+0x45/0x50
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [do_IRQ+194/240] do_IRQ+0xc2/0xf0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [pg0+39270168/1002676224] iscsi_xmit_queued_cmnds+0x188/0x3c0 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [pg0+39254271/1002676224] iscsi_sendmsg+0x4f/0x70 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [pg0+39271874/1002676224] iscsi_xmit_data+0x472/0x8d0 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [__do_softirq+149/160] __do_softirq+0x95/0xa0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [pg0+39273273/1002676224] iscsi_xmit_r2t_data+0x119/0x1f0 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [pg0+39165617/1002676224] iscsi_tx_thread+0x711/0x8d0 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [pg0+39163808/1002676224] iscsi_tx_thread+0x0/0x8d0 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel: [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
> > Sep 13 00:28:51 vhost1 kernel:
> >
> > The only reason I'm posting the "trace" from linux-iscsi is that it
> > contains the hypervisor_callback function and occurs in the rx phase
> > of the via_rhine driver.
> >
> > What iscsi are you running on each machine? (Sorry if I missed it;
> > I've been offline for a few days now. 8-( ) I'd be interested to know
> > if this is in any way similar to your issue.
> >
> > Brian
> >
> >
> > On Tue, 2004-09-14 at 07:38, James Harper wrote:
> > > I'm now seeing this network hang a lot, to the point where it makes
> > > my iscsi testing unusable. I believe this has more to do with the
> > > sort of testing I'm doing now than with a bug that has suddenly
> > > appeared.
> > >
> > > My setup is this:
> > > Dom0:
> > > 2.6.8.1
> > > Iscsitarget 0.3.3 + 2.6 patches + my own 2.6 patches.
> > > No conntrack or other netfilter related modules
> > > Bridged eth0 to Dom1
> > > /usr/src exported via nfs
> > >
> > > Dom1:
> > > 2.6.8.1
> > > Linux-iscsi 4.0.1.8
> > > No conntrack or other netfilter related modules
> > > /usr/src mounted from Dom0
> > >
> > > Iscsi works for a while, normally crashing in Dom0 due to another
> > > non-Xen-related bug before it hits this one, but if I try to do a
> > > compile on Dom1 in the nfs-mounted /usr/src, the network locks up
> > > almost instantly. It then clears up shortly afterwards if I kill the
> > > compile.
> > >
> > > The logs show absolutely nothing of any use.
> > >
> > > I've just tried a few netperf tests. A quick hammering goes off
> > > without a hitch, but afterwards I see random dropped packets. I'll
> > > keep testing.
> > >
> > > James

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/xen-devel