Re: [Xen-devel] tx offload issue w/stubdoms + igb
On 12/14/2010 02:12 AM, John Weekes wrote:
> I tested further and found that:
>
> * dom0 doesn't have the issue, normal PV domains do not have the issue,
> and Windows GPLPV-based domains do not have the issue. It seems to be
> specific to stubdom-based domains.

That's interesting. There were a number of fixes to netfront/back to
make sure all this checksum offload stuff worked properly, and I was
never convinced they were also ported to stubdom's netfront. I don't
remember the specifics now, unfortunately.

J

>
> * Other machines running the exact same Xen release and kernel
> version, but that use the e1000 driver instead of the igb driver,
> don't seem to have the problem. I don't know if it's related (I have
> not yet been able to test with MSI disabled), but those machines
> without the problem also aren't using MSI-X, whereas the igb-based
> machine that shows the problem is. From dmesg:
>
> [ 21.209923] Intel(R) Gigabit Ethernet Network Driver - version
> 1.3.16-k2
> [ 21.210026] Copyright (c) 2007-2009 Intel Corporation.
> [ 21.210140] xen: registering gsi 28 triggering 0 polarity 1
> [ 21.210145] xen: --> irq=28
> [ 21.210151] igb 0000:01:00.0: PCI INT A -> GSI 28 (level, low) ->
> IRQ 28
> [ 21.210279] igb 0000:01:00.0: setting latency timer to 64
> [ 21.382336] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network
> Connection
> [ 21.382435] igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4)
> 00:25:90:09:e4:00
> [ 21.382605] igb 0000:01:00.0: eth0: PBA No: ffffff-0ff
> [ 21.382698] igb 0000:01:00.0: Using MSI-X interrupts. 4 rx
> queue(s), 4 tx queue(s)
>
> (Both the e1000 and igb machines have the hvm_directio flag in the "xl
> info" output.)
>
> * Different GSO/TSO settings do not appear to make a difference. Only
> the tx offload setting does.
>
> * Inside the problematic domU, the bad segment counter increments when
> the issue is occurring:
>
> testvds5 ~ # netstat -s eth0
> Ip:
>     22162 total packets received
>     44 with invalid addresses
>     0 forwarded
>     0 incoming packets discarded
>     22113 incoming packets delivered
>     19582 requests sent out
> Icmp:
>     2694 ICMP messages received
>     0 input ICMP message failed.
>     ICMP input histogram:
>         timeout in transit: 2447
>         echo replies: 247
>     2698 ICMP messages sent
>     0 ICMP messages failed
>     ICMP output histogram:
>         destination unreachable: 2
> IcmpMsg:
>     InType0: 247
>     InType11: 2447
>     OutType3: 2
>     OutType69: 2696
> Tcp:
>     4 active connections openings
>     3 passive connection openings
>     0 failed connection attempts
>     0 connection resets received
>     3 connections established
>     18819 segments received
>     16795 segments send out
>     0 segments retransmited
>     2366 bad segments received.
>     8 resets sent
> Udp:
>     65 packets received
>     2 packets to unknown port received.
>     0 packet receive errors
>     89 packets sent
> UdpLite:
> TcpExt:
>     1 TCP sockets finished time wait in fast timer
>     172 delayed acks sent
>     Quick ack mode was activated 89 times
>     3 packets directly queued to recvmsg prequeue.
>     33304 bytes directly in process context from backlog
>     3 bytes directly received in process context from prequeue
>     7236 packet headers predicted
>     23 packets header predicted and directly queued to user
>     3117 acknowledgments not containing data payload received
>     89 DSACKs sent for old packets
>     2 DSACKs sent for out of order packets
>     2 connections reset due to unexpected data
> IpExt:
>     InBcastPkts: 533
>     InOctets: 23420805
>     OutOctets: 1601733
>     InBcastOctets: 162268
> testvds5 ~ #
>
> * Some sites transfer quickly to the domU regardless of the tx
> offload setting, exhibiting the symptoms less.
> For instance, uiuc.edu
> with tx on:
>
> root@testvds5:~# wget
> http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> --2010-12-14 03:53:50--
> http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> Resolving gentoo.cites.uiuc.edu... 128.174.5.78
> Connecting to gentoo.cites.uiuc.edu|128.174.5.78|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 2798649344 (2.6G) [text/plain]
> Saving to: `livedvd-amd64-multilib-10.1.iso'
>
> 0% [ ] 25,754,272 3.06M/s eta
> 17m 7s ^C
> root@testvds5:~#
>
> (netstat shows 23 bad segments received over the length of that test)
>
> and with tx off:
>
> root@testvds5:~# wget
> http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> --2010-12-14 03:54:45--
> http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> Resolving gentoo.cites.uiuc.edu... 128.174.5.78
> Connecting to gentoo.cites.uiuc.edu|128.174.5.78|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 2798649344 (2.6G) [text/plain]
> Saving to: `livedvd-amd64-multilib-10.1.iso.1'
>
> 1% [ ] 47,677,960 3.95M/s eta
> 12m 0s ^C
>
> * The issue also occurs in xen-4.0-testing, as of c/s 21392.
>
> For reference, Xen and kernel version output:
>
> nyc-dodec266 src # xl info
> host                   : nyc-dodec266
> release                : 2.6.32.26-g862ef97
> version                : #4 SMP Wed Dec 8 16:38:18 EST 2010
> machine                : x86_64
> nr_cpus                : 24
> nr_nodes               : 2
> cores_per_socket       : 12
> threads_per_core       : 1
> cpu_mhz                : 2674
> hw_caps                :
> bfebfbff:2c100800:00000000:00003f40:029ee3ff:00000000:00000001:00000000
> virt_caps              : hvm hvm_directio
> total_memory           : 49143
> free_memory            : 9178
> free_cpus              : 0
> xen_major              : 4
> xen_minor              : 1
> xen_extra              : -unstable
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32
> hvm-3.0-x86_32p hvm-3.0-x86_64
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          : Wed Dec 08 10:46:31 2010 +0000
> 22467:89116f28083f
> xen_commandline        : dom0_mem=2550M dom0_max_vcpus=4
> cc_compiler            : gcc version 4.4.4 (Gentoo 4.4.4-r2 p1.2,
> pie-0.4.5)
> cc_compile_by          : root
> cc_compile_domain      : nuclearfallout.net
> cc_compile_date        : Fri Dec 10 00:51:50 EST 2010
> xend_config_format     : 4
> nyc-dodec266 src # uname -a
> Linux nyc-dodec266 2.6.32.26-g862ef97 #4 SMP Wed Dec 8 16:38:18 EST
> 2010 x86_64 Intel(R) Xeon(R) CPU X5650 @ 2.67GHz GenuineIntel GNU/Linux
>
> For now, I can use the "tx off" workaround by having a script set it
> for all newly created domains. Is anyone up for nailing this down and
> finding a real fix? Failing that, applying the workaround in the Xen
> tools might be a good idea.
>
> -John
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
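
For anyone wanting to reproduce the measurement and the interim workaround described above, here is a rough sketch, not a tested fix. It pulls the "segments received" and "bad segments received" counters out of `netstat -s` text and builds the `ethtool -K <iface> tx off` command. The function names, the sample text, and the choice of `eth0` are assumptions for illustration; the thread does not say which interface John's script targets or where it runs.

```python
import re

def tcp_bad_segment_ratio(netstat_text):
    """Return (bad, received, ratio) from the Tcp section of `netstat -s` output."""
    # Anchor on whole lines so "N bad segments received." is not
    # mistaken for the plain "N segments received" counter.
    received = re.search(r"^\s*(\d+) segments received\s*$", netstat_text, re.M)
    bad = re.search(r"^\s*(\d+) bad segments received\.?\s*$", netstat_text, re.M)
    if received is None or bad is None:
        raise ValueError("TCP segment counters not found in netstat output")
    n_received = int(received.group(1))
    n_bad = int(bad.group(1))
    ratio = n_bad / n_received if n_received else 0.0
    return n_bad, n_received, ratio

def tx_off_command(iface):
    """Build (but do not run) the ethtool command that disables tx checksum offload."""
    # Hypothetical helper: the caller decides where and how to execute this,
    # e.g. subprocess.run(tx_off_command("eth0"), check=True) as root.
    return ["ethtool", "-K", iface, "tx", "off"]

# Counters copied from the netstat output quoted in this message.
SAMPLE = """Tcp:
    18819 segments received
    16795 segments send out
    0 segments retransmited
    2366 bad segments received.
"""

if __name__ == "__main__":
    bad, received, ratio = tcp_bad_segment_ratio(SAMPLE)
    print(f"{bad} of {received} TCP segments bad ({ratio:.1%})")
    print(" ".join(tx_off_command("eth0")))
```

Sampling the counters before and after a transfer, with tx offload on and then off, would give a per-test delta comparable to the "23 bad segments" figure quoted for the uiuc.edu download.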