Xen project Mailing List

Re: [Xen-devel] tx offload issue w/stubdoms + igb

To: John Weekes <lists.xen@xxxxxxxxxxxxxxxxxx>

From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>

Date: Tue, 14 Dec 2010 14:00:47 -0800

Cc: Samuel Thibault <samuel.thibault@xxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>

Delivery-date: Tue, 14 Dec 2010 14:04:11 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

(Added Stefano.)

On 12/14/2010 01:59 PM, Jeremy Fitzhardinge wrote:
> On 12/14/2010 02:12 AM, John Weekes wrote:
>> I tested further and found that:
>>
>> * dom0 doesn't have the issue, normal PV domains do not have the issue,
>> and Windows GPLPV-based domains do not have the issue. It seems to be
>> specific to stubdom-based domains.
> That's interesting. There were a number of fixes to netfront/back to
> make sure all this checksum offload stuff worked properly, and I was
> never convinced they were also ported to stubdom's netfront. I don't
> remember the specifics now, unfortunately.
>
> J
>
>> * Other machines running the exact same Xen release and kernel
>> version, but that use the e1000 driver instead of the igb driver,
>> don't seem to have the problem. I don't know if it's related (I have
>> not yet been able to test with MSI disabled), but those machines
>> without the problem also aren't using MSI-X, whereas the igb-based
>> machine that shows the problem is. From dmesg:
>>
>> [ 21.209923] Intel(R) Gigabit Ethernet Network Driver - version 1.3.16-k2
>> [ 21.210026] Copyright (c) 2007-2009 Intel Corporation.
>> [ 21.210140] xen: registering gsi 28 triggering 0 polarity 1
>> [ 21.210145] xen: --> irq=28
>> [ 21.210151] igb 0000:01:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
>> [ 21.210279] igb 0000:01:00.0: setting latency timer to 64
>> [ 21.382336] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
>> [ 21.382435] igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:25:90:09:e4:00
>> [ 21.382605] igb 0000:01:00.0: eth0: PBA No: ffffff-0ff
>> [ 21.382698] igb 0000:01:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
>>
>> (Both the e1000 and igb machines have the hvm_directio flag in the "xl
>> info" output.)
>>
>> * Different GSO/TSO settings do not appear to make a difference. Only
>> the tx offload setting does.
>>
>> * Inside the problematic domU, the bad segment counter increments when
>> the issue is occurring:
>>
>> testvds5 ~ # netstat -s eth0
>> Ip:
>>     22162 total packets received
>>     44 with invalid addresses
>>     0 forwarded
>>     0 incoming packets discarded
>>     22113 incoming packets delivered
>>     19582 requests sent out
>> Icmp:
>>     2694 ICMP messages received
>>     0 input ICMP message failed.
>>     ICMP input histogram:
>>         timeout in transit: 2447
>>         echo replies: 247
>>     2698 ICMP messages sent
>>     0 ICMP messages failed
>>     ICMP output histogram:
>>         destination unreachable: 2
>> IcmpMsg:
>>     InType0: 247
>>     InType11: 2447
>>     OutType3: 2
>>     OutType69: 2696
>> Tcp:
>>     4 active connections openings
>>     3 passive connection openings
>>     0 failed connection attempts
>>     0 connection resets received
>>     3 connections established
>>     18819 segments received
>>     16795 segments send out
>>     0 segments retransmited
>>     2366 bad segments received.
>>     8 resets sent
>> Udp:
>>     65 packets received
>>     2 packets to unknown port received.
>>     0 packet receive errors
>>     89 packets sent
>> UdpLite:
>> TcpExt:
>>     1 TCP sockets finished time wait in fast timer
>>     172 delayed acks sent
>>     Quick ack mode was activated 89 times
>>     3 packets directly queued to recvmsg prequeue.
>>     33304 bytes directly in process context from backlog
>>     3 bytes directly received in process context from prequeue
>>     7236 packet headers predicted
>>     23 packets header predicted and directly queued to user
>>     3117 acknowledgments not containing data payload received
>>     89 DSACKs sent for old packets
>>     2 DSACKs sent for out of order packets
>>     2 connections reset due to unexpected data
>> IpExt:
>>     InBcastPkts: 533
>>     InOctets: 23420805
>>     OutOctets: 1601733
>>     InBcastOctets: 162268
>> testvds5 ~ #
>>
>> * Some sites transfer to the domU quickly regardless of the tx
>> offload setting, exhibiting the symptoms less.
>> For instance, uiuc.edu with tx on:
>>
>> root@testvds5:~# wget http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
>> --2010-12-14 03:53:50-- http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
>> Resolving gentoo.cites.uiuc.edu... 128.174.5.78
>> Connecting to gentoo.cites.uiuc.edu|128.174.5.78|:80... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: 2798649344 (2.6G) [text/plain]
>> Saving to: `livedvd-amd64-multilib-10.1.iso'
>>
>> 0% [ ] 25,754,272 3.06M/s eta 17m 7s ^C
>> root@testvds5:~#
>>
>> (netstat shows 23 bad segments received over the length of that test)
>>
>> and with tx off:
>>
>> root@testvds5:~# wget http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
>> --2010-12-14 03:54:45-- http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
>> Resolving gentoo.cites.uiuc.edu... 128.174.5.78
>> Connecting to gentoo.cites.uiuc.edu|128.174.5.78|:80... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: 2798649344 (2.6G) [text/plain]
>> Saving to: `livedvd-amd64-multilib-10.1.iso.1'
>>
>> 1% [ ] 47,677,960 3.95M/s eta 12m 0s ^C
>>
>> * The issue also occurs in xen-4.0-testing, as of c/s 21392.
>>
>> For reference, Xen and kernel version output:
>>
>> nyc-dodec266 src # xl info
>> host                   : nyc-dodec266
>> release                : 2.6.32.26-g862ef97
>> version                : #4 SMP Wed Dec 8 16:38:18 EST 2010
>> machine                : x86_64
>> nr_cpus                : 24
>> nr_nodes               : 2
>> cores_per_socket       : 12
>> threads_per_core       : 1
>> cpu_mhz                : 2674
>> hw_caps                : bfebfbff:2c100800:00000000:00003f40:029ee3ff:00000000:00000001:00000000
>> virt_caps              : hvm hvm_directio
>> total_memory           : 49143
>> free_memory            : 9178
>> free_cpus              : 0
>> xen_major              : 4
>> xen_minor              : 1
>> xen_extra              : -unstable
>> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
>> xen_scheduler          : credit
>> xen_pagesize           : 4096
>> platform_params        : virt_start=0xffff800000000000
>> xen_changeset          : Wed Dec 08 10:46:31 2010 +0000 22467:89116f28083f
>> xen_commandline        : dom0_mem=2550M dom0_max_vcpus=4
>> cc_compiler            : gcc version 4.4.4 (Gentoo 4.4.4-r2 p1.2, pie-0.4.5)
>> cc_compile_by          : root
>> cc_compile_domain      : nuclearfallout.net
>> cc_compile_date        : Fri Dec 10 00:51:50 EST 2010
>> xend_config_format     : 4
>> nyc-dodec266 src # uname -a
>> Linux nyc-dodec266 2.6.32.26-g862ef97 #4 SMP Wed Dec 8 16:38:18 EST 2010 x86_64 Intel(R) Xeon(R) CPU X5650 @ 2.67GHz GenuineIntel GNU/Linux
>>
>> For now, I can use the "tx off" workaround by having a script set it
>> for all newly created domains. Is anyone up for nailing this down and
>> finding a real fix? Failing that, applying the workaround in the Xen
>> tools might be a good idea.
>>
>> -John
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel
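
For anyone trying to reproduce the report, the two things John describes (watching the "bad segments received" counter in `netstat -s`, and turning tx offload off as a workaround) could be scripted roughly as below. This is only a sketch under assumptions: the thread does not show John's actual script, nor say which interface or which side (domU or dom0 vif) it is applied to, so the function names and the `eth0` interface name used here are hypothetical. `ethtool -K <iface> tx off` is the standard ethtool syntax for disabling tx checksum offload.

```python
import re
from typing import List


def bad_segments(netstat_output: str) -> int:
    """Extract the TCP 'bad segments received' count from `netstat -s` text.

    Matches the line format shown in the quoted output above,
    e.g. '2366 bad segments received.'.
    """
    match = re.search(r"(\d+) bad segments received", netstat_output)
    if match is None:
        raise ValueError("no 'bad segments received' line in netstat output")
    return int(match.group(1))


def bad_segment_delta(before: str, after: str) -> int:
    """Number of bad segments seen between two `netstat -s` snapshots."""
    return bad_segments(after) - bad_segments(before)


def tx_offload_command(interface: str, enabled: bool) -> List[str]:
    """Build (but do not run) the ethtool command that toggles tx offload.

    The interface name is an assumption; pass whichever device the
    workaround should be applied to.
    """
    return ["ethtool", "-K", interface, "tx", "on" if enabled else "off"]
```

To use it, take one `netstat -s` snapshot before a transfer and one after, pass both to `bad_segment_delta`, and run the list returned by `tx_offload_command("eth0", False)` through `subprocess.run` (as root) to apply the workaround before repeating the test.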
