
Re: [Xen-devel] tx offload issue w/stubdoms + igb



On 12/14/2010 02:12 AM, John Weekes wrote:
> I tested further and found that:
>
> * dom0 doesn't have the issue, normal PV domains do not have the issue,
> and Windows GPLPV-based domains do not have the issue. It seems to be
> specific to stubdom-based domains.

That's interesting.  There were a number of fixes to netfront/back to
make sure all this checksum offload stuff worked properly, and I was
never convinced they were also ported to stubdom's netfront.  I don't
remember the specifics now, unfortunately.

    J

>
> * Other machines running the exact same Xen release and kernel
> version, but that use the e1000 driver instead of the igb driver,
> don't seem to have the problem. I don't know if it's related (I have
> not yet been able to test with MSI disabled), but those machines
> without the problem also aren't using MSI-X, whereas the igb-based
> machine that shows the problem is. From dmesg:
>
> [   21.209923] Intel(R) Gigabit Ethernet Network Driver - version
> 1.3.16-k2
> [   21.210026] Copyright (c) 2007-2009 Intel Corporation.
> [   21.210140] xen: registering gsi 28 triggering 0 polarity 1
> [   21.210145] xen: --> irq=28
> [   21.210151] igb 0000:01:00.0: PCI INT A -> GSI 28 (level, low) ->
> IRQ 28
> [   21.210279] igb 0000:01:00.0: setting latency timer to 64
> [   21.382336] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network
> Connection
> [   21.382435] igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4)
> 00:25:90:09:e4:00
> [   21.382605] igb 0000:01:00.0: eth0: PBA No: ffffff-0ff
> [   21.382698] igb 0000:01:00.0: Using MSI-X interrupts. 4 rx
> queue(s), 4 tx queue(s)
>
> (Both the e1000 and igb machines have the hvm_directio flag in the "xl
> info" output.)
>
> * Different GSO/TSO settings do not appear to make a difference. Only
> the tx offload setting does.
>
> * Inside the problematic domU, the bad segment counter increments when
> the issue is occurring:
>
> testvds5 ~ # netstat -s eth0
> Ip:
>     22162 total packets received
>     44 with invalid addresses
>     0 forwarded
>     0 incoming packets discarded
>     22113 incoming packets delivered
>     19582 requests sent out
> Icmp:
>     2694 ICMP messages received
>     0 input ICMP message failed.
>     ICMP input histogram:
>         timeout in transit: 2447
>         echo replies: 247
>     2698 ICMP messages sent
>     0 ICMP messages failed
>     ICMP output histogram:
>         destination unreachable: 2
> IcmpMsg:
>         InType0: 247
>         InType11: 2447
>         OutType3: 2
>         OutType69: 2696
> Tcp:
>     4 active connections openings
>     3 passive connection openings
>     0 failed connection attempts
>     0 connection resets received
>     3 connections established
>     18819 segments received
>     16795 segments send out
>     0 segments retransmited
>     2366 bad segments received.
>     8 resets sent
> Udp:
>     65 packets received
>     2 packets to unknown port received.
>     0 packet receive errors
>     89 packets sent
> UdpLite:
> TcpExt:
>     1 TCP sockets finished time wait in fast timer
>     172 delayed acks sent
>     Quick ack mode was activated 89 times
>     3 packets directly queued to recvmsg prequeue.
>     33304 bytes directly in process context from backlog
>     3 bytes directly received in process context from prequeue
>     7236 packet headers predicted
>     23 packets header predicted and directly queued to user
>     3117 acknowledgments not containing data payload received
>     89 DSACKs sent for old packets
>     2 DSACKs sent for out of order packets
>     2 connections reset due to unexpected data
> IpExt:
>     InBcastPkts: 533
>     InOctets: 23420805
>     OutOctets: 1601733
>     InBcastOctets: 162268
> testvds5 ~ #
>
> * Some sites transfer quickly to the domU regardless of the tx
> offload setting, exhibiting the symptoms less. For instance, uiuc.edu
> with tx on:
>
> root@testvds5:~# wget
> http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> --2010-12-14 03:53:50-- 
> http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> Resolving gentoo.cites.uiuc.edu... 128.174.5.78
> Connecting to gentoo.cites.uiuc.edu|128.174.5.78|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 2798649344 (2.6G) [text/plain]
> Saving to: `livedvd-amd64-multilib-10.1.iso'
>
>  0% [                                       ] 25,754,272  3.06M/s  eta
> 17m 7s  ^C
> root@testvds5:~#
>
> (netstat shows 23 bad segments received over the length of that test)
>
> and with tx off:
>
> root@testvds5:~# wget
> http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> --2010-12-14 03:54:45-- 
> http://gentoo.cites.uiuc.edu/pub/gentoo/releases/amd64/10.1/livedvd-amd64-multilib-10.1.iso
> Resolving gentoo.cites.uiuc.edu... 128.174.5.78
> Connecting to gentoo.cites.uiuc.edu|128.174.5.78|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 2798649344 (2.6G) [text/plain]
> Saving to: `livedvd-amd64-multilib-10.1.iso.1'
>
>  1% [                                       ] 47,677,960  3.95M/s  eta
> 12m 0s  ^C
>
> * The issue also occurs in xen-4.0-testing, as of c/s 21392.
>
> For reference, Xen and kernel version output:
>
> nyc-dodec266 src # xl info
> host                   : nyc-dodec266
> release                : 2.6.32.26-g862ef97
> version                : #4 SMP Wed Dec 8 16:38:18 EST 2010
> machine                : x86_64
> nr_cpus                : 24
> nr_nodes               : 2
> cores_per_socket       : 12
> threads_per_core       : 1
> cpu_mhz                : 2674
> hw_caps                :
> bfebfbff:2c100800:00000000:00003f40:029ee3ff:00000000:00000001:00000000
> virt_caps              : hvm hvm_directio
> total_memory           : 49143
> free_memory            : 9178
> free_cpus              : 0
> xen_major              : 4
> xen_minor              : 1
> xen_extra              : -unstable
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32
> hvm-3.0-x86_32p hvm-3.0-x86_64
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          : Wed Dec 08 10:46:31 2010 +0000
> 22467:89116f28083f
> xen_commandline        : dom0_mem=2550M dom0_max_vcpus=4
> cc_compiler            : gcc version 4.4.4 (Gentoo 4.4.4-r2 p1.2,
> pie-0.4.5)
> cc_compile_by          : root
> cc_compile_domain      : nuclearfallout.net
> cc_compile_date        : Fri Dec 10 00:51:50 EST 2010
> xend_config_format     : 4
> nyc-dodec266 src # uname -a
> Linux nyc-dodec266 2.6.32.26-g862ef97 #4 SMP Wed Dec 8 16:38:18 EST
> 2010 x86_64 Intel(R) Xeon(R) CPU X5650 @ 2.67GHz GenuineIntel GNU/Linux
>
> For now, I can use the "tx off" workaround by having a script set it
> for all newly created domains. Is anyone up for nailing this down and
> finding a real fix? Failing that, applying the workaround in the Xen
> tools might be a good idea.
>
> -John
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>
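
The quoted report uses the "bad segments received" line of "netstat -s" as the indicator that the problem is occurring. As a rough sketch of how that counter could be compared before and after a test transfer, the snippet below pulls the number out of captured netstat text. The function names are made up for illustration, and treating a missing line as zero is an assumption, not something confirmed in this thread.

```python
import re

# Matches the counter line from the Tcp: section of `netstat -s`, e.g.
# "    2366 bad segments received."
_BAD_SEGMENTS = re.compile(r"(\d+) bad segments received")


def bad_segments(netstat_output: str) -> int:
    """Return the 'bad segments received' count from `netstat -s` text.

    Assumption: if the line is not present at all, the count is taken as 0.
    """
    match = _BAD_SEGMENTS.search(netstat_output)
    return int(match.group(1)) if match else 0


def bad_segment_delta(before: str, after: str) -> int:
    """Return how many bad segments were counted between two snapshots."""
    return bad_segments(after) - bad_segments(before)
```

One way to use it would be to capture the output of "netstat -s" in the domU before and after a download (for example with subprocess.run(["netstat", "-s"], capture_output=True, text=True).stdout) and pass both strings to bad_segment_delta, once with tx offload on and once with it off.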

