Yeah, HT is off (I don't even know if you can turn it on in the PE1950s!). I'm getting some interesting stuff from tcpdump:
09:04:14.639345 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42499521 win 5080 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639345 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42500969 win 5804 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639373 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42502417 win 6528 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639373 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42503865 win 7252 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639374 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42505313 win 7976 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639374 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42506761 win 8700 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639375 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42508209 win 9424 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639396 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42509657 win 10148 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639647 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.4194578178: reply ERR 1448
09:04:14.639657 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.1879243268: reply ERR 1448
09:04:14.639661 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.4194609922: reply ERR 1448
09:04:14.639665 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.4009949700: reply ERR 1448
09:04:14.639670 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.4194630148: reply ERR 1448
09:04:14.639674 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.2533620228: reply ERR 1448
09:04:14.639720 IP alpha_0.seakr.com.nfs > swbox1.seakr.com.4194630148: reply ERR 1448
I've briefly looked at some Google results for "reply ERR 1448" but haven't found anything concrete yet. I'm going to keep digging on that one to see if it leads somewhere. In the meantime, I've disabled TX checksums in the domU and am running a couple more tests to see whether I can still reproduce the long I/O waits. I'll let you know how that turns out. There are also some "reply ERR 1084" messages sprinkled in there.
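A quick way to see what the ACK numbers in that capture are doing is to diff consecutive values. Here's a minimal Python sketch. It assumes the capture has been pasted into a string and only handles tcpdump's one-line TCP format shown above. For these eight lines every step comes out to 1448 bytes, the same number that shows up in the "reply ERR 1448" lines.

```python
import re

# The ACK lines from the capture above, copied verbatim.
CAPTURE = """\
09:04:14.639345 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42499521 win 5080 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639345 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42500969 win 5804 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639373 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42502417 win 6528 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639373 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42503865 win 7252 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639374 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42505313 win 7976 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639374 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42506761 win 8700 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639375 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42508209 win 9424 <nop,nop,timestamp 5108364 2431452318>
09:04:14.639396 IP swbox1.seakr.com.785 > alpha_0.seakr.com.nfs: . ack 42509657 win 10148 <nop,nop,timestamp 5108364 2431452318>
"""

ACK_RE = re.compile(r"\back (\d+)\b")


def ack_deltas(text):
    """Bytes acknowledged between consecutive ACK lines.

    The modulo handles 32-bit TCP sequence-number wraparound.
    """
    acks = [int(m.group(1)) for m in ACK_RE.finditer(text)]
    return [(b - a) % 2**32 for a, b in zip(acks, acks[1:])]


print(ack_deltas(CAPTURE))  # → [1448, 1448, 1448, 1448, 1448, 1448, 1448]
```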
I'll also try out some of the NFS settings to see if anything there helps and let you know.
Thanks for the help - much appreciated!
--Nick
>>> On Tue, Oct 23, 2007 at 9:17 PM, "Steve Senator (Senator Ent)" <sts@xxxxxxxxxxx> wrote:
Xen can exacerbate Linux SMP issues. Do you have hyperthreading turned on in your CPUs? If so, try turning it off, at least for testing.
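If you'd rather check HT from Linux than from the BIOS, one common heuristic is to compare the `siblings` and `cpu cores` fields in /proc/cpuinfo. This is a sketch, not an authoritative test: inside dom0 or a domU those fields may describe the virtual CPUs Xen presents rather than the physical package, so treat the answer as a hint.

```python
def hyperthreading_enabled(cpuinfo_text):
    """Guess whether HT is on from /proc/cpuinfo text.

    'siblings' counts logical CPUs per physical package and 'cpu cores'
    counts physical cores per package; if siblings exceeds cores, each
    core is exposing more than one thread. Returns None if either
    field is missing.
    """
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "siblings" and siblings is None:
            siblings = int(value)
        elif key == "cpu cores" and cores is None:
            cores = int(value)
    if siblings is None or cores is None:
        return None
    return siblings > cores


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(hyperthreading_enabled(f.read()))
    except OSError:
        print("no /proc/cpuinfo here (not Linux?)")
```

On a dual-core Xeon 5150 with HT off you'd expect `siblings` and `cpu cores` to match, so the function returns False.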
Also, beyond turning off TX offloading in both dom0 and domU, is there any chance that another device attached to that bridge is causing network delays? In particular, could some device incorrectly see the domU IP as coming from dom0 because of an ARP conflict? I see that you've specified a fixed MAC address. Is there any chance that the same MAC address is also used by dom0? Perhaps the domU is booting dom0's initrd, and that initrd sets the same MAC address as dom0's?
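The duplicate-MAC question can be checked mechanically. Below is a rough sketch that pulls the `mac=` value out of an xm-style `vif` list and compares it against the MACs the local kernel reports under /sys/class/net. The function names are made up for illustration. Run in dom0, it catches a domU MAC that dom0 is also using. To test the initrd theory, you'd also want to collect `local_macs()` from inside the domU and compare that list against dom0's.

```python
import glob


def vif_macs(vif_entries):
    """Pull the mac= value out of each xm-style vif spec string."""
    macs = []
    for spec in vif_entries:
        for item in spec.split(","):
            key, _, value = item.strip().partition("=")
            if key == "mac":
                macs.append(value.strip().lower())
    return macs


def local_macs():
    """MAC addresses of every interface this kernel knows about (Linux sysfs)."""
    macs = []
    for path in glob.glob("/sys/class/net/*/address"):
        with open(path) as f:
            macs.append(f.read().strip().lower())
    return macs


def mac_conflicts(vif_entries, host_macs):
    """domU MACs from the vif list that also show up on the host side."""
    host = {m.lower() for m in host_macs}
    return [m for m in vif_macs(vif_entries) if m in host]


if __name__ == "__main__":
    # vif line copied from the domU config quoted in this thread
    vif = ['mac=00:16:3e:75:0d:be,bridge=xenbr108', ]
    print(mac_conflicts(vif, local_macs()))
```

An empty list means no overlap was found on the machine where it ran.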
Try running tcpdump in both domains and look for retransmissions, or even a smoking gun like a system ARPing for itself when it should know better.
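Once you have text captures from both sides, a small script can flag the two patterns mentioned above. This is a sketch under some assumptions. TCP lines must be in the older one-line format used in this thread (`src > dst: . ack N win W`), since newer tcpdump releases print `Flags [.]` instead. A "self-ARP" is any `who-has X tell X`, and the same shape also appears for gratuitous ARPs at interface bring-up. Repeated pure ACKs can also be plain window updates. So treat any hit as a lead, not proof.

```python
import re
from collections import Counter

# "who-has X tell Y" appears in both older and newer tcpdump ARP output.
ARP_RE = re.compile(r"who-has ([\w.\-]+) tell ([\w.\-]+)")
# Pure ACKs (no payload) in the older format: "src > dst: . ack N win W"
PURE_ACK_RE = re.compile(r"IP (\S+) > (\S+): \. ack (\d+) win")


def self_arps(lines):
    """ARP requests in which a host asks for its own address."""
    hits = []
    for line in lines:
        m = ARP_RE.search(line)
        if m and m.group(1) == m.group(2):
            hits.append(line)
    return hits


def duplicate_acks(lines, threshold=3):
    """(src, dst, ack) triples seen at least `threshold` times.

    Runs of identical pure ACKs are what a receiver sends when segments
    go missing, so they are a rough hint of loss and retransmission.
    """
    counts = Counter()
    for line in lines:
        m = PURE_ACK_RE.search(line)
        if m:
            src, dst, ack = m.groups()
            counts[(src, dst, int(ack))] += 1
    return [key for key, n in counts.items() if n >= threshold]
```

To use it, save the tcpdump text output from each domain to a file and pass `open(path).read().splitlines()` to both functions.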
It's also possible that there's a transmission size problem. There have been reports of dom0<->domU traffic not honoring the MTU of the bridge or virtual device, which forces retransmission when the receiving side cannot handle the larger buffer.
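To spot oversized frames, it helps to know the largest TCP payload a given MTU allows. With a 20-byte IPv4 header, a 20-byte TCP header, and the 12-byte timestamp option (10 bytes plus two NOPs, the `<nop,nop,timestamp ...>` in the capture), a 1500-byte MTU leaves 1448 bytes per segment. That matches the 1448-byte ACK steps Nick posted. The sketch below assumes no IP options and no TCP options other than timestamps. One caveat: captures taken on the sending side with TX offload enabled can legitimately show segments larger than the MTU, because segmentation happens later. Compare lengths seen on the receiving side.

```python
IP_HEADER = 20       # IPv4 header, no options
TCP_HEADER = 20      # TCP header, no options
TCP_TIMESTAMPS = 12  # timestamp option (10 bytes) padded with two NOPs


def max_tcp_payload(mtu, timestamps=True):
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER - (TCP_TIMESTAMPS if timestamps else 0)


def oversized(segment_lengths, mtu, timestamps=True):
    """Payload lengths that could not fit in a single packet on this link."""
    limit = max_tcp_payload(mtu, timestamps)
    return [n for n in segment_lengths if n > limit]


print(max_tcp_payload(1500))  # → 1448
```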
If you're using NFS, try switching from TCP to UDP, or adjust the rsize and wsize buffer sizes so they fit within the MTU of your (virtual) Ethernet devices.
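To see why rsize/wsize matter over UDP, count how many IP fragments a single NFS datagram turns into. This is a sketch assuming IPv4 with no options. Every non-final fragment carries a multiple of 8 data bytes, so a 1500-byte MTU gives 1480 bytes per fragment, and the 8-byte UDP header rides in the first one. The real datagram also carries RPC/NFS headers on top of rsize/wsize. The sketch doesn't try to size those, so add them to `udp_payload` yourself.

```python
import math

IP_HEADER = 20  # IPv4 header, no options
UDP_HEADER = 8


def ip_fragments(udp_payload, mtu=1500):
    """Number of IPv4 fragments needed for one UDP datagram."""
    per_fragment = (mtu - IP_HEADER) // 8 * 8  # fragment data must be 8-byte aligned
    return math.ceil((UDP_HEADER + udp_payload) / per_fragment)


def largest_unfragmented_payload(mtu=1500):
    """Biggest UDP payload that still fits in a single frame."""
    return mtu - IP_HEADER - UDP_HEADER


print(largest_unfragmented_payload())  # → 1472
print(ip_fragments(8192))              # → 6
```

So an 8192-byte wsize needs at least six fragments per write on a 1500-byte MTU, and losing any one of them means resending the whole RPC. A 1024-byte size plus a modest RPC header fits in one frame.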
Hope this helps, -Steve Senator
Quoting Nick Couchman <Nick.Couchman@xxxxxxxxx>:
> Hi again... I haven't had any responses to this yet.
>>>> Nick Couchman 10/18/07 11:05 AM >>>
> Hey, everyone,
> I'm having some issues with a Xen DomU related to performance.
> ... The culprit seems to be high I/O wait times related to the
> network interface.
>
> The host machine is a Dell PowerEdge 1950 with 2 x Dual-Core Xeon
> processors (Xeon 5150 @ 2.66GHz, 1333 FSB). ... Building these
> Linux distributions on the physical system takes 70-80 minutes
> (real time); on the DomU system it takes 130-140 minutes.
> ...
> vif=[ 'mac=00:16:3e:75:0d:be,bridge=xenbr108', ]