
[Xen-devel] Re: Very slow domU network performance - Moved to xen-devel

On Apr 5, 2006, at 1:11 PM, Matt Ayres wrote:
Winston Chang wrote:
I ran the test with the latest xen-unstable build. The results are the same. When I ran 'xm sched-sedf 0 0 0 0 1 1' to prevent domU CPU starvation, network performance was good. The numbers in this case are the same as in my other message where I detail the results using the week-old xen build -- it could handle 90Mb/s with no datagram loss. So it looks like the checksum patches had no effect on this phenomenon; the only thing that mattered was the scheduling.

What was the previous weight of domain 0? What weight is assigned to the domUs, and do the domUs have bursting enabled?
I'm not really sure of the answer to either of these questions. The weight is whatever the default is with Fedora Core 5 and xen-unstable. I don't know anything about bursting. How do you find out?

I'd like to be corrected if I am wrong, but the last number (weight) is set to 0 for all domains by default. By giving it a value of 1 you are giving dom0 more CPU. The second-to-last number is a boolean that decides whether a domain is hard-locked to its weight or can burst using idle CPU cycles. The three before that are generally set to 0, and the first number is the domain name. I do not know of a way to grab the weights personally. It is documented in the Xen distribution tgz.
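For reference, this is the parameter order as I read it from the docs in the Xen tree (treat the layout as my assumption and double-check it against your copy):

    # xm sched-sedf <domain> <period> <slice> <latency> <extratime> <weight>
    # The command used above: domain 0, period/slice/latency left at 0,
    # extratime enabled (1), weight 1.
    xm sched-sedf 0 0 0 0 1 1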

I can tell you the symptoms I had: whenever a process in dom0 grabs 100% of the CPU, the domU console freezes. After a little while, the domU console says "BUG: soft lockup detected on CPU#0!" So I believe that with my default settings, dom0 always gets first priority, and domU gets the leftovers.


For those who have just seen this (this thread started on xen-users): I had very poor UDP performance using iperf with domU as the server and dom0 as the client. I had 99.98% packet loss when running at 90Mb/s in this case, until I changed the scheduling as above. Then packet loss dropped to 0. In the reverse direction there was never a problem.

For more details, see the original thread here:
http://lists.xensource.com/archives/html/xen-users/2006-04/msg00096.html
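For anyone who wants to reproduce the UDP test, the commands were along these lines (192.168.1.10 is just a placeholder for the domU's IP):

    # In domU (iperf 1.7.0): run the UDP server
    iperf -s -u

    # In dom0: send UDP datagrams to the domU at 90Mb/s
    iperf -c 192.168.1.10 -u -b 90M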

It's possible that iperf is partially at fault here. (I used version 1.7.0 since 2.0.2 wouldn't compile on my iBook.) I noticed that it takes 100% of the CPU when it's used as a UDP client, even when running at lower speeds -- I saw this at 4Mb/s. I would wager that it uses a busy-wait loop to delay between sending datagrams. Since iperf always wants all the CPU cycles, and domU has last priority in my default scheduling config, domU just wouldn't get enough CPU time to process the incoming datagrams.
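A quick way to check this (again with a placeholder address) is to compare CPU time against wall-clock time for a low-rate run:

    # Run a low-rate UDP client for 30 seconds. If iperf busy-waits between
    # datagrams, user+sys time will be close to the full 30 seconds even at 4Mb/s.
    time iperf -c 192.168.1.10 -u -b 4M -t 30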

A more general note about using iperf:
It seems to me that as long as iperf uses 100% of the CPU, it is not a good tool for testing dom0-domU or domU-domU network performance. This sort of timing loop would be fine for network tests using "real" hosts, but not ones in which CPU resources are shared and network I/O is CPU-bound, as is the case here.

I would guess that this would not occur on SMP machines (and maybe hyperthreaded ones as well), since iperf's timing loop would use up only one CPU.


The other network issue I had was very slow TCP performance when domU was the iperf server and an external machine was the iperf client. I got about 2Mb/s in this case, but about 90Mb/s in the other direction (on 100Mbit Ethernet). This problem disappeared when I made the scheduling change above.
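That test was just the iperf TCP defaults, roughly as follows (the address is a placeholder for the domU's externally reachable IP):

    # In domU: run the TCP server
    iperf -s

    # On the external machine: run the TCP client against the domU
    iperf -c 192.168.1.10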

This issue is _not_ explained by iperf hogging the CPU as described above. No user-level process in dom0 should be involved; dom0 just does some low-level networking. But if the cause of this TCP problem is that dom0 is taking all the CPU resources, that would suggest that something in the Xen networking/bridging code is using 100% of the CPU just to bridge the incoming data. Does this indicate a problem in the networking code?

Again, the TCP slowness does not occur in the reverse direction, when domU is sending to an external machine. My guess is that, like the iperf UDP issue above, this problem would not occur on SMP machines.


--Winston


I ran my own tests. I have dom0 with a weight of 512 (double its memory allocation) and each VM also has a weight equal to its memory allocation. My dom0 can transfer at 10MB/s+ over the LAN, but domUs with 100% CPU used on the host could only transfer over the LAN at a peak of 800KB/s. When I gave dom0 a weight of 1, domU transfers decreased to a peak of 100KB/s over the "LAN" (quoted because, due to proxy ARP, the host acts as a router).
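Roughly how one would set such weights with sched-sedf (the domain name and memory figures below are hypothetical, just to show the ratio, and whether extratime is enabled is also only an assumption; same parameter layout as above):

    # dom0 with 256MB of memory -> weight 512 (double its allocation)
    xm sched-sedf 0 0 0 0 1 512
    # a domU with 256MB of memory -> weight 256 (equal to its allocation)
    xm sched-sedf mydomu 0 0 0 0 1 256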

The problem occurs regardless of whether you use bridged or routed mode.

I would have to believe the problem is in the hypervisor itself, and that scheduling and CPU usage greatly affect it. Network bandwidth should not be affected unless explicitly limited (i.e. by using the rate vif parameter).
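(For example, a deliberate limit would go in the domU config's vif line; the path and figure here are just illustrative:)

    # In the domU config file, e.g. /etc/xen/mydomu:
    vif = [ 'rate=10Mb/s' ]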

Stephen Soltesz has experienced the same problem and has some graphs to back it up. Stephen, will you share at least that one CPU + iperf graph with the community and perhaps elaborate on your weight configuration (if any)?

Thank you,
Matt Ayres


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
