
[Xen-devel] Re: Very slow domU network performance - Moved to xen-devel

On Apr 5, 2006, at 1:11 PM, Matt Ayres wrote:
Winston Chang wrote:
I ran the test with the latest xen-unstable build. The results are the same. When I ran 'xm sched-sedf 0 0 0 0 1 1' to prevent domU CPU starvation, network performance was good. The numbers in this case are the same as in my other message where I detail the results using the week-old xen build -- it could handle 90Mb/s with no datagram loss. So it looks like the checksum patches had no effect on this phenomenon; the only thing that mattered was the scheduling.

What was the previous weight of domain 0? What weight is assigned to the domUs, and do the domUs have bursting enabled?
I'm not really sure of the answer to either of these questions. The weight is whatever the default is with Fedora Core 5 and xen-unstable. I don't know anything about bursting. How do you find out?

I'd like to be corrected if I am wrong, but the last number (weight) is set to 0 for all domains by default. By giving it a value of 1 you are giving dom0 more CPU. The second-to-last number is a boolean that decides whether a domain is hard-locked to its weight or can burst using idle CPU cycles. The three before that are generally set to 0, and the first number is the domain name. I do not know of a way to grab the weights personally. It is documented in the Xen distribution tgz.
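For reference, this is the parameter order as I read it from the docs in the Xen tree (treat the layout as my assumption and double-check it against your copy):

    # xm sched-sedf <domain> <period> <slice> <latency> <extratime> <weight>
    # The command used above: domain 0, period/slice/latency left at 0,
    # extratime enabled (1), weight 1.
    xm sched-sedf 0 0 0 0 1 1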

I can tell you the symptoms I had: whenever a process in dom0 grabs 100% of the CPU, the domU console freezes. After a little while, the domU console says "BUG: soft lockup detected on CPU#0!" So I believe that with my default settings, dom0 always gets first priority, and domU gets the leftovers.


For those who have just seen this (this thread started on xen-users): I had very poor UDP performance using iperf with domU as the server and dom0 as the client. I had 99.98% packet loss when running at 90Mb/s in this case, until I changed the scheduling as above. Then packet loss dropped to 0. In the reverse direction there was never a problem.

For more details, see the original thread here:
http://lists.xensource.com/archives/html/xen-users/2006-04/msg00096.html
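For anyone who wants to reproduce the UDP test, the commands were along these lines (192.168.1.10 is just a placeholder for the domU's IP):

    # In domU (iperf 1.7.0): run the UDP server
    iperf -s -u

    # In dom0: send UDP datagrams to the domU at 90Mb/s
    iperf -c 192.168.1.10 -u -b 90M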

It's possible that iperf is partially at fault here. (I used version 1.7.0 since 2.0.2 wouldn't compile on my iBook.) I noticed that it takes 100% of the CPU when it's used as a UDP client, even when running at lower speeds -- I saw this at 4Mb/s. I would wager that it uses a busy-wait loop to delay between sending datagrams. Since iperf always wants all the CPU cycles, and domU has last priority in my default scheduling config, domU just wouldn't get enough CPU time to process the incoming datagrams.
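A quick way to check this (again with a placeholder address) is to compare CPU time against wall-clock time for a low-rate run:

    # Run a low-rate UDP client for 30 seconds. If iperf busy-waits between
    # datagrams, user+sys time will be close to the full 30 seconds even at 4Mb/s.
    time iperf -c 192.168.1.10 -u -b 4M -t 30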

A more general note about using iperf:
It seems to me that as long as iperf uses 100% of the CPU, it is not a good tool for testing dom0-domU or domU-domU network performance. This sort of timing loop would be fine for network tests using "real" hosts, but not ones in which CPU resources are shared and network I/O is CPU-bound, as is the case here.

I would guess that this would not occur on SMP machines (and maybe hyperthreaded ones as well), since iperf's timing loop would use up only one CPU.


The other network issue I had was very slow TCP performance when domU was the iperf server and an external machine was the iperf client. I got about 2Mb/s in this case, but about 90Mb/s in the other direction (on 100Mbit Ethernet). This problem disappeared when I made the scheduling change above.
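That test was just the iperf TCP defaults, roughly as follows (the address is a placeholder for the domU's externally reachable IP):

    # In domU: run the TCP server
    iperf -s

    # On the external machine: run the TCP client against the domU
    iperf -c 192.168.1.10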

This issue is _not_ explained by iperf hogging the CPU as described above. No user-level process in dom0 should be involved; dom0 just does some low-level networking. But if the cause of this TCP problem is that dom0 is taking all the CPU resources, that would suggest that something in the Xen networking/bridging code is using 100% of the CPU just to bridge the incoming data. Does this indicate a problem in the networking code?

Again, the TCP slowness does not occur in the reverse direction, when domU is sending to an external machine. My guess is that, like the iperf UDP issue above, this problem would not occur on SMP machines.


--Winston


I ran my own tests. I have dom0 with a weight of 512 (double its memory allocation) and each VM also has a weight equal to its memory allocation. My dom0 can transfer at 10MB/s+ over the LAN, but domUs with 100% CPU used on the host could only transfer over the LAN at a peak of 800KB/s. When I gave dom0 a weight of 1, domU transfers decreased to a peak of 100KB/s over the "LAN" (quoted because, due to proxy ARP, the host acts as a router).
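Roughly how one would set such weights with sched-sedf (the domain name and memory figures below are hypothetical, just to show the ratio, and whether extratime is enabled is also only an assumption; same parameter layout as above):

    # dom0 with 256MB of memory -> weight 512 (double its allocation)
    xm sched-sedf 0 0 0 0 1 512
    # a domU with 256MB of memory -> weight 256 (equal to its allocation)
    xm sched-sedf mydomu 0 0 0 0 1 256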

The problem occurs regardless of whether you use bridged or routed mode.

I would have to believe the problem is in the hypervisor itself, and that scheduling and CPU usage greatly affect it. Network bandwidth should not be affected unless explicitly limited (i.e. by using the rate vif parameter).
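(For example, a deliberate limit would go in the domU config's vif line; the path and figure here are just illustrative:)

    # In the domU config file, e.g. /etc/xen/mydomu:
    vif = [ 'rate=10Mb/s' ]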

Stephen Soltesz has experienced the same problem and has some graphs to back it up. Stephen, will you share at least that one CPU + iperf graph with the community and perhaps elaborate on your weight configuration (if any)?

Thank you,
Matt Ayres


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
