On Apr 5, 2006, at 1:11 PM, Matt Ayres wrote:
Winston Chang wrote:
I ran the test with the latest xen-unstable build. The results
are the same.
When I ran 'xm sched-sedf 0 0 0 0 1 1' to prevent domU CPU
starvation, network performance was good. The numbers in this
case are the same as in my other message where I detail the
results using the week-old xen build -- it could handle 90Mb/s
with no datagram loss. So it looks like the checksum patches
had no effect on this phenomenon; the only thing that mattered
was the scheduling.
What was the previous weight of domain 0? What is the weight
assigned to the domUs, and do the domUs have bursting enabled?
I'm not really sure of the answer to either of these questions. The
weight is whatever the default is with Fedora Core 5 and
xen-unstable. I don't know anything about bursting. How do you find out?
I'd like to be corrected if I am wrong, but the last number
(weight) is set to 0 for all domains by default. By giving it a
value of 1 you are giving dom0 more CPU. The second-to-last number
is a boolean that decides whether a domain is hard-locked to its
weight or whether it can burst using idle CPU cycles. The three
before that are generally set to 0, and the first number identifies
the domain. Personally, I do not know of a way to read the current
weights back. It is documented in the Xen distribution tgz.
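If I'm reading the sched-sedf docs right (this is from memory, so
double-check the tree), the positional arguments go roughly like
this:

  xm sched-sedf <domain> <period> <slice> <latency-hint> <extratime> <weight>

  # e.g. the command quoted above:
  #   xm sched-sedf 0 0 0 0 1 1
  # domain 0 (dom0), period/slice/latency left at 0,
  # extratime=1 (allow bursting into idle CPU), weight=1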
I can tell you the symptoms I had: whenever a process in dom0 grabs
100% of the CPU, the domU console freezes. After a little while, the
domU console says "BUG: soft lockup detected on CPU#0!" So I believe
that with my default settings, dom0 always gets first priority, and
domU gets the leftovers.
For those who are just seeing this (the thread started on xen-
users): I had very poor UDP performance using iperf with domU as the
server and dom0 as the client. I had 99.98% packet loss when running
at 90Mb/s in this case, until I changed the scheduling as above.
Then packet loss dropped to 0. In the reverse direction there was
never a problem.
For more details, see the original thread here:
http://lists.xensource.com/archives/html/xen-users/2006-04/msg00096.html
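For reference, the test was basically stock iperf in UDP mode --
something like the following, though the exact flags may have
differed:

  # on domU (server):
  iperf -s -u

  # on dom0 (client), sending UDP at 90Mb/s:
  iperf -c <domU address> -u -b 90M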
It's possible that iperf is partially at fault here. (I used version
1.7.0 since 2.0.2 wouldn't compile on my iBook.) I noticed that it
takes 100% of CPU time when it's used as a UDP client, even when
running at lower speeds -- I saw this at 4Mb/s. I would wager that
it uses a busy-wait loop to delay between sending datagrams. Since
iperf always wants all the CPU cycles, and since domU has last
priority in my default scheduling config, domU just wasn't getting
enough CPU time to process the incoming datagrams.
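One way to see this for yourself (assuming your build behaves like
mine) is to watch the CPU split in dom0 while the test runs:

  # in dom0, while the iperf client is running:
  top       # the iperf client process sits at ~100% CPU
  xm top    # per-domain view: dom0 pegged, domU getting almost nothing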
A more general note about using iperf:
It seems to me that as long as iperf uses 100% of the CPU, it is not
a good tool for testing dom0-domU or domU-domU network performance.
This sort of timing loop would be fine for network tests using "real"
hosts, but not ones in which CPU resources are shared and network I/O
is CPU-bound, as is the case here.
I would guess that this would not occur on SMP machines (and maybe
hyperthreaded ones too), since iperf's timing loop would use up only
one CPU.
The other network issue I had was very slow TCP performance when domU
was the iperf server and an external machine was the iperf client. I
had 2 Mb/s in this case, but about 90Mb/s in the other direction (on
100Mbit ethernet). This problem disappeared when I made the
scheduling change described above.
This issue is _not_ explained by iperf hogging the CPU as described
above. No user-level process in dom0 should be involved; dom0 just
does some low-level networking. But if the cause of this TCP problem
is dom0 taking all the CPU resources, that would suggest that
somewhere the Xen networking/bridging code is using 100% of the CPU
just to bridge the incoming data.
Does this indicate a problem in the networking code?
Again, the TCP slowness does not occur in the reverse direction,
when domU is sending to an external machine. My guess is that, like
the iperf UDP issue above, this problem would not occur on SMP
machines.
--Winston
I ran my own tests. I have dom0 with a weight of 512 (double its
memory allocation), and each VM also has a weight equal to its
memory allocation. My dom0 can transfer at 10MB/s+ over the LAN,
but domUs with 100% CPU used on the host could only transfer over
the LAN at a peak of 800KB/s. When I gave dom0 a weight of 1, domU
transfers decreased to a peak of 100KB/s over the "LAN" (quoted
because, due to proxy ARP, the host acts as a router).
The problem occurs regardless of whether you use bridged or routed mode.
I have to believe the problem is in the hypervisor itself, and that
scheduling and CPU usage greatly affect it. Network bandwidth
should not be affected unless it is deliberately limited (i.e. by
using the rate vif parameter).
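For completeness: deliberately capping a guest is done with the rate
option on its vif line in the domain config file. The exact syntax is
in the vif documentation; roughly, it looks like this (the file name
and numbers here are just an example):

  # /etc/xen/mydomu  (example config file)
  vif = [ 'bridge=xenbr0, rate=10Mb/s' ]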
Stephen Soltesz has experienced the same problem and has some
graphs to back it up. Stephen, will you share at least that one
CPU + iperf graph with the community and perhaps elaborate on your
weight configuration (if any)?
Thank you,
Matt Ayres
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel