Hi,
This is a VNIF optimization patch; comments are welcome. Thanks!
[Background]:
One of the VNIF driver's scalability issues is its high event channel
notification frequency. The frequency closely tracks the physical NIC's
interrupt rate in dom0, which can reach 20 kHz in some situations. The frequent
event channel notifications drive both guest and dom0 CPU utilization up. For
the HVM PV driver in particular, they translate into a high interrupt rate in
the guest, which costs a lot of CPU cycles.
The attached patches consist of two parts: one for netback and one for
netfront. The netback part is based on the latest pv-ops dom0, and the netfront
part is based on the 2.6.18 HVM unmodified driver.
The patch uses a timer in netfront to poll the ring instead of relying on
event channel notification. While the guest is transferring data, the timer
runs and periodically sends/receives data on the ring. When the guest is idle
and no data is in flight, the timer stops automatically, and it restarts as
soon as new data needs to be transferred.
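For illustration only, here is a minimal sketch of that polling idea using the
classical kernel timer API of the 2.6.18 era. Every identifier below
(netfront_poll_timer, netfront_process_rings, netfront_work_pending, the
placeholder struct netfront_info and its poll_timer field) is invented for this
sketch and is not taken from the attached patch:

  #include <linux/timer.h>
  #include <linux/jiffies.h>

  #define NETFRONT_POLL_INTERVAL msecs_to_jiffies(1)  /* 1 ms tick => ~1 kHz */

  /* Placeholder private data; the real driver's private struct is richer. */
  struct netfront_info {
          struct timer_list poll_timer;
          /* ... ring front ends, grant references, etc. ... */
  };

  /* Stand-ins for the driver's usual TX garbage collection / RX processing
   * and for an "is anything still in flight?" check. */
  static void netfront_process_rings(struct netfront_info *np);
  static int  netfront_work_pending(struct netfront_info *np);

  static void netfront_poll_timer(unsigned long data)
  {
          struct netfront_info *np = (struct netfront_info *)data;

          /* Do the work the event-channel interrupt handler used to do. */
          netfront_process_rings(np);

          /* Re-arm only while traffic is in flight; when the rings go idle
           * the timer is simply not re-armed, and the TX/RX hot paths call
           * mod_timer() again when new data shows up. */
          if (netfront_work_pending(np))
                  mod_timer(&np->poll_timer, jiffies + NETFRONT_POLL_INTERVAL);
  }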
A feature flag in xenstore indicates whether netfront/netback supports
this mechanism. If only one side supports it, communication falls back to the
default event channel path and the new feature is not used; it is enabled only
when both sides have the flag set in xenstore.
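As a rough sketch of this handshake using the standard xenbus helpers: the key
name "feature-timer-poll" and both function names below are made up for the
example, and the attached patch may well use different names and a different
negotiation flow:

  #include <xen/xenbus.h>

  /* Backend side: advertise the capability under the backend's own node. */
  static int netback_advertise_timer_poll(struct xenbus_device *dev)
  {
          return xenbus_printf(XBT_NIL, dev->nodename,
                               "feature-timer-poll", "%d", 1);
  }

  /* Frontend side: use timer polling only if the backend advertised it;
   * otherwise stay on the default event-channel notification path. */
  static int netfront_timer_poll_supported(struct xenbus_device *dev)
  {
          int val = 0;

          if (xenbus_scanf(XBT_NIL, dev->otherend,
                           "feature-timer-poll", "%d", &val) != 1)
                  return 0;       /* key absent: fall back to default */
          return val;
  }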
One open question is the timer polling frequency. The netfront part is
based on the 2.6.18 HVM unmodified driver, and in that kernel version the guest
hrtimer is not accurate, so I use the classical timer with a polling frequency
of 1 kHz. If the netfront part is rebased onto the latest pv-ops kernel, an
hrtimer could be used instead.
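A possible hrtimer variant of the same 1 kHz poll, again only a sketch with
placeholder names rather than code from the patch, could look like this after
a rebase onto a recent pv-ops kernel:

  #include <linux/hrtimer.h>
  #include <linux/ktime.h>

  #define NETFRONT_POLL_NS 1000000        /* 1 ms period => 1 kHz */

  /* Placeholder private data embedding the hrtimer. */
  struct netfront_info {
          struct hrtimer poll_hrtimer;
          /* ... */
  };

  static void netfront_process_rings(struct netfront_info *np);
  static int  netfront_work_pending(struct netfront_info *np);

  static enum hrtimer_restart netfront_poll_hrtimer(struct hrtimer *timer)
  {
          struct netfront_info *np =
                  container_of(timer, struct netfront_info, poll_hrtimer);

          netfront_process_rings(np);

          /* Stop when idle; the TX/RX paths call hrtimer_start() again
           * when new activity appears. */
          if (!netfront_work_pending(np))
                  return HRTIMER_NORESTART;

          hrtimer_forward_now(timer, ns_to_ktime(NETFRONT_POLL_NS));
          return HRTIMER_RESTART;
  }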
[Testing Results]:
We ran the tests on a 4-core Intel Q9550. The test cases combine 1/3/6/9
VMs (all UP guests) with packet sizes of 50/1472/1500 bytes, and we measured
throughput, dom0 CPU utilization, and total guest CPU utilization. Each guest
vcpu was pinned to one pcpu, as was dom0's vcpu. Taking the 9-VM case as an
example, dom0's vcpu was pinned to pcpu0, guests 1~3 to pcpu1, guests 4~6 to
pcpu2, and guests 7~9 to pcpu3. We used netperf for the tests, and the results
are for the HVM VNIF driver. The packet sizes mean: 50 (small packets), 1472
(large packets, near MTU), and 1500 (a mix of large and small packets). The
tables below show that host/guest CPU utilization decreases after applying the
patch, especially when multiple VMs are running, while throughput is not
affected.
VM RECEIVE CASES:
Guest UDP Receive (Single Guest VM)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
50                   w/o patch    83.25              100.00%        26.10%
50                   w/ patch     79.56              100.00%        23.80%
1472                 w/o patch    950.30             44.80%         22.40%
1472                 w/ patch     949.32             46.00%         17.90%
1500                 w/o patch    915.84             84.70%         42.40%
1500                 w/ patch     908.94             88.30%         28.70%
Guest TCP Receive (Single Guest VM)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
50                   w/o patch    506.57             43.30%         70.30%
50                   w/ patch     521.52             34.50%         57.70%
1472                 w/o patch    926.19             69.00%         32.90%
1472                 w/ patch     928.23             63.00%         24.40%
1500                 w/o patch    935.12             68.60%         33.70%
1500                 w/ patch     926.11             63.80%         24.80%
Guest UDP Receive (Three Guest VMs)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
1472                 w/o patch    963.43             50.70%         41.10%
1472                 w/ patch     964.47             51.00%         25.00%
1500                 w/o patch    859.96             99.50%         73.40%
1500                 w/ patch     861.19             97.40%         39.90%
Guest TCP Receive (Three Guest VMs)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
1472                 w/o patch    939.68             78.40%         64.00%
1472                 w/ patch     926.04             65.90%         31.80%
1500                 w/o patch    933.00             78.10%         63.30%
1500                 w/ patch     927.14             66.90%         31.90%
Guest UDP Receive (Six Guest VMs)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
1472                 w/o patch    978.85             56.90%         59.20%
1472                 w/ patch     975.05             53.80%         33.50%
1500                 w/o patch    886.92             100.00%        87.20%
1500                 w/ patch     902.02             96.90%         46.00%
Guest TCP Receive (Six Guest VMs)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
1472                 w/o patch    962.04             90.30%         104.00%
1472                 w/ patch     958.94             69.40%         43.70%
1500                 w/o patch    960.35             90.10%         103.70%
1500                 w/ patch     957.75             68.70%         42.80%
Guest UDP Receive (Nine Guest VMs)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
1472                 w/o patch    987.91             60.50%         70.00%
1472                 w/ FE patch  988.30             56.60%         42.70%
1500                 w/o patch    953.48             100.00%        93.80%
1500                 w/ FE patch  904.17             98.60%         53.50%
Guest TCP Receive (Nine Guest VMs)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
1472                 w/o patch    974.89             90.00%         110.60%
1472                 w/ patch     980.03             73.70%         55.40%
1500                 w/o patch    971.34             89.80%         109.60%
1500                 w/ patch     973.63             73.90%         54.70%
VM SEND CASES:
Guest UDP Send (Single Guest VM)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
1472                 w/o patch    949.84             56.50%         21.70%
1472                 w/ patch     946.25             51.20%         20.10%
1500                 w/o patch    912.46             87.00%         26.60%
1500                 w/ patch     899.29             86.70%         26.20%
Guest TCP Send (Single Guest VM)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
1472                 w/o patch    932.16             71.50%         35.60%
1472                 w/ patch     932.09             66.90%         29.50%
1500                 w/o patch    929.91             72.60%         35.90%
1500                 w/ patch     931.63             66.70%         29.50%
Guest UDP Send (Three Guest VMs)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
1472                 w/o patch    972.66             57.60%         24.00%
1472                 w/ patch     970.07             56.30%         23.30%
1500                 w/o patch    943.87             93.50%         32.50%
1500                 w/ patch     933.61             93.90%         30.00%
Guest TCP Send (Three Guest VMs)
Packet Size (bytes)  Test Case    Throughput (Mbps)  Dom0 CPU Util  Guest CPU Total Util
1472                 w/o patch    955.92             70.40%         36.10%
1472                 w/ patch     946.39             72.90%         32.90%
1500                 w/o patch    966.06             73.00%         38.00%
1500                 w/ patch     947.23             72.50%         33.60%
Best Regards,
-- Dongxiao
Attachments:
  netback.patch
  netfront.patch