Thanks for the comments! The solution I presented in my last mail has the advantage that it can reduce the event channel notification frequency almost to zero, which saves a lot of CPU cycles, especially for the HVM PV driver.

Following James's suggestion, we also have another solution that works in that style; see the attachment. It only modifies netback and leaves netfront unchanged. The patch is based on PV-ops Dom0, so the hrtimer is accurate. We set a timer in netback: when the timer expires, or when there are RING_SIZE/2 data slots in the ring, netback notifies netfront. (Of course, we could modify the 'event' parameter instead of checking the number of data slots in the ring.) The patch contains auto-adjustment logic that tunes each netfront's event channel frequency according to the packet rate and size observed in a timer period, and the user can also assign a specific timer frequency to a particular netfront through the standard coalesce interface. With the event notification frequency set to 1000 Hz, it also brings a large decrease in CPU utilization, similar to the previous test results. I think the two solutions could coexist, and we can use a macro to select which one is the default.
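
To make the backend-side rule concrete, below is a minimal sketch of the "notify when the timer expires or when RING_SIZE/2 responses are outstanding" logic described above. The struct layout, field names and helpers (netbk_coalesce, unnotified, interval_ns and so on) are illustrative assumptions, not code taken from the attached patch:

/*
 * Minimal sketch of the notification rule only; all names here are
 * assumptions for illustration and are not taken from the attached
 * netbk_lowdown_evtchn_freq.patch. Locking between the response path
 * and the timer callback is omitted for brevity.
 */
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <xen/events.h>        /* notify_remote_via_irq() */

struct netbk_coalesce {
        struct hrtimer timer;       /* per-vif coalescing timer           */
        unsigned int irq;           /* event channel irq of this vif      */
        unsigned int ring_size;     /* RING_SIZE() of the vif's RX ring   */
        unsigned int unnotified;    /* responses queued since last notify */
        u64 interval_ns;            /* current coalescing period          */
};

/* Called after netback has pushed new responses onto the RX ring. */
static void netbk_rx_coalesce(struct netbk_coalesce *co, unsigned int nr_resp)
{
        co->unnotified += nr_resp;

        /* Notify at once when half the ring is outstanding; otherwise
         * leave the notification to the coalescing timer below. */
        if (co->unnotified >= co->ring_size / 2) {
                co->unnotified = 0;
                notify_remote_via_irq(co->irq);
        }
}

/* Timer expiry: flush whatever is pending with a single notification. */
static enum hrtimer_restart netbk_coalesce_timer_fn(struct hrtimer *t)
{
        struct netbk_coalesce *co = container_of(t, struct netbk_coalesce, timer);

        if (co->unnotified) {
                co->unnotified = 0;
                notify_remote_via_irq(co->irq);
        }

        /* interval_ns is where the auto-adjustment logic (packet rate
         * and size per period) or the per-netfront coalesce setting
         * would feed in the next period. */
        hrtimer_forward_now(t, ns_to_ktime(co->interval_ns));
        return HRTIMER_RESTART;
}

The auto-adjustment logic would then simply rescale interval_ns each period from the observed packet rate and size, and the per-netfront setting exposed through the standard coalesce interface (ethtool -C) would write the same value.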

Here are the detailed results for the two solutions. In the tables below, "w/ FE patch" means the first solution (the frontend patch attached to my last mail) is applied, and "w/ BE patch" means the second solution (the backend patch attached to this mail) is applied.
VM receive results:

UDP Receive (Single Guest VM)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
50                  | w/o patch   |  83.25            | 100.00%       | 26.10%
50                  | w/ FE patch |  79.56            | 100.00%       | 23.80%
50                  | w/ BE patch |  72.43            | 100.00%       | 21.90%
1472                | w/o patch   | 950.30            |  44.80%       | 22.40%
1472                | w/ FE patch | 949.32            |  46.00%       | 17.90%
1472                | w/ BE patch | 951.57            |  51.10%       | 18.50%
1500                | w/o patch   | 915.84            |  84.70%       | 42.40%
1500                | w/ FE patch | 908.94            |  88.30%       | 28.70%
1500                | w/ BE patch | 904.00            |  88.90%       | 27.30%

TCP Receive (Single Guest VM)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
50                  | w/o patch   | 506.57            |  43.30%       | 70.30%
50                  | w/ FE patch | 521.52            |  34.50%       | 57.70%
50                  | w/ BE patch | 512.78            |  38.50%       | 54.40%
1472                | w/o patch   | 926.19            |  69.00%       | 32.90%
1472                | w/ FE patch | 928.23            |  63.00%       | 24.40%
1472                | w/ BE patch | 928.59            |  67.50%       | 24.80%
1500                | w/o patch   | 935.12            |  68.60%       | 33.70%
1500                | w/ FE patch | 926.11            |  63.80%       | 24.80%
1500                | w/ BE patch | 927.00            |  68.80%       | 24.60%

UDP Receive (Three Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 963.43            |  50.70%       | 41.10%
1472                | w/ FE patch | 964.47            |  51.00%       | 25.00%
1472                | w/ BE patch | 963.07            |  55.60%       | 27.80%
1500                | w/o patch   | 859.96            |  99.50%       | 73.40%
1500                | w/ FE patch | 861.19            |  97.40%       | 39.90%
1500                | w/ BE patch | 860.92            |  98.90%       | 40.00%

TCP Receive (Three Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 939.68            |  78.40%       | 64.00%
1472                | w/ FE patch | 926.04            |  65.90%       | 31.80%
1472                | w/ BE patch | 930.61            |  71.60%       | 34.80%
1500                | w/o patch   | 933.00            |  78.10%       | 63.30%
1500                | w/ FE patch | 927.14            |  66.90%       | 31.90%
1500                | w/ BE patch | 930.76            |  71.10%       | 34.80%

UDP Receive (Six Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 978.85            |  56.90%       | 59.20%
1472                | w/ FE patch | 975.05            |  53.80%       | 33.50%
1472                | w/ BE patch | 974.71            |  59.50%       | 40.00%
1500                | w/o patch   | 886.92            | 100.00%       | 87.20%
1500                | w/ FE patch | 902.02            |  96.90%       | 46.00%
1500                | w/ BE patch | 894.57            |  98.90%       | 49.60%

TCP Receive (Six Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 962.04            |  90.30%       | 104.00%
1472                | w/ FE patch | 958.94            |  69.40%       | 43.70%
1472                | w/ BE patch | 958.08            |  68.30%       | 48.00%
1500                | w/o patch   | 960.35            |  90.10%       | 103.70%
1500                | w/ FE patch | 957.75            |  68.70%       | 42.80%
1500                | w/ BE patch | 956.42            |  68.20%       | 48.50%

UDP Receive (Nine Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 987.91            |  60.50%       | 70.00%
1472                | w/ FE patch | 988.30            |  56.60%       | 42.70%
1472                | w/ BE patch | 986.58            |  61.80%       | 50.00%
1500                | w/o patch   | 953.48            | 100.00%       | 93.80%
1500                | w/ FE patch | 904.17            |  98.60%       | 53.50%
1500                | w/ BE patch | 905.52            | 100.00%       | 56.80%

TCP Receive (Nine Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 974.89            |  90.00%       | 110.60%
1472                | w/ FE patch | 980.03            |  73.70%       | 55.40%
1472                | w/ BE patch | 968.29            |  72.30%       | 60.20%
1500                | w/o patch   | 971.34            |  89.80%       | 109.60%
1500                | w/ FE patch | 973.63            |  73.90%       | 54.70%
1500                | w/ BE patch | 971.08            |  72.30%       | 61.00%

VM send results:

UDP Send (Single Guest VM)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 949.84            |  56.50%       | 21.70%
1472                | w/ FE patch | 946.25            |  51.20%       | 20.10%
1472                | w/ BE patch | 948.73            |  51.60%       | 19.70%
1500                | w/o patch   | 912.46            |  87.00%       | 26.60%
1500                | w/ FE patch | 899.29            |  86.70%       | 26.20%
1500                | w/ BE patch | 909.31            |  86.90%       | 25.90%

TCP Send (Single Guest VM)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 932.16            |  71.50%       | 35.60%
1472                | w/ FE patch | 932.09            |  66.90%       | 29.50%
1472                | w/ BE patch | 932.54            |  66.20%       | 25.30%
1500                | w/o patch   | 929.91            |  72.60%       | 35.90%
1500                | w/ FE patch | 931.63            |  66.70%       | 29.50%
1500                | w/ BE patch | 932.83            |  66.20%       | 26.20%

UDP Send (Three Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 972.66            |  57.60%       | 24.00%
1472                | w/ FE patch | 970.07            |  56.30%       | 23.30%
1472                | w/ BE patch | 971.05            |  59.10%       | 23.10%
1500                | w/o patch   | 943.87            |  93.50%       | 32.50%
1500                | w/ FE patch | 933.61            |  93.90%       | 30.00%
1500                | w/ BE patch | 937.08            |  95.10%       | 31.00%

TCP Send (Three Guest VMs)
Packet Size (bytes) | Test Case   | Throughput (Mbps) | Dom0 CPU Util | Guest CPU Total Util
1472                | w/o patch   | 955.92            |  70.40%       | 36.10%
1472                | w/ FE patch | 946.39            |  72.90%       | 32.90%
1472                | w/ BE patch | 949.80            |  70.30%       | 33.20%
1500                | w/o patch   | 966.06            |  73.00%       | 38.00%
1500                | w/ FE patch | 947.23            |  72.50%       | 33.60%
1500                | w/ BE patch | 948.74            |  72.20%       | 34.50%

Best Regards,
-- Dongxiao

-----Original Message-----
From: James Harper [mailto:james.harper@xxxxxxxxxxxxxxxx]
Sent: Thursday, September 10, 2009 4:03 PM
To: Xu, Dongxiao; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel][PATCH][RFC] Using data polling mechanism in netfront to replace event notification between netback and netfront

> Hi,
> This is a VNIF optimization patch, need for your comments. Thanks!
>
> [Background]:
> One of the VNIF driver's scalability issues is the high event channel
> frequency. It's highly related to the physical NIC's interrupt frequency
> in dom0, which could be 20K HZ in some situations. The high-frequency
> event channel notification keeps the guest and dom0 CPU utilization at a
> high value. Especially for the HVM PV driver, it brings a high rate of
> interrupts, which could cost a lot of CPU cycles.
>
> The attached patches have two parts: one part is for netback, and the
> other is for netfront. The netback part is based on the latest PV-Ops
> Dom0, and the netfront part is based on the 2.6.18 HVM unmodified driver.
>
> This patch uses a timer in netfront to poll the ring instead of event
> channel notification. If the guest is transferring data, the timer will
> start working and periodically send/receive data from the ring. If the
> guest is idle and no data is transferring, the timer will stop working
> automatically. It will restart again once there is new data transferring.
>
> We set a feature flag in xenstore to indicate whether the
> netfront/netback supports this feature. If only one side supports it,
> the communication mechanism falls back to the default, and the new
> feature is not used. The feature is enabled only when both sides have
> the flag set in xenstore.
>
> One problem is the timer polling frequency. This netfront part patch is
> based on the 2.6.18 HVM unmodified driver, and in that kernel version
> the guest hrtimer is not accurate, so I use the classical timer. The
> polling frequency is 1KHz. If we rebase the netfront part patch to the
> latest pv-ops, we could use hrtimer instead.

I experimented with this in Windows too, but the timer resolution is too
poor. I think you should also look at setting the 'event' parameter too.
The current driver tells the backend to tell it as soon as there is a
single packet ready to be notified (np->rx.sring->rsp_event =
np->rx.rsp_cons + 1), but you could set it to a higher number and also
use the timer, eg "tell me when there are 32 ring slots filled, or when
the timer elapses". That way you should have less problems with
overflows.

Also, I don't think you need to tell the backend to stop notifying you,
just don't set the 'event' field in the frontend and then
RING_PUSH_RESPONSES_AND_CHECK_NOTIFY in the backend will not return that
a notification is required.

James
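
For readers skimming the thread, here is a minimal sketch of the frontend-side batching described in the quoted message above: advance rsp_event by a whole batch instead of rsp_cons + 1, and back it with a timer so a partially filled batch is still processed. The batch size, the rx_flush_timer field and the function name are illustrative assumptions, not code from any existing netfront:

/*
 * Sketch of the 'event' batching idea from the quoted message; names
 * are assumed for illustration (2.6.18-era timer API, since that is
 * the tree the netfront part targets).
 */
#include <linux/timer.h>
#include <linux/jiffies.h>
#include <xen/interface/io/netif.h>    /* netif_rx_front_ring */

#define RX_NOTIFY_BATCH 32  /* "tell me when there are 32 ring slots filled" */

struct netfront_sketch {
        struct netif_rx_front_ring rx;
        struct timer_list rx_flush_timer;  /* safety net for partial batches */
};

static void xennet_rearm_rx_event(struct netfront_sketch *np)
{
        /* Instead of asking for a notification after every response
         * (rsp_cons + 1), ask only after a further batch has arrived. */
        np->rx.sring->rsp_event = np->rx.rsp_cons + RX_NOTIFY_BATCH;

        /* If the batch never fills, the timer polls the ring anyway;
         * with HZ=1000 this re-checks roughly every millisecond. */
        mod_timer(&np->rx_flush_timer, jiffies + 1);
}

And, as the quoted message also points out, the frontend never needs an explicit "stop notifying me" message: if rsp_event is simply left where it is, RING_PUSH_RESPONSES_AND_CHECK_NOTIFY on the backend side will not report that a notification is required.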

Attachment: netbk_lowdown_evtchn_freq.patch
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel