RE: [PATCH] Windows PV drivers fail to set up RSS when vCPUs > 8
-----Original Message-----
From: win-pv-devel <win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of
Durrant, Paul
Sent: 19 March 2022 17:39
To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] Windows PV drivers fail to set up RSS when vCPUs > 8
> I think we need a bit more in the commit comment. What is the nature of the
> failure... and does XENNET advertise more than 8 queues, so will the
> situation ever arise? Linux certainly tops out at 8 queues.
I don't believe that XenNet ever advertises more than 8 queues, but that's not
quite the same as supporting more than 8 vCPUs.
Perhaps something in the comment like: "Mapping between queues and vCPUs fails
for more than 8 vCPUs because the base of the indirection is always considered
to be zero, and the mapping is always performed on a direct vCPU number basis."
Would that adequately summarise the problem? (It's not easy to explain
succinctly in English!)
MH.
This summary was helpfully provided by Edvin Torok.
On a VM with more than 8 vCPUs, RSS might not work because the driver fails to set up
the indirection table.
This causes the VM to only reach 12.4 Gbit/s with 'iperf3 -P 8', instead of the
16-18 Gbit/s achievable with a working RSS setup.
This can easily be reproduced by giving a VM 32 vCPUs and creating 3 network
interfaces. Windows will assign CPUs 0-3 to one network interface, 4-7 to the next,
and (I think) will try 8-12 for the next, but the driver rejects that:
PS C:\Program Files\CItrix\XenTools\Diagnostics> Get-NetAdapterRSS
Name : Ethernet 5
InterfaceDescription : XenServer PV Network Device #2
Enabled : True
NumberOfReceiveQueues : 8
Profile : NUMAStatic
BaseProcessor: [Group:Number] : 0:0
MaxProcessor: [Group:Number] : 0:31
MaxProcessors : 4
RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0 0:1/0 0:2/0 0:3/0
0:4/0 0:5/0 0:6/0 0:7/0
0:8/0 0:9/0 0:10/0 0:11/0
0:12/0 0:13/0 0:14/0 0:15/0
0:16/0 0:17/0 0:18/0 0:19/0
0:20/0 0:21/0 0:22/0 0:23/0
0:24/0 0:25/0 0:26/0 0:27/0
0:28/0 0:29/0 0:30/0 0:31/0
IndirectionTable: [Group:Number] :
Name : Ethernet 4
InterfaceDescription : XenServer PV Network Device #1
Enabled : True
NumberOfReceiveQueues : 8
Profile : NUMAStatic
BaseProcessor: [Group:Number] : 0:0
MaxProcessor: [Group:Number] : 0:31
MaxProcessors : 4
RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0 0:1/0 0:2/0 0:3/0
0:4/0 0:5/0 0:6/0 0:7/0
0:8/0 0:9/0 0:10/0 0:11/0
0:12/0 0:13/0 0:14/0 0:15/0
0:16/0 0:17/0 0:18/0 0:19/0
0:20/0 0:21/0 0:22/0 0:23/0
0:24/0 0:25/0 0:26/0 0:27/0
0:28/0 0:29/0 0:30/0 0:31/0
IndirectionTable: [Group:Number] : 0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
0:4 0:5 0:6 0:7
Name : Ethernet 3
InterfaceDescription : XenServer PV Network Device #0
Enabled : True
NumberOfReceiveQueues : 8
Profile : NUMAStatic
BaseProcessor: [Group:Number] : 0:0
MaxProcessor: [Group:Number] : 0:31
MaxProcessors : 4
RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0 0:1/0 0:2/0 0:3/0
0:4/0 0:5/0 0:6/0 0:7/0
0:8/0 0:9/0 0:10/0 0:11/0
0:12/0 0:13/0 0:14/0 0:15/0
0:16/0 0:17/0 0:18/0 0:19/0
0:20/0 0:21/0 0:22/0 0:23/0
0:24/0 0:25/0 0:26/0 0:27/0
0:28/0 0:29/0 0:30/0 0:31/0
IndirectionTable: [Group:Number] : 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3
There is a built-in hard-coded limit of 8 queues, which is fine, but that should
be completely unrelated to CPU numbers (the total number of CPUs assigned to a
NIC should be <= 8, sure).
The code potentially causing the issue is in xenvif's receiver.c:
    for (Index = 0; Index < Size; Index++) {
        QueueMapping[Index] =
            KeGetProcessorIndexFromNumber(&ProcessorMapping[Index]);

        if (QueueMapping[Index] >= NumQueues)
            goto fail2;
    }
(There is also a problem that the code assumes the group number is always 0;
for now that is true, but it might change if vNUMA is implemented in the future.)
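As an illustration only (this is not the actual patch, and BaseProcessorIndex below
is a hypothetical variable that the real code would have to derive from the RSS
parameters supplied by NDIS), one possible shape of a fix is to map each table entry
to a queue relative to the base processor, wrapping modulo the number of queues,
rather than treating the raw processor index as a queue number:

    for (Index = 0; Index < Size; Index++) {
        ULONG ProcessorIndex;

        /* Translate the PROCESSOR_NUMBER into a system-wide processor index. */
        ProcessorIndex = KeGetProcessorIndexFromNumber(&ProcessorMapping[Index]);
        if (ProcessorIndex == INVALID_PROCESSOR_INDEX)
            goto fail2;

        /* Hypothetical: offset from the base processor, wrapped onto the
           available queues, instead of assuming a base of zero and a direct
           CPU-number-to-queue mapping. */
        QueueMapping[Index] = (ProcessorIndex - BaseProcessorIndex) % NumQueues;
    }

With something along those lines, entries for CPUs >= 8 would wrap back onto the
available queues instead of being rejected. As it stands, the rejection shows up on
the host side as repeated ReceiverUpdateHashMapping fail2/fail1 (c000000d, i.e.
STATUS_INVALID_PARAMETER) messages in the qemu-dm logs: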
Jun 2 09:54:57 prost qemu-dm-41[30818]:
30818@1622627697.450233:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:54:57 prost qemu-dm-41[30818]:
30818@1622627697.450320:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:54:57 prost qemu-dm-41[30818]:
30818@1622627697.452097:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:54:57 prost qemu-dm-41[30818]:
30818@1622627697.452180:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:56:14 prost qemu-dm-41[30818]:
30818@1622627774.374713:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:56:14 prost qemu-dm-41[30818]:
30818@1622627774.374798:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:56:14 prost qemu-dm-41[30818]:
30818@1622627774.377121:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:56:14 prost qemu-dm-41[30818]:
30818@1622627774.377203:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.672941:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.673058:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.675891:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.675993:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.363892:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.364008:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.365861:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.365949:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.935871:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.935965:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.937849:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.937918:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:05:00 prost qemu-dm-46[11484]:
11484@1622628300.973487:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:05:00 prost qemu-dm-46[11484]:
11484@1622628300.973588:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:05:00 prost qemu-dm-46[11484]:
11484@1622628300.976554:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:05:00 prost qemu-dm-46[11484]:
11484@1622628300.976650:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:22:54 prost qemu-dm-49[21901]:
21901@1622629374.720769:xen_platform_log xen platform:
xenvif|PdoGetInterfaceGuid: fail1 (c0000034)
Jun 2 10:22:55 prost qemu-dm-49[21901]:
21901@1622629375.194122:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:22:55 prost qemu-dm-49[21901]:
21901@1622629375.194231:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:22:55 prost qemu-dm-49[21901]:
21901@1622629375.196726:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:22:55 prost qemu-dm-49[21901]:
21901@1622629375.196825:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:23:38 prost qemu-dm-50[24509]:
24509@1622629418.530046:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:23:38 prost qemu-dm-50[24509]:
24509@1622629418.530115:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:23:38 prost qemu-dm-50[24509]:
24509@1622629418.531811:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:23:38 prost qemu-dm-50[24509]:
24509@1622629418.531888:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:30:28 prost qemu-dm-51[28530]:
28530@1622629828.510968:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:30:28 prost qemu-dm-51[28530]:
28530@1622629828.511050:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:30:28 prost qemu-dm-51[28530]:
28530@1622629828.513570:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:30:28 prost qemu-dm-51[28530]:
28530@1622629828.513691:xen_platform_log xen platform:
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.573791:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.573904:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.576188:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.576298:xen_platform_log
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
I tested with both Windows 10 and Windows Server 2016, with various CPU topologies
(e.g. 12 vCPUs all on one socket shows the same issue once Windows starts
assigning CPUs > 8).
A workaround is to set MaxProcessorNumber to 7, though obviously this will
limit scalability, since vCPUs > 8 won't be used even if you have multiple VIFs.
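For reference (illustrative only; "Ethernet 4" is simply the adapter name from the
output above), the workaround can be applied per adapter with the standard
Set-NetAdapterRss cmdlet:

    PS C:\> # Restrict RSS on this adapter to processors 0-7
    PS C:\> Set-NetAdapterRss -Name "Ethernet 4" -MaxProcessorNumber 7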