RE: [PATCH] Windows PV drivers fail to set up RSS when vCPUs > 8
-----Original Message-----
From: win-pv-devel <win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Durrant, Paul
Sent: 19 March 2022 17:39
To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] Windows PV drivers fail to set up RSS when vCPUs > 8

> I think we need a bit more in the commit comment. What is the nature of
> the failure... and does XENNET advertise more than 8 queues, so will the
> situation ever arise? Linux certainly tops out at 8 queues.

I don't believe that XenNet ever advertises more than 8 queues, but that's not quite the same as supporting more than 8 vCPUs. Perhaps something in the comment like:

"Mapping between queues and vCPUs fails for more than 8 vCPUs because the base of the indirection table is always considered to be zero, and the mapping is always performed on a direct vCPU-number basis."

Would that adequately summarise the problem? (It's not easy to explain succinctly in English!)

MH.

The summary below was helpfully provided by Edvin Torok:

On a VM with more than 8 vCPUs, RSS may not work because the driver fails to set up the indirection table. This leaves the VM able to reach only 12.4 Gbit/s with 'iperf3 -P 8', instead of the 16-18 Gbit/s achievable with a working RSS setup. The problem is easy to reproduce: give a VM 32 vCPUs and create 3 network interfaces.
Windows will assign vCPUs 0-3 to one network interface, 4-7 to the next, and (I think) 8-12 to the next, but the driver rejects that:

PS C:\Program Files\Citrix\XenTools\Diagnostics> Get-NetAdapterRSS

Name                                            : Ethernet 5
InterfaceDescription                            : XenServer PV Network Device #2
Enabled                                         : True
NumberOfReceiveQueues                           : 8
Profile                                         : NUMAStatic
BaseProcessor: [Group:Number]                   : 0:0
MaxProcessor: [Group:Number]                    : 0:31
MaxProcessors                                   : 4
RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0 0:1/0 0:2/0 0:3/0 0:4/0 0:5/0 0:6/0 0:7/0
                                                  0:8/0 0:9/0 0:10/0 0:11/0 0:12/0 0:13/0 0:14/0 0:15/0
                                                  0:16/0 0:17/0 0:18/0 0:19/0 0:20/0 0:21/0 0:22/0 0:23/0
                                                  0:24/0 0:25/0 0:26/0 0:27/0 0:28/0 0:29/0 0:30/0 0:31/0
IndirectionTable: [Group:Number]                :

Name                                            : Ethernet 4
InterfaceDescription                            : XenServer PV Network Device #1
Enabled                                         : True
NumberOfReceiveQueues                           : 8
Profile                                         : NUMAStatic
BaseProcessor: [Group:Number]                   : 0:0
MaxProcessor: [Group:Number]                    : 0:31
MaxProcessors                                   : 4
RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0 0:1/0 0:2/0 0:3/0 0:4/0 0:5/0 0:6/0 0:7/0
                                                  0:8/0 0:9/0 0:10/0 0:11/0 0:12/0 0:13/0 0:14/0 0:15/0
                                                  0:16/0 0:17/0 0:18/0 0:19/0 0:20/0 0:21/0 0:22/0 0:23/0
                                                  0:24/0 0:25/0 0:26/0 0:27/0 0:28/0 0:29/0 0:30/0 0:31/0
IndirectionTable: [Group:Number]                : 0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7
                                                  0:4 0:5 0:6 0:7 0:4 0:5 0:6 0:7

Name                                            : Ethernet 3
InterfaceDescription                            : XenServer PV Network Device #0
Enabled                                         : True
NumberOfReceiveQueues                           : 8
Profile                                         : NUMAStatic
BaseProcessor: [Group:Number]                   : 0:0
MaxProcessor: [Group:Number]                    : 0:31
MaxProcessors                                   : 4
RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0 0:1/0 0:2/0 0:3/0 0:4/0 0:5/0 0:6/0 0:7/0
                                                  0:8/0 0:9/0 0:10/0 0:11/0 0:12/0 0:13/0 0:14/0 0:15/0
                                                  0:16/0 0:17/0 0:18/0 0:19/0 0:20/0 0:21/0 0:22/0 0:23/0
                                                  0:24/0 0:25/0 0:26/0 0:27/0 0:28/0 0:29/0 0:30/0 0:31/0
IndirectionTable: [Group:Number]                : 0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
                                                  0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3

There is a built-in hardcoded limit of 8 queues, which is fine, but that should be completely unrelated to CPU numbers! (The total number of CPUs assigned to a NIC should be <= 8, sure.)

Potential code causing the issue, in xenvif's receiver.c:

    for (Index = 0; Index < Size; Index++) {
        QueueMapping[Index] =
            KeGetProcessorIndexFromNumber(&ProcessorMapping[Index]);

        if (QueueMapping[Index] >= NumQueues)
            goto fail2;
    }

(There is also a problem that the code assumes the group number is always 0. For now that is true, but it might change if we implement vNUMA in the future.)
Jun 2 09:54:57 prost qemu-dm-41[30818]: 30818@1622627697.450233:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:54:57 prost qemu-dm-41[30818]: 30818@1622627697.450320:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:54:57 prost qemu-dm-41[30818]: 30818@1622627697.452097:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:54:57 prost qemu-dm-41[30818]: 30818@1622627697.452180:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:56:14 prost qemu-dm-41[30818]: 30818@1622627774.374713:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:56:14 prost qemu-dm-41[30818]: 30818@1622627774.374798:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:56:14 prost qemu-dm-41[30818]: 30818@1622627774.377121:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:56:14 prost qemu-dm-41[30818]: 30818@1622627774.377203:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.672941:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.673058:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.675891:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.675993:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.363892:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.364008:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.365861:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.365949:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.935871:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.935965:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.937849:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.937918:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:05:00 prost qemu-dm-46[11484]: 11484@1622628300.973487:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:05:00 prost qemu-dm-46[11484]: 11484@1622628300.973588:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:05:00 prost qemu-dm-46[11484]: 11484@1622628300.976554:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:05:00 prost qemu-dm-46[11484]: 11484@1622628300.976650:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:22:54 prost qemu-dm-49[21901]: 21901@1622629374.720769:xen_platform_log xen platform: xenvif|PdoGetInterfaceGuid: fail1 (c0000034)
Jun 2 10:22:55 prost qemu-dm-49[21901]: 21901@1622629375.194122:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:22:55 prost qemu-dm-49[21901]: 21901@1622629375.194231:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:22:55 prost qemu-dm-49[21901]: 21901@1622629375.196726:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:22:55 prost qemu-dm-49[21901]: 21901@1622629375.196825:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:23:38 prost qemu-dm-50[24509]: 24509@1622629418.530046:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:23:38 prost qemu-dm-50[24509]: 24509@1622629418.530115:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:23:38 prost qemu-dm-50[24509]: 24509@1622629418.531811:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:23:38 prost qemu-dm-50[24509]: 24509@1622629418.531888:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:30:28 prost qemu-dm-51[28530]: 28530@1622629828.510968:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:30:28 prost qemu-dm-51[28530]: 28530@1622629828.511050:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:30:28 prost qemu-dm-51[28530]: 28530@1622629828.513570:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:30:28 prost qemu-dm-51[28530]: 28530@1622629828.513691:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.573791:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.573904:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun 2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.576188:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun 2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.576298:xen_platform_log xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)

I tested with both Win10 and Windows Server 2016, with various CPU topologies (e.g. 12 vCPUs, all on one socket, shows the same issue once Windows starts assigning CPUs > 8).

A workaround is to set MaxProcessorNumber to 7, though obviously this will limit scalability, since vCPUs > 8 won't be used even if you have multiple VIFs.