
RE: [PATCH] Windows PV drivers fail to set up RSS when vCPUs > 8


  • To: "paul@xxxxxxx" <paul@xxxxxxx>, "win-pv-devel@xxxxxxxxxxxxxxxxxxxx" <win-pv-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Martin Harvey <martin.harvey@xxxxxxxxxx>
  • Date: Mon, 21 Mar 2022 16:06:52 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=9jVF6ZxvD3ef0GPYq2jYETcJmU0msUnALv1Xy2TqPbc=; b=U9//e/Btls+XI964sAmG/EbI4RUWriNxHnie3DvYCK6AjevNYS2zGcK9QWoMBladvjpje4UZRfQIckg51ahWFnKBcFDt0YRTAy9+aQO3HD5lq3pt7czgEf6KNYNp/gRc7EFZuu6QvXkPU/cJ3XvPcpJeAfRp7UQX23cOYIuRusD78xkciC/zKANCUlknH4GL9cfm8IXZLoNIltF35zvh+7PWPklaQkCO9xQJ2GbR6oacMUu96piwSJqe0uRR7u+uJ/zSoiktjyGdZz/Rj3Q7kUDe7RTRyIck90rqa8DMvHdiCWKIkZPH7kffNE6iLyE5LirpjMIyYznkvkkHcz2Ikg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=d9GLdOfDe/mckvdg0z2vqmCDHUevY3MNi8Ya9ZnCTVlNV/6h06h8IL27aM1AMrwkn//3yiY3Pl8wzo5pb5EIUOpTsjndJYI/VJoEnynil7jhWnL0RnRSW1UrNB39PvCmKAck4C2SIpzsPc4CSN4G5mXQyKw+lgcN+tx/72HPK/P9wjenMwo8ur2oa4Kyo7VAJpLYzMavt6pywoUEWYQ8AWHLZ+D1YPtfTlsmoScWPC6J8E5MzOOhe7UqXjhDQEx9gOmtL/fEKsavcFUn/j5Ky9dkcoP8plFn4g1t5eDaFelzbkChikc6YcHhGRD0NLQO8khttHURWsNSf+sLISAdNA==
  • Authentication-results: esa2.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: Edwin Torok <edvin.torok@xxxxxxxxxx>
  • Delivery-date: Mon, 21 Mar 2022 16:07:06 +0000
  • Ironport-data: A9a23:7uP/mq2xNrlZ5gfDb/bD5dNxkn2cJEfYwER7XKvMYLTBsI5bpzcOz WROC27QP/eOMWWmet13bYzi80wHupPQzIVjSFM9pC1hF35El5HIVI+TRqvS04J+DSFhoGZPt Zh2hgzodZhsJpPkjk7xdOCn9xGQ7InQLlbGILes1htZGEk1EE/NtTo5w7Rj2tUy2IDja++wk YiaT/P3aQfNNwFcagr424rbwP+4lK2v0N+wlgVWicFj5DcypVFMZH4sDfjZw0/DaptVBoaHq 9Prl9lVyI97EyAFUbtJmp6jGqEDryW70QKm0hK6UID66vROS7BbPg/W+5PwZG8O4whlkeydx /1K5JvoQFcXb5HcleA+elpgTzF7PKxJreqvzXiX6aR/zmXDenrohf5vEFs3LcsT/eMf7WNmr KJCbmpXN1ba2rzwkOnTpupE36zPKOHuNZkDu3cmzTjDE/s3aZvCX7/L9ZlT2zJYasVmQ6iBN 5RBOGYHgBLoZgRwOQpQNtEEwe6nnUbTfxB+r3a6jP9ii4TU5FMoi+W8WDbPQfSVRMMQhljdq m/Y8mDRBhABKMfZ2TeD6mirhOLEgWX8Qo16PKWx6/pCkFCVgGsJB3UruUCT+KfjzBTkApQGd hJSqnFGQbUOGFKDSYj6BxToni+4rhMYePMTK7Eq7QCMxf+Bi+qGPVQsQjlEYd0gkcY5Qz02y 1OE9+/U6SxTXK69EizEqOrNxd+mEW1MdDJZO3dYJecQy4O7yLzfmC4jWTqK/ESdqtTuUQ/9z DmRxMTVr+VC1JVbv0lXEL2uvt5NmnQrZlNujuk0djj8hu+cWGJCT9b0gWU3Fd4acO6koqO9l HYFgdOCy+sFEIuAkieAKM1UQu32t6/daWSM2AU+d3XEy9hK0yT+Fb28HRkkfBs5Wir6UWGBj LDvVfN5u8YIYSrCgV5faIOtEcU6pZUM5vy+Ps04muFmO8ArHCfepXkGTRfJgwjFzRh9+Ylia MzzWZv9Uh4n5VFPkWPeb/0DyoUi2i1W7TqVHfgXOTz8iuHADJNUIJ9YWGazghcRt/rV/l+Kr 44Ba6NnCXx3CYXDX8UeyqZKRXgiJnknH5Hm7ctRc++IOA19H289TfTWxNscl0ZNxMy5Ss+gE qmBZ3Jl
  • Ironport-hdrordr: A9a23:T9RSy6sXBY2wHS9+/h+IrQyF7skCxIMji2hC6mlwRA09TyXGra 6TdaUguiMc1gx8ZJh5o6H9BEGBKUmskaKdkrNhQotKOzOW91dATbsSobcKpgeAJ8SQzJ8k6U 4NSdkdNDS0NykGsS+Y2nj3Lz9D+qj9zEnAv463pB0BLXAIV0gj1XYCNu/yKDwveOAsP+tfKH Po3Ls/m9PWQwVwUi3UPAhhY8Hz4/nw0L72ax8PABAqrCOUiymz1bL8Gx+Emj8DTjJm294ZgC n4uj28wp/mn+Cwyxfa2WOWxY9RgsHdxtxKA9HJotQJKw/rlh2jaO1aKv2/VXEO0aKSAWQR4Z zxSiQbToBOArTqDyaISC7WqkvdOfAVmjnfIBGj8CLeSIfCNUwH4oJ69PNkm13imhEdVZhHod J29nPcuJxNARzamiPho9DOShFxj0Kx5WEviOgJkhVkIMAjgZJq3MUiFXluYd899ePBmfQaOf grCNuZ6OddcFucYXyctm5zwMa0VnB2GhudWEANtsGczjATxRlCvgYl7d1amm1F+IM2SpFC6e iBOqN0lKtWRstTaa5mHu8OTca+F2SISxPRN2CZJ0jhCcg8Sjjwgo+y5K9w6PCheZQOwpd3kJ PdUElAvWp3YE7qAd3m5uw8zvkMehTLYd3A8LAt23EigMyOeFPCC1z3dGwT
  • List-id: Developer list for the Windows PV Drivers subproject <win-pv-devel.lists.xenproject.org>
  • Thread-index: AQHYN7gtXmzUE2l3x0isfDYfnRWjD6zHAReAgAMJTjA=
  • Thread-topic: [PATCH] Windows PV drivers fail to set up RSS when vCPUs > 8


-----Original Message-----
From: win-pv-devel <win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of 
Durrant, Paul
Sent: 19 March 2022 17:39
To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] Windows PV drivers fail to set up RSS when vCPUs > 8



> I think we need a bit more in the commit comment. What is the nature of the 
> failure... and does XENNET advertise more than 8 queues, so will the 
> situation ever arise? Linux certainly tops out at 8 queues.

I don't believe that XenNet ever advertises more than 8 queues, but that's not 
quite the same as supporting more than 8 vCPUs.

Perhaps something in the comment like: "Mapping between queues and vCPUs fails 
for more than 8 vCPUs because the base of the indirection table is always 
considered to be zero, and the mapping is always performed on a direct 
vCPU-number basis."

Would that adequately summarise the problem? (It's not easy to explain 
succinctly in English!)
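
To put that concretely (purely a sketch, with made-up numbers): give the third 
VIF vCPUs 8-11 while XenNet offers 8 queues, and every entry fails the range 
check, along these lines:

    /* Illustration only, not driver code: with NumQueues = 8 and a VIF
     * handed vCPUs 8-11, a direct vCPU-number mapping with the indirection
     * base taken as zero yields "queue" numbers 8-11, so every entry trips
     * the QueueMapping[Index] >= NumQueues check and RSS setup fails. */
    ULONG NumQueues = 8;            /* queues 0-7 */
    ULONG Cpu;

    for (Cpu = 8; Cpu <= 11; Cpu++) {
        ULONG Queue = Cpu;          /* direct mapping, base assumed to be 0 */

        if (Queue >= NumQueues) {
            /* every iteration lands here, so the table is never set up */
        }
    }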

MH.

This summary was helpfully provided by Edvin Torok.


On a VM with >8 vCPUs, RSS might not work because the driver fails to set up the 
indirection table.

This causes the VM to only reach 12.4 Gbit/s with 'iperf3 -P 8', instead of the 
16-18 Gbit/s seen with a working RSS setup.

This can easily be reproduced by giving a VM 32 vCPUs and creating 3 network 
interfaces. Windows will assign CPUs 0-3 to one network interface, 4-7 to the 
next, and (I think) will try 8-12 for the next, but the driver rejects that:

PS C:\Program Files\Citrix\XenTools\Diagnostics> Get-NetAdapterRSS


Name                                            : Ethernet 5
InterfaceDescription                            : XenServer PV Network Device #2
Enabled                                         : True
NumberOfReceiveQueues                           : 8
Profile                                         : NUMAStatic
BaseProcessor: [Group:Number]                   : 0:0
MaxProcessor: [Group:Number]                    : 0:31
MaxProcessors                                   : 4
RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0  0:1/0  0:2/0  0:3/0  0:4/0  0:5/0  0:6/0  0:7/0
                                                  0:8/0  0:9/0  0:10/0  0:11/0  0:12/0  0:13/0  0:14/0  0:15/0
                                                  0:16/0  0:17/0  0:18/0  0:19/0  0:20/0  0:21/0  0:22/0  0:23/0
                                                  0:24/0  0:25/0  0:26/0  0:27/0  0:28/0  0:29/0  0:30/0  0:31/0
IndirectionTable: [Group:Number]                :

Name                                            : Ethernet 4
InterfaceDescription                            : XenServer PV Network Device #1
Enabled                                         : True
NumberOfReceiveQueues                           : 8
Profile                                         : NUMAStatic
BaseProcessor: [Group:Number]                   : 0:0
MaxProcessor: [Group:Number]                    : 0:31
MaxProcessors                                   : 4
RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0  0:1/0  0:2/0  0:3/0  0:4/0  0:5/0  0:6/0  0:7/0
                                                  0:8/0  0:9/0  0:10/0  0:11/0  0:12/0  0:13/0  0:14/0  0:15/0
                                                  0:16/0  0:17/0  0:18/0  0:19/0  0:20/0  0:21/0  0:22/0  0:23/0
                                                  0:24/0  0:25/0  0:26/0  0:27/0  0:28/0  0:29/0  0:30/0  0:31/0
IndirectionTable: [Group:Number]                : 0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7
                                                  0:4  0:5  0:6  0:7  0:4  0:5  0:6  0:7

Name                                            : Ethernet 3
InterfaceDescription                            : XenServer PV Network Device #0
Enabled                                         : True
NumberOfReceiveQueues                           : 8
Profile                                         : NUMAStatic
BaseProcessor: [Group:Number]                   : 0:0
MaxProcessor: [Group:Number]                    : 0:31
MaxProcessors                                   : 4
RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0  0:1/0  0:2/0  0:3/0  0:4/0  0:5/0  0:6/0  0:7/0
                                                  0:8/0  0:9/0  0:10/0  0:11/0  0:12/0  0:13/0  0:14/0  0:15/0
                                                  0:16/0  0:17/0  0:18/0  0:19/0  0:20/0  0:21/0  0:22/0  0:23/0
                                                  0:24/0  0:25/0  0:26/0  0:27/0  0:28/0  0:29/0  0:30/0  0:31/0
IndirectionTable: [Group:Number]                : 0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3
                                                  0:0  0:1  0:2  0:3  0:0  0:1  0:2  0:3

There is a built-in hardcoded limit of 8 queues, which is fine, but that should 
be completely unrelated to CPU numbers! (The total number of CPUs assigned to a 
NIC should be <= 8, sure.)

Potential code causing the issue, in xenvif receiver.c:

    for (Index = 0; Index < Size; Index++) {
        QueueMapping[Index] =
            KeGetProcessorIndexFromNumber(&ProcessorMapping[Index]);

        if (QueueMapping[Index] >= NumQueues)
            goto fail2;
    }

(There is also a problem in that the code assumes the processor group number is 
always 0. For now that is true, but it might change if we implement vNUMA in 
the future.)
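
(For illustration only, and not the actual patch: one possible direction would 
be to map each entry onto the queues that actually exist, e.g. by folding the 
processor index onto the queue range, roughly along these lines:)

    /* Hypothetical sketch, not the real fix: fold the absolute processor
     * index onto the available queues, so that e.g. vCPUs 8-11 end up on
     * queues 0-3 instead of being rejected.  The names (QueueMapping,
     * ProcessorMapping, Size, NumQueues) follow the snippet above. */
    for (Index = 0; Index < Size; Index++) {
        ULONG ProcIndex;

        ProcIndex = KeGetProcessorIndexFromNumber(&ProcessorMapping[Index]);
        QueueMapping[Index] = ProcIndex % NumQueues;
    }

The qemu-dm log below shows the corresponding ReceiverUpdateHashMapping 
failures: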

Jun  2 09:54:57 prost qemu-dm-41[30818]: 
30818@1622627697.450233:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 09:54:57 prost qemu-dm-41[30818]: 
30818@1622627697.450320:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 09:54:57 prost qemu-dm-41[30818]: 
30818@1622627697.452097:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 09:54:57 prost qemu-dm-41[30818]: 
30818@1622627697.452180:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 09:56:14 prost qemu-dm-41[30818]: 
30818@1622627774.374713:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 09:56:14 prost qemu-dm-41[30818]: 
30818@1622627774.374798:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 09:56:14 prost qemu-dm-41[30818]: 
30818@1622627774.377121:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 09:56:14 prost qemu-dm-41[30818]: 
30818@1622627774.377203:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.672941:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.673058:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.675891:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 09:59:00 prost qemu-dm-42[1106]: 1106@1622627940.675993:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.363892:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.364008:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.365861:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 09:59:39 prost qemu-dm-43[4074]: 4074@1622627979.365949:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.935871:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.935965:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.937849:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:04:41 prost qemu-dm-45[9705]: 9705@1622628281.937918:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:05:00 prost qemu-dm-46[11484]: 
11484@1622628300.973487:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:05:00 prost qemu-dm-46[11484]: 
11484@1622628300.973588:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:05:00 prost qemu-dm-46[11484]: 
11484@1622628300.976554:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:05:00 prost qemu-dm-46[11484]: 
11484@1622628300.976650:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:22:54 prost qemu-dm-49[21901]: 
21901@1622629374.720769:xen_platform_log xen platform: 
xenvif|PdoGetInterfaceGuid: fail1 (c0000034)
Jun  2 10:22:55 prost qemu-dm-49[21901]: 
21901@1622629375.194122:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:22:55 prost qemu-dm-49[21901]: 
21901@1622629375.194231:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:22:55 prost qemu-dm-49[21901]: 
21901@1622629375.196726:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:22:55 prost qemu-dm-49[21901]: 
21901@1622629375.196825:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:23:38 prost qemu-dm-50[24509]: 
24509@1622629418.530046:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:23:38 prost qemu-dm-50[24509]: 
24509@1622629418.530115:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:23:38 prost qemu-dm-50[24509]: 
24509@1622629418.531811:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:23:38 prost qemu-dm-50[24509]: 
24509@1622629418.531888:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:30:28 prost qemu-dm-51[28530]: 
28530@1622629828.510968:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:30:28 prost qemu-dm-51[28530]: 
28530@1622629828.511050:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:30:28 prost qemu-dm-51[28530]: 
28530@1622629828.513570:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:30:28 prost qemu-dm-51[28530]: 
28530@1622629828.513691:xen_platform_log xen platform: 
xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.573791:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.573904:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)
Jun  2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.576188:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail2
Jun  2 10:45:43 prost qemu-dm-52[2889]: 2889@1622630743.576298:xen_platform_log 
xen platform: xenvif|ReceiverUpdateHashMapping: fail1 (c000000d)

I tested with both Windows 10 and Windows Server 2016, with various CPU 
topologies (e.g. 12 vCPUs all on one socket shows the same issue once Windows 
starts assigning CPUs > 8).

A workaround is to set MaxProcessorNumber to 7, though obviously this limits 
scalability, since vCPUs > 8 won't be used even if you have multiple VIFs.
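
(For reference, the workaround can be applied per adapter from PowerShell using 
the standard NetAdapter cmdlets; the adapter name below is just an example 
taken from the output above:)

    # Workaround, not a fix: restrict RSS for this adapter to processors 0-7.
    Set-NetAdapterRss -Name "Ethernet 4" -MaxProcessorNumber 7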




 

