Xen project Mailing List

RE: Use of newer locking primitives...

To: Paul Durrant <xadimgnik@xxxxxxxxx>, "win-pv-devel@xxxxxxxxxxxxxxxxxxxx" <win-pv-devel@xxxxxxxxxxxxxxxxxxxx>

From: Martin Harvey <martin.harvey@xxxxxxxxxx>

Date: Tue, 30 Aug 2022 12:40:15 +0000

Delivery-date: Tue, 30 Aug 2022 12:40:24 +0000

List-id: Developer list for the Windows PV Drivers subproject <win-pv-devel.lists.xenproject.org>

Thread-topic: Use of newer locking primitives...

-----Original Message-----
From: Paul Durrant <xadimgnik@xxxxxxxxx>

> I wouldn't necessarily be convinced that moving away from the atomic
> lock/queue would be any faster.

Humm. Maybe depends on H/W cache implementation. If you're doing interlocked ops on the lock, and non-interlocked ops on a queue, and they are on separate cache lines... maybe.

> Limiting the queue drain may help from a fairness PoV, which I guess could be
> achieved by actually blocking on the ring lock if the queue gets to a certain
> length.

> - We do a XenBus GntTabRevoke for every single damn fragment individually on
> the Tx fast path. There are probably some ways of batching / deferring these
> so as to make everything run a bit more smoothly.

> A revoke ought to be fairly cheap; it's essentially some bit flips. It does,
> of course, involve some memory barriering... but is that really slowing
> things down?

Well, it does do a SCHEDOP_Yield, where I think perhaps KeStallExecutionProcessor might be better for a shorter (microsecond) wait. If the yield is a hypercall, then I think assorted context switching etc is wasted work, and poss longer wait granularity.

It would be *really nice* to get vTune / uProf working on guests with all the uArchitecture counters working so we could actually find out exactly how many cycles get spent where.

We see 99% of TxPollDpc's completing rapidly (microseconds), and then very occasionally we get a 4ms PollDpc. This is timing via ETW.

If I get time to repro it (cpl of weeks), then I might try changing the stall mechanisms, and possibly the locking, separately.

MH.
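
The cache-line point in the reply above (interlocked ops on the lock, plain ops on the queue, each on its own cache line) can be sketched in portable user-mode C11. This is an illustration only, not the driver's code: `struct tx_ring`, `struct tx_packet`, the `ring_*` helpers and the 64-byte `CACHE_LINE_SIZE` are all hypothetical names and assumptions, and a real Windows driver would use kernel primitives rather than `<stdatomic.h>`. Whether the separation actually helps is, as the message says, a "maybe" that would need measuring.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* ASSUMPTION: 64-byte cache lines (typical on x86-64, not guaranteed). */
#define CACHE_LINE_SIZE 64

/* Hypothetical queue entry, for illustration only. */
struct tx_packet {
    struct tx_packet *next;
    int id;
};

/* Hypothetical ring: the lock word and the queue pointers are forced
 * onto different cache lines, so interlocked traffic on the lock does
 * not share a line with the plain loads/stores on the queue. */
struct tx_ring {
    alignas(CACHE_LINE_SIZE) atomic_flag lock;       /* interlocked ops */
    alignas(CACHE_LINE_SIZE) struct tx_packet *head; /* plain ops, under lock */
    struct tx_packet *tail;
};

#define TX_RING_INIT { .lock = ATOMIC_FLAG_INIT }

/* Compile-time check that the layout is what the comment claims. */
static_assert(offsetof(struct tx_ring, head) - offsetof(struct tx_ring, lock)
                  >= CACHE_LINE_SIZE,
              "lock and queue must not share a cache line");

static void ring_lock(struct tx_ring *r)
{
    /* Simple test-and-set spin; acquire ordering on success. */
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire)) {
        /* spin */
    }
}

static void ring_unlock(struct tx_ring *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Append a packet to the tail using only non-interlocked queue ops. */
static void ring_enqueue(struct tx_ring *r, struct tx_packet *p)
{
    ring_lock(r);
    p->next = NULL;
    if (r->tail != NULL)
        r->tail->next = p;
    else
        r->head = p;
    r->tail = p;
    ring_unlock(r);
}

/* Remove and return the head packet, or NULL if the queue is empty. */
static struct tx_packet *ring_dequeue(struct tx_ring *r)
{
    ring_lock(r);
    struct tx_packet *p = r->head;
    if (p != NULL) {
        r->head = p->next;
        if (r->head == NULL)
            r->tail = NULL;
    }
    ring_unlock(r);
    return p;
}
```

A ring declared as `struct tx_ring r = TX_RING_INIT;` starts unlocked and empty. The padding costs an extra cache line per ring, which is the trade-off being weighed against reduced line bouncing between CPUs.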
