Re: [PATCH v5 03/15] IOMMU/x86: support freeing of pagetables
On Wed, Jun 01, 2022 at 05:25:16PM +0200, Jan Beulich wrote:
> On 01.06.2022 11:24, Roger Pau Monné wrote:
> > On Wed, Jun 01, 2022 at 09:32:44AM +0200, Jan Beulich wrote:
> >> On 31.05.2022 18:25, Roger Pau Monné wrote:
> >>> On Fri, May 27, 2022 at 01:13:09PM +0200, Jan Beulich wrote:
> >>>> @@ -566,6 +567,98 @@ struct page_info *iommu_alloc_pgtable(st
> >>>>      return pg;
> >>>>  }
> >>>>
> >>>> +/*
> >>>> + * Intermediate page tables which get replaced by large pages may only be
> >>>> + * freed after a suitable IOTLB flush. Hence such pages get queued on a
> >>>> + * per-CPU list, with a per-CPU tasklet processing the list on the assumption
> >>>> + * that the necessary IOTLB flush will have occurred by the time tasklets get
> >>>> + * to run. (List and tasklet being per-CPU has the benefit of accesses not
> >>>> + * requiring any locking.)
> >>>> + */
> >>>> +static DEFINE_PER_CPU(struct page_list_head, free_pgt_list);
> >>>> +static DEFINE_PER_CPU(struct tasklet, free_pgt_tasklet);
> >>>> +
> >>>> +static void free_queued_pgtables(void *arg)
> >>>> +{
> >>>> +    struct page_list_head *list = arg;
> >>>> +    struct page_info *pg;
> >>>> +    unsigned int done = 0;
> >>>> +
> >>>> +    while ( (pg = page_list_remove_head(list)) )
> >>>> +    {
> >>>> +        free_domheap_page(pg);
> >>>> +
> >>>> +        /* Granularity of checking somewhat arbitrary. */
> >>>> +        if ( !(++done & 0x1ff) )
> >>>> +            process_pending_softirqs();
> >>>
> >>> Hm, I'm wondering whether we really want to process pending softirqs
> >>> here.
> >>>
> >>> Such processing will prevent the watchdog from triggering, which we
> >>> likely want in production builds. OTOH in debug builds we should make
> >>> sure that free_queued_pgtables() doesn't take longer than a watchdog
> >>> window, or else it's likely to cause issues to guests scheduled on
> >>> this same pCPU (and calling process_pending_softirqs() will just mask
> >>> it).
> >>
> >> Doesn't this consideration apply to about every use of the function we
> >> already have in the code base?
> >
> > Not really, at least when used by init code or by the debug key
> > handlers. This use is IMO different from what I would expect, as it's
> > a guest-triggered path that we believe does require such processing.
> > Normally we would use continuations for such long-running,
> > guest-triggered operations.
>
> So what do you suggest I do? Putting the call inside #ifndef CONFIG_DEBUG
> is not a good option imo. Re-scheduling the tasklet wouldn't help, aiui
> (it would still run again right away). Moving the work to another CPU so
> this one can do other things isn't very good either - what if other CPUs
> are similarly busy? That leaves making things more complicated here by
> involving a timer, the handler of which would re-schedule the tasklet. I
> have to admit I don't like that very much either. All the more so since
> the use of process_pending_softirqs() is "just in case" here anyway - if
> lots of page tables were to be queued, I'd expect the queuing entity to
> be preempted before a rather large pile could accumulate.

I would be fine with adding a comment here noting that we don't expect
the processing of softirqs to be necessary, and that it's added merely
as a safety guard.

Long term I think we want to go with the approach of freeing such pages
in guest context before resuming execution, much like how we defer vPCI
BAR mappings using vpci_process_pending. But this requires some extra
work, and we would also need to handle the case of the freeing being
triggered in a remote domain context.
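To make the comment-only option concrete, the loop quoted above could
carry wording along the lines of the sketch below. This is only an
illustration of the suggestion in this thread, not the actual patch, and
the exact comment text is invented here:

static void free_queued_pgtables(void *arg)
{
    struct page_list_head *list = arg;
    struct page_info *pg;
    unsigned int done = 0;

    while ( (pg = page_list_remove_head(list)) )
    {
        free_domheap_page(pg);

        /*
         * Not expected to be necessary: the entity queuing the pages is
         * expected to be preempted before a large pile can accumulate
         * here.  Kept merely as a safety guard (granularity of checking
         * somewhat arbitrary).
         */
        if ( !(++done & 0x1ff) )
            process_pending_softirqs();
    }
}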
> Maybe I could make iommu_queue_free_pgtable() return non-void, to instruct
> the caller to bubble up a preemption notification once a certain number
> of pages have been queued for freeing. This might end up intrusive ...

Hm, it will end up being pretty arbitrary, as the time taken to free the
pages varies a lot depending on contention on the heap lock, and hence
setting a limit in terms of a number of pages will be hard.

Thanks, Roger.
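For illustration, the non-void variant floated above might look roughly
like the following. The real signature of iommu_queue_free_pgtable() is
not visible in the quoted hunk, so the parameter list, the per-CPU
counter, the threshold constant and the bool return convention are all
assumptions made here; per-domain bookkeeping done by the real helper is
omitted. The objection raised in the reply applies directly: a fixed
page count is a poor proxy for the time actually spent freeing under
heap-lock contention.

/* Hypothetical preemption threshold - a page count is only a rough proxy. */
#define FREE_PGT_PREEMPT_THRESH 512

static DEFINE_PER_CPU(unsigned int, free_pgt_count);

/*
 * Sketch only: returns true when the caller should bubble up a
 * preemption notification (e.g. arrange for a continuation).
 */
bool iommu_queue_free_pgtable(struct page_info *pg)
{
    unsigned int cpu = smp_processor_id();

    /* Queue the page and make sure the per-CPU tasklet will run. */
    page_list_add_tail(pg, &per_cpu(free_pgt_list, cpu));
    tasklet_schedule(&per_cpu(free_pgt_tasklet, cpu));

    return !(++per_cpu(free_pgt_count, cpu) % FREE_PGT_PREEMPT_THRESH);
}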