Re: [PATCH v5 03/15] IOMMU/x86: support freeing of pagetables
On Wed, Jun 01, 2022 at 05:25:16PM +0200, Jan Beulich wrote:
> On 01.06.2022 11:24, Roger Pau Monné wrote:
> > On Wed, Jun 01, 2022 at 09:32:44AM +0200, Jan Beulich wrote:
> >> On 31.05.2022 18:25, Roger Pau Monné wrote:
> >>> On Fri, May 27, 2022 at 01:13:09PM +0200, Jan Beulich wrote:
> >>>> @@ -566,6 +567,98 @@ struct page_info *iommu_alloc_pgtable(st
> >>>>      return pg;
> >>>>  }
> >>>>
> >>>> +/*
> >>>> + * Intermediate page tables which get replaced by large pages may only be
> >>>> + * freed after a suitable IOTLB flush. Hence such pages get queued on a
> >>>> + * per-CPU list, with a per-CPU tasklet processing the list on the assumption
> >>>> + * that the necessary IOTLB flush will have occurred by the time tasklets get
> >>>> + * to run. (List and tasklet being per-CPU has the benefit of accesses not
> >>>> + * requiring any locking.)
> >>>> + */
> >>>> +static DEFINE_PER_CPU(struct page_list_head, free_pgt_list);
> >>>> +static DEFINE_PER_CPU(struct tasklet, free_pgt_tasklet);
> >>>> +
> >>>> +static void free_queued_pgtables(void *arg)
> >>>> +{
> >>>> +    struct page_list_head *list = arg;
> >>>> +    struct page_info *pg;
> >>>> +    unsigned int done = 0;
> >>>> +
> >>>> +    while ( (pg = page_list_remove_head(list)) )
> >>>> +    {
> >>>> +        free_domheap_page(pg);
> >>>> +
> >>>> +        /* Granularity of checking somewhat arbitrary. */
> >>>> +        if ( !(++done & 0x1ff) )
> >>>> +            process_pending_softirqs();
> >>>
> >>> Hm, I'm wondering whether we really want to process pending softirqs
> >>> here.
> >>>
> >>> Such processing will prevent the watchdog from triggering, which we
> >>> likely want in production builds. OTOH in debug builds we should make
> >>> sure that free_queued_pgtables() doesn't take longer than a watchdog
> >>> window, or else it's likely to cause issues to guests scheduled on
> >>> this same pCPU (and calling process_pending_softirqs() will just mask
> >>> it).
> >>
> >> Doesn't this consideration apply to about every use of the function we
> >> already have in the code base?
> >
> > Not really, at least when used by init code or by the debug key
> > handlers. This use is IMO different from what I would expect, as it's
> > a guest-triggered path that we believe does require such processing.
> > Normally we would use continuations for such long-running,
> > guest-triggered operations.
>
> So what do you suggest I do? Putting the call inside #ifndef CONFIG_DEBUG
> is not a good option imo. Re-scheduling the tasklet wouldn't help, aiui
> (it would still run again right away). Moving the work to another CPU so
> this one can do other things isn't very good either - what if other CPUs
> are similarly busy? That leaves making things more complicated here by
> involving a timer, the handler of which would re-schedule the tasklet. I
> have to admit I don't like that very much either. All the more so since
> the use of process_pending_softirqs() is "just in case" here anyway - if
> lots of page tables were to be queued, I'd expect the queuing entity to
> be preempted before a rather large pile could accumulate.

I would be fine with adding a comment here noting that we don't expect
the processing of softirqs to be necessary, and that it's added merely
as a safety guard.

Long term I think we want to go with the approach of freeing such pages
in guest context before resuming execution, much like how we defer vPCI
BAR mappings using vpci_process_pending. But this requires some extra
work, and we would also need to handle the case of the freeing being
triggered in a remote domain context.
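To make the comment-only option concrete, the loop quoted above could
carry wording along the lines of the sketch below. This is only an
illustration of the suggestion in this thread, not the actual patch, and
the exact comment text is invented here:

static void free_queued_pgtables(void *arg)
{
    struct page_list_head *list = arg;
    struct page_info *pg;
    unsigned int done = 0;

    while ( (pg = page_list_remove_head(list)) )
    {
        free_domheap_page(pg);

        /*
         * Not expected to be necessary: the entity queuing the pages is
         * expected to be preempted before a large pile can accumulate
         * here.  Kept merely as a safety guard (granularity of checking
         * somewhat arbitrary).
         */
        if ( !(++done & 0x1ff) )
            process_pending_softirqs();
    }
}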
> Maybe I could make iommu_queue_free_pgtable() return non-void, to instruct
> the caller to bubble up a preemption notification once a certain number
> of pages have been queued for freeing. This might end up intrusive ...

Hm, it will end up being pretty arbitrary, as the time taken to free the
pages varies a lot depending on contention on the heap lock, and hence
setting a limit in terms of a number of pages will be hard.

Thanks, Roger.
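For illustration, the non-void variant floated above might look roughly
like the following. The real signature of iommu_queue_free_pgtable() is
not visible in the quoted hunk, so the parameter list, the per-CPU
counter, the threshold constant and the bool return convention are all
assumptions made here; per-domain bookkeeping done by the real helper is
omitted. The objection raised in the reply applies directly: a fixed
page count is a poor proxy for the time actually spent freeing under
heap-lock contention.

/* Hypothetical preemption threshold - a page count is only a rough proxy. */
#define FREE_PGT_PREEMPT_THRESH 512

static DEFINE_PER_CPU(unsigned int, free_pgt_count);

/*
 * Sketch only: returns true when the caller should bubble up a
 * preemption notification (e.g. arrange for a continuation).
 */
bool iommu_queue_free_pgtable(struct page_info *pg)
{
    unsigned int cpu = smp_processor_id();

    /* Queue the page and make sure the per-CPU tasklet will run. */
    page_list_add_tail(pg, &per_cpu(free_pgt_list, cpu));
    tasklet_schedule(&per_cpu(free_pgt_tasklet, cpu));

    return !(++per_cpu(free_pgt_count, cpu) % FREE_PGT_PREEMPT_THRESH);
}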