
RE: [PATCH 2/6] x86/iommu: add common page-table allocator



> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Sent: 24 July 2020 19:24
> To: Paul Durrant <paul@xxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Durrant, Paul <pdurrant@xxxxxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>;
> Kevin Tian <kevin.tian@xxxxxxxxx>; Wei Liu <wl@xxxxxxx>;
> Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Subject: RE: [EXTERNAL] [PATCH 2/6] x86/iommu: add common page-table allocator
> 
> 
> On 24/07/2020 17:46, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@xxxxxxxxxx>
> >
> > Instead of having separate page table allocation functions in VT-d and AMD
> > IOMMU code, use a common allocation function in the general x86 code.
> > Also, rather than walking the page tables and using a tasklet to free them
> > during iommu_domain_destroy(), add allocated page table pages to a list and
> > then simply walk the list to free them. This saves ~90 lines of code 
> > overall.
> >
> > NOTE: There is no need to clear and sync PTEs during teardown since the per-
> >       device root entries will have already been cleared (when devices were
> >       de-assigned) so the page tables can no longer be accessed by the 
> > IOMMU.
> >
> > Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
> 
> Oh wow - I don't have any polite words for how that code was organised
> before.
> 
> Instead of discussing the ~90 lines saved, what about highlighting the
> removal of a global bottleneck (the tasklet), or expanding on the removal
> of redundant TLB/cache maintenance?
> 

Ok.

> > diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > index c386dc4387..fd9b1e7bd5 100644
> > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > @@ -378,64 +380,9 @@ static int amd_iommu_assign_device(struct domain *d, u8 devfn,
> >      return reassign_device(pdev->domain, d, devfn, pdev);
> >  }
> >
> > -static void deallocate_next_page_table(struct page_info *pg, int level)
> > -{
> > -    PFN_ORDER(pg) = level;
> > -    spin_lock(&iommu_pt_cleanup_lock);
> > -    page_list_add_tail(pg, &iommu_pt_cleanup_list);
> > -    spin_unlock(&iommu_pt_cleanup_lock);
> > -}
> > -
> > -static void deallocate_page_table(struct page_info *pg)
> > -{
> > -    struct amd_iommu_pte *table_vaddr;
> > -    unsigned int index, level = PFN_ORDER(pg);
> > -
> > -    PFN_ORDER(pg) = 0;
> > -
> > -    if ( level <= 1 )
> > -    {
> > -        free_amd_iommu_pgtable(pg);
> > -        return;
> > -    }
> > -
> > -    table_vaddr = __map_domain_page(pg);
> > -
> > -    for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
> > -    {
> > -        struct amd_iommu_pte *pde = &table_vaddr[index];
> > -
> > -        if ( pde->mfn && pde->next_level && pde->pr )
> > -        {
> > -            /* We do not support skip levels yet */
> > -            ASSERT(pde->next_level == level - 1);
> > -            deallocate_next_page_table(mfn_to_page(_mfn(pde->mfn)),
> > -                                       pde->next_level);
> > -        }
> > -    }
> > -
> > -    unmap_domain_page(table_vaddr);
> > -    free_amd_iommu_pgtable(pg);
> > -}
> > -
> > -static void deallocate_iommu_page_tables(struct domain *d)
> > -{
> > -    struct domain_iommu *hd = dom_iommu(d);
> > -
> > -    spin_lock(&hd->arch.mapping_lock);
> > -    if ( hd->arch.amd_iommu.root_table )
> > -    {
> > -        deallocate_next_page_table(hd->arch.amd_iommu.root_table,
> > -                                   hd->arch.amd_iommu.paging_mode);
> 
> I really need to dust off my patch fixing up several bits of dubious
> logic, including the name "paging_mode", which is actually simply the
> number of levels.
> 
> At this point, it will probably be best to get this series in first, and
> for me to rebase.
> 

Ok.

> > -        hd->arch.amd_iommu.root_table = NULL;
> > -    }
> > -    spin_unlock(&hd->arch.mapping_lock);
> > -}
> > -
> > -
> >  static void amd_iommu_domain_destroy(struct domain *d)
> >  {
> > -    deallocate_iommu_page_tables(d);
> > +    dom_iommu(d)->arch.amd_iommu.root_table = NULL;
> >      amd_iommu_flush_all_pages(d);
> 
> Per your NOTE:, shouldn't this flush call be dropped?
> 

Indeed it should.
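Which would leave the destroy hook as just clearing the root table
pointer; a sketch of what I have in mind for v2 (assuming nothing else
ends up needing to live here):

static void amd_iommu_domain_destroy(struct domain *d)
{
    /*
     * Page table pages are freed via the common list in
     * arch_iommu_domain_destroy(), and the per-device root entries
     * were already cleared on de-assignment, so no flush is needed.
     */
    dom_iommu(d)->arch.amd_iommu.root_table = NULL;
}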

> > diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
> > index a12109a1de..b3c7da0fe2 100644
> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
> >
> >      spin_lock_init(&hd->arch.mapping_lock);
> >
> > +    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
> > +    spin_lock_init(&hd->arch.pgtables.lock);
> > +
> >      return 0;
> >  }
> >
> >  void arch_iommu_domain_destroy(struct domain *d)
> >  {
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +    struct page_info *pg;
> > +
> > +    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
> > +        free_domheap_page(pg);
> 
> Some of those 90 lines saved were the logic to not suffer a watchdog
> timeout here.
> 
> To do it like this, it needs plumbing into the relinquish resources path.
> 

Ok. It does look like there could be other potentially lengthy destruction done 
off the back of the RCU call. Ought we to have a restartable 
domain_destroy()?
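
Something along these lines, perhaps (a hypothetical sketch; the function
name is invented, and it assumes the free loop moves into the relinquish
resources path so that -ERESTART can propagate):

int iommu_free_pgtables(struct domain *d)
{
    struct domain_iommu *hd = dom_iommu(d);
    struct page_info *pg;
    unsigned int done = 0;

    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
    {
        free_domheap_page(pg);

        /*
         * Check for preemption every so often, so that a large list
         * cannot hold off the watchdog; the caller restarts us.
         */
        if ( !(++done & 0xff) && general_preempt_check() )
            return -ERESTART;
    }

    return 0;
}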

> >  }
> >
> >  static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> > @@ -257,6 +265,39 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >          return;
> >  }
> >
> > +struct page_info *iommu_alloc_pgtable(struct domain *d)
> > +{
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +#ifdef CONFIG_NUMA
> > +    unsigned int memflags = (hd->node == NUMA_NO_NODE) ?
> > +        0 : MEMF_node(hd->node);
> > +#else
> > +    unsigned int memflags = 0;
> > +#endif
> > +    struct page_info *pg;
> > +    void *p;
> 
> The memflags code is very awkward.  How about initialising it to 0, and
> having:
> 
> #ifdef CONFIG_NUMA
>     if ( hd->node != NUMA_NO_NODE )
>         memflags = MEMF_node(hd->node);
> #endif
> 
> here?
> 

Sure.
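
So the prologue would become, folding in your suggestion:

struct page_info *iommu_alloc_pgtable(struct domain *d)
{
    struct domain_iommu *hd = dom_iommu(d);
    unsigned int memflags = 0;
    struct page_info *pg;
    void *p;

#ifdef CONFIG_NUMA
    /* Allocate from the IOMMU's node where one is known. */
    if ( hd->node != NUMA_NO_NODE )
        memflags = MEMF_node(hd->node);
#endif

    /* ... rest of the function unchanged ... */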

> > +
> > +    BUG_ON(!iommu_enabled);
> 
> Is this really necessary?  Can we plausibly end up in this function
> otherwise?
> 

Not really; I'll drop it.

> 
> Overall, I wonder if this patch would be better split into several: one
> which introduces the common alloc/free implementation, two which switch
> the VT-d and AMD implementations over, and possibly one clean-up at the end?
> 

Ok, if you feel the patch is too large as-is, then I'll split it as you suggest.

  Paul

> ~Andrew

 

