[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/6] x86/iommu: add common page-table allocator


  • To: Paul Durrant <paul@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Fri, 24 Jul 2020 19:24:20 +0100
  • Authentication-results: esa5.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: Wei Liu <wl@xxxxxxx>, Paul Durrant <pdurrant@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 24 Jul 2020 18:24:28 +0000
  • Ironport-sdr: RhSL3m/N3qMC3bIZsCvUEY6lHZXgRDQakIO1tb7xbGIHV0S5CNsM+v3aaaKRV71KCYR4xRjP5G xkW0EaWkFbFLfxQ5QuWr0vab2WncTd1XSR8OOdeQE/OetNuH5tIpa/KvjaaUs4SR0qP9Ldqcar TiBpcaxQiOneytMAORLP/bYUPIu9RSYxsSzHpyBMJ59EBib0MOxeHfasrijcfQgKN4Eed7r1AM mbE9zIDR8yPOUYstvoop3IKW1keN3peiLfHhp9YdIfgBK6t7CcE/w5ktu3+EU21FZS+U8l1Fqn 8Zk=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 24/07/2020 17:46, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@xxxxxxxxxx>
>
> Instead of having separate page table allocation functions in VT-d and AMD
> IOMMU code, use a common allocation function in the general x86 code.
> Also, rather than walking the page tables and using a tasklet to free them
> during iommu_domain_destroy(), add allocated page table pages to a list and
> then simply walk the list to free them. This saves ~90 lines of code overall.
>
> NOTE: There is no need to clear and sync PTEs during teardown since the per-
>       device root entries will have already been cleared (when devices were
>       de-assigned) so the page tables can no longer be accessed by the IOMMU.
>
> Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>

Oh wow - I don't have any polite words for how that code was organised
before.

Instead of discussing the ~90 lines saved, what about the removal of a
global bottleneck (the tasklet) or expand on the removal of redundant
TLB/cache maintenance?

> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index c386dc4387..fd9b1e7bd5 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -378,64 +380,9 @@ static int amd_iommu_assign_device(struct domain *d, u8 
> devfn,
>      return reassign_device(pdev->domain, d, devfn, pdev);
>  }
>  
> -static void deallocate_next_page_table(struct page_info *pg, int level)
> -{
> -    PFN_ORDER(pg) = level;
> -    spin_lock(&iommu_pt_cleanup_lock);
> -    page_list_add_tail(pg, &iommu_pt_cleanup_list);
> -    spin_unlock(&iommu_pt_cleanup_lock);
> -}
> -
> -static void deallocate_page_table(struct page_info *pg)
> -{
> -    struct amd_iommu_pte *table_vaddr;
> -    unsigned int index, level = PFN_ORDER(pg);
> -
> -    PFN_ORDER(pg) = 0;
> -
> -    if ( level <= 1 )
> -    {
> -        free_amd_iommu_pgtable(pg);
> -        return;
> -    }
> -
> -    table_vaddr = __map_domain_page(pg);
> -
> -    for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
> -    {
> -        struct amd_iommu_pte *pde = &table_vaddr[index];
> -
> -        if ( pde->mfn && pde->next_level && pde->pr )
> -        {
> -            /* We do not support skip levels yet */
> -            ASSERT(pde->next_level == level - 1);
> -            deallocate_next_page_table(mfn_to_page(_mfn(pde->mfn)),
> -                                       pde->next_level);
> -        }
> -    }
> -
> -    unmap_domain_page(table_vaddr);
> -    free_amd_iommu_pgtable(pg);
> -}
> -
> -static void deallocate_iommu_page_tables(struct domain *d)
> -{
> -    struct domain_iommu *hd = dom_iommu(d);
> -
> -    spin_lock(&hd->arch.mapping_lock);
> -    if ( hd->arch.amd_iommu.root_table )
> -    {
> -        deallocate_next_page_table(hd->arch.amd_iommu.root_table,
> -                                   hd->arch.amd_iommu.paging_mode);

I really need to dust off my patch fixing up several bits of dubious
logic, including the name "paging_mode" which is actually simply the
number of levels.

At this point, it will probably be best to get this series in first, and
for me to rebase.

> -        hd->arch.amd_iommu.root_table = NULL;
> -    }
> -    spin_unlock(&hd->arch.mapping_lock);
> -}
> -
> -
>  static void amd_iommu_domain_destroy(struct domain *d)
>  {
> -    deallocate_iommu_page_tables(d);
> +    dom_iommu(d)->arch.amd_iommu.root_table = NULL;
>      amd_iommu_flush_all_pages(d);

Per your NOTE:, shouldn't this flush call be dropped?

> diff --git a/xen/drivers/passthrough/x86/iommu.c 
> b/xen/drivers/passthrough/x86/iommu.c
> index a12109a1de..b3c7da0fe2 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
>  
>      spin_lock_init(&hd->arch.mapping_lock);
>  
> +    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
> +    spin_lock_init(&hd->arch.pgtables.lock);
> +
>      return 0;
>  }
>  
>  void arch_iommu_domain_destroy(struct domain *d)
>  {
> +    struct domain_iommu *hd = dom_iommu(d);
> +    struct page_info *pg;
> +
> +    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
> +        free_domheap_page(pg);

Some of those 90 lines saved were the logic to not suffer a watchdog
timeout here.

To do it like this, it needs plumbing into the relinquish resources path.

>  }
>  
>  static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> @@ -257,6 +265,39 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>          return;
>  }
>  
> +struct page_info *iommu_alloc_pgtable(struct domain *d)
> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +#ifdef CONFIG_NUMA
> +    unsigned int memflags = (hd->node == NUMA_NO_NODE) ?
> +        0 : MEMF_node(hd->node);
> +#else
> +    unsigned int memflags = 0;
> +#endif
> +    struct page_info *pg;
> +    void *p;

The memflags code is very awkward.  How about initialising it to 0, and
having:

#ifdef CONFIG_NUMA
    if ( hd->node != NUMA_NO_NODE )
        memflags = MEMF_node(hd->node);
#endif

here?

> +
> +    BUG_ON(!iommu_enabled);

Is this really necessary?  Can we plausibly end up in this function
otherwise?


Overall, I wonder if this patch would better be split into several.  One
which introduces the common alloc/free implementation, two which switch
the VT-d and AMD implementations over, and possibly one clean-up on the end?

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.