
RE: [PATCH 2/6] x86/iommu: add common page-table allocator



> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Sent: 24 July 2020 19:24
> To: Paul Durrant <paul@xxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Durrant, Paul <pdurrant@xxxxxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>;
> Kevin Tian <kevin.tian@xxxxxxxxx>; Wei Liu <wl@xxxxxxx>;
> Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Subject: RE: [EXTERNAL] [PATCH 2/6] x86/iommu: add common page-table allocator
> 
> 
> On 24/07/2020 17:46, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@xxxxxxxxxx>
> >
> > Instead of having separate page table allocation functions in VT-d and AMD
> > IOMMU code, use a common allocation function in the general x86 code.
> > Also, rather than walking the page tables and using a tasklet to free them
> > during iommu_domain_destroy(), add allocated page table pages to a list and
> > then simply walk the list to free them. This saves ~90 lines of code 
> > overall.
> >
> > NOTE: There is no need to clear and sync PTEs during teardown since the per-
> >       device root entries will have already been cleared (when devices were
> >       de-assigned) so the page tables can no longer be accessed by the 
> > IOMMU.
> >
> > Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
> 
> Oh wow - I don't have any polite words for how that code was organised
> before.
> 
> Instead of discussing the ~90 lines saved, what about highlighting the
> removal of a global bottleneck (the tasklet), or expanding on the removal
> of redundant TLB/cache maintenance?
> 

Ok.

> > diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > index c386dc4387..fd9b1e7bd5 100644
> > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > @@ -378,64 +380,9 @@ static int amd_iommu_assign_device(struct domain *d, u8 devfn,
> >      return reassign_device(pdev->domain, d, devfn, pdev);
> >  }
> >
> > -static void deallocate_next_page_table(struct page_info *pg, int level)
> > -{
> > -    PFN_ORDER(pg) = level;
> > -    spin_lock(&iommu_pt_cleanup_lock);
> > -    page_list_add_tail(pg, &iommu_pt_cleanup_list);
> > -    spin_unlock(&iommu_pt_cleanup_lock);
> > -}
> > -
> > -static void deallocate_page_table(struct page_info *pg)
> > -{
> > -    struct amd_iommu_pte *table_vaddr;
> > -    unsigned int index, level = PFN_ORDER(pg);
> > -
> > -    PFN_ORDER(pg) = 0;
> > -
> > -    if ( level <= 1 )
> > -    {
> > -        free_amd_iommu_pgtable(pg);
> > -        return;
> > -    }
> > -
> > -    table_vaddr = __map_domain_page(pg);
> > -
> > -    for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
> > -    {
> > -        struct amd_iommu_pte *pde = &table_vaddr[index];
> > -
> > -        if ( pde->mfn && pde->next_level && pde->pr )
> > -        {
> > -            /* We do not support skip levels yet */
> > -            ASSERT(pde->next_level == level - 1);
> > -            deallocate_next_page_table(mfn_to_page(_mfn(pde->mfn)),
> > -                                       pde->next_level);
> > -        }
> > -    }
> > -
> > -    unmap_domain_page(table_vaddr);
> > -    free_amd_iommu_pgtable(pg);
> > -}
> > -
> > -static void deallocate_iommu_page_tables(struct domain *d)
> > -{
> > -    struct domain_iommu *hd = dom_iommu(d);
> > -
> > -    spin_lock(&hd->arch.mapping_lock);
> > -    if ( hd->arch.amd_iommu.root_table )
> > -    {
> > -        deallocate_next_page_table(hd->arch.amd_iommu.root_table,
> > -                                   hd->arch.amd_iommu.paging_mode);
> 
> I really need to dust off my patch fixing up several bits of dubious
> logic, including the name "paging_mode", which is actually simply the
> number of levels.
> 
> At this point, it will probably be best to get this series in first, and
> for me to rebase.
> 

Ok.

> > -        hd->arch.amd_iommu.root_table = NULL;
> > -    }
> > -    spin_unlock(&hd->arch.mapping_lock);
> > -}
> > -
> > -
> >  static void amd_iommu_domain_destroy(struct domain *d)
> >  {
> > -    deallocate_iommu_page_tables(d);
> > +    dom_iommu(d)->arch.amd_iommu.root_table = NULL;
> >      amd_iommu_flush_all_pages(d);
> 
> Per your NOTE:, shouldn't this flush call be dropped?
> 

Indeed it should.
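Which would leave the destroy hook as just clearing the root table
pointer; a sketch of what I have in mind for v2 (assuming nothing else
ends up needing to live here):

static void amd_iommu_domain_destroy(struct domain *d)
{
    /*
     * Page table pages are freed via the common list in
     * arch_iommu_domain_destroy(), and the per-device root entries
     * were already cleared on de-assignment, so no flush is needed.
     */
    dom_iommu(d)->arch.amd_iommu.root_table = NULL;
}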

> > diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
> > index a12109a1de..b3c7da0fe2 100644
> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
> >
> >      spin_lock_init(&hd->arch.mapping_lock);
> >
> > +    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
> > +    spin_lock_init(&hd->arch.pgtables.lock);
> > +
> >      return 0;
> >  }
> >
> >  void arch_iommu_domain_destroy(struct domain *d)
> >  {
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +    struct page_info *pg;
> > +
> > +    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
> > +        free_domheap_page(pg);
> 
> Some of those 90 lines saved were the logic to not suffer a watchdog
> timeout here.
> 
> To do it like this, it needs plumbing into the relinquish resources path.
> 

Ok. It does look like there could be other potentially lengthy destruction done 
off the back of the RCU call. Ought we to have a restartable 
domain_destroy()?
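
Something along these lines, perhaps (a hypothetical sketch; the function
name is invented, and it assumes the free loop moves into the relinquish
resources path so that -ERESTART can propagate):

int iommu_free_pgtables(struct domain *d)
{
    struct domain_iommu *hd = dom_iommu(d);
    struct page_info *pg;
    unsigned int done = 0;

    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
    {
        free_domheap_page(pg);

        /*
         * Check for preemption every so often, so that a large list
         * cannot hold off the watchdog; the caller restarts us.
         */
        if ( !(++done & 0xff) && general_preempt_check() )
            return -ERESTART;
    }

    return 0;
}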

> >  }
> >
> >  static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> > @@ -257,6 +265,39 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >          return;
> >  }
> >
> > +struct page_info *iommu_alloc_pgtable(struct domain *d)
> > +{
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +#ifdef CONFIG_NUMA
> > +    unsigned int memflags = (hd->node == NUMA_NO_NODE) ?
> > +        0 : MEMF_node(hd->node);
> > +#else
> > +    unsigned int memflags = 0;
> > +#endif
> > +    struct page_info *pg;
> > +    void *p;
> 
> The memflags code is very awkward.  How about initialising it to 0, and
> having:
> 
> #ifdef CONFIG_NUMA
>     if ( hd->node != NUMA_NO_NODE )
>         memflags = MEMF_node(hd->node);
> #endif
> 
> here?
> 

Sure.
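
So the prologue would become, folding in your suggestion:

struct page_info *iommu_alloc_pgtable(struct domain *d)
{
    struct domain_iommu *hd = dom_iommu(d);
    unsigned int memflags = 0;
    struct page_info *pg;
    void *p;

#ifdef CONFIG_NUMA
    /* Allocate from the IOMMU's node where one is known. */
    if ( hd->node != NUMA_NO_NODE )
        memflags = MEMF_node(hd->node);
#endif

    /* ... rest of the function unchanged ... */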

> > +
> > +    BUG_ON(!iommu_enabled);
> 
> Is this really necessary?  Can we plausibly end up in this function
> otherwise?
> 

Not really; I'll drop it.

> 
> Overall, I wonder if this patch would be better split into several: one
> which introduces the common alloc/free implementation, two which switch
> the VT-d and AMD implementations over, and possibly one clean-up at the end?
> 

Ok, if you feel the patch is too large as-is, then I'll split it as you suggest.

  Paul

> ~Andrew

 

