Re: [PATCH v4 15/21] AMD/IOMMU: free all-empty page tables


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 10 May 2022 15:30:23 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Tue, 10 May 2022 13:30:43 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Apr 25, 2022 at 10:42:19AM +0200, Jan Beulich wrote:
> When a page table ends up with no present entries left, it can be
> replaced by a non-present entry at the next higher level. The page table
> itself can then be scheduled for freeing.
> 
> Note that while its output isn't used there yet,
> pt_update_contig_markers() right away needs to be called in all places
> where entries get updated, not just the one where entries get cleared.
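
Just to confirm my understanding of the mechanism: when
pt_update_contig_markers() signals that the table at the current level
has become all-empty, the caller is expected to replace the PDE
referencing it one level up with a non-present entry and hand the
table page back, along the lines of (conceptual sketch only, not the
patch code; pde, pg and hd here stand for the next-level entry, the
table's page_info and dom_iommu(d)):

    if ( pt_update_contig_markers(&table->raw, idx, level, PTE_kind_null) )
    {
        /* Table became all-empty: make the referencing PDE non-present... */
        write_atomic(&pde->raw, 0);
        /* ...and queue the now unreferenced page table for freeing. */
        iommu_queue_free_pgtable(hd, pg);
    }
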
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Some comments below.

> ---
> v4: Re-base over changes earlier in the series.
> v3: Re-base over changes earlier in the series.
> v2: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -21,6 +21,9 @@
>  
>  #include "iommu.h"
>  
> +#define CONTIG_MASK IOMMU_PTE_CONTIG_MASK
> +#include <asm/pt-contig-markers.h>
> +
>  /* Given pfn and page table level, return pde index */
>  static unsigned int pfn_to_pde_idx(unsigned long pfn, unsigned int level)
>  {
> @@ -33,16 +36,20 @@ static unsigned int pfn_to_pde_idx(unsig
>  
>  static union amd_iommu_pte clear_iommu_pte_present(unsigned long l1_mfn,
>                                                     unsigned long dfn,
> -                                                   unsigned int level)
> +                                                   unsigned int level,
> +                                                   bool *free)
>  {
>      union amd_iommu_pte *table, *pte, old;
> +    unsigned int idx = pfn_to_pde_idx(dfn, level);
>  
>      table = map_domain_page(_mfn(l1_mfn));
> -    pte = &table[pfn_to_pde_idx(dfn, level)];
> +    pte = &table[idx];
>      old = *pte;
>  
>      write_atomic(&pte->raw, 0);
>  
> +    *free = pt_update_contig_markers(&table->raw, idx, level, PTE_kind_null);
> +
>      unmap_domain_page(table);
>  
>      return old;
> @@ -85,7 +92,11 @@ static union amd_iommu_pte set_iommu_pte
>      if ( !old.pr || old.next_level ||
>           old.mfn != next_mfn ||
>           old.iw != iw || old.ir != ir )
> +    {
>          set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> +        pt_update_contig_markers(&table->raw, pfn_to_pde_idx(dfn, level),
> +                                 level, PTE_kind_leaf);

It would be better to call pt_update_contig_markers() from inside
set_iommu_pde_present(), but that would imply changing the parameters
passed to the function.  As it stands it's cumbersome (and
error-prone) to have to pair every call to set_iommu_pde_present()
with a matching call to pt_update_contig_markers().
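
Something along the lines of the below (a rough sketch only, and the
wrapper name is made up here) would keep both updates in a single
place:

static void set_iommu_pde_present_contig(union amd_iommu_pte *table,
                                         unsigned long dfn,
                                         unsigned int level,
                                         unsigned long next_mfn,
                                         unsigned int next_level,
                                         bool iw, bool ir)
{
    unsigned int idx = pfn_to_pde_idx(dfn, level);

    /* Write the entry itself... */
    set_iommu_pde_present(&table[idx], next_mfn, next_level, iw, ir);
    /* ...and unconditionally keep the contiguity markers in sync. */
    pt_update_contig_markers(&table->raw, idx, level,
                             next_level ? PTE_kind_table : PTE_kind_leaf);
}

That would however mean passing (table, dfn, level) at every call
site, hence the "changing the parameters" remark above.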

> +    }
>      else
>          old.pr = false; /* signal "no change" to the caller */
>  
> @@ -322,6 +333,9 @@ static int iommu_pde_from_dfn(struct dom
>              smp_wmb();
>              set_iommu_pde_present(pde, next_table_mfn, next_level, true,
>                                    true);
> +            pt_update_contig_markers(&next_table_vaddr->raw,
> +                                     pfn_to_pde_idx(dfn, level),
> +                                     level, PTE_kind_table);
>  
>              *flush_flags |= IOMMU_FLUSHF_modified;
>          }
> @@ -347,6 +361,9 @@ static int iommu_pde_from_dfn(struct dom
>                  next_table_mfn = mfn_x(page_to_mfn(table));
>                  set_iommu_pde_present(pde, next_table_mfn, next_level, true,
>                                        true);
> +                pt_update_contig_markers(&next_table_vaddr->raw,
> +                                         pfn_to_pde_idx(dfn, level),
> +                                         level, PTE_kind_table);
>              }
>              else /* should never reach here */
>              {
> @@ -474,8 +491,24 @@ int cf_check amd_iommu_unmap_page(
>  
>      if ( pt_mfn )
>      {
> +        bool free;
> +
>          /* Mark PTE as 'page not present'. */
> -        old = clear_iommu_pte_present(pt_mfn, dfn_x(dfn), level);
> +        old = clear_iommu_pte_present(pt_mfn, dfn_x(dfn), level, &free);
> +
> +        while ( unlikely(free) && ++level < hd->arch.amd.paging_mode )
> +        {
> +            struct page_info *pg = mfn_to_page(_mfn(pt_mfn));
> +
> +            if ( iommu_pde_from_dfn(d, dfn_x(dfn), level, &pt_mfn,
> +                                    flush_flags, false) )
> +                BUG();
> +            BUG_ON(!pt_mfn);
> +
> +            clear_iommu_pte_present(pt_mfn, dfn_x(dfn), level, &free);

I'm not sure it's worth it, but it may be safer to initialize 'free'
to false (at its definition and before each call to
clear_iommu_pte_present()), just in case some path ends up returning
from clear_iommu_pte_present() without having updated 'free'.
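
I.e. something like (illustration only, mirroring the hunk quoted
above):

        bool free = false;   /* defensive default: don't free unless told to */

        /* Mark PTE as 'page not present'. */
        old = clear_iommu_pte_present(pt_mfn, dfn_x(dfn), level, &free);

        while ( unlikely(free) && ++level < hd->arch.amd.paging_mode )
        {
            ...
            free = false;    /* reset before the next update */
            clear_iommu_pte_present(pt_mfn, dfn_x(dfn), level, &free);
        }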

Thanks, Roger.