[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 16/21] VT-d: free all-empty page tables


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 20 May 2022 13:13:02 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>
  • Delivery-date: Fri, 20 May 2022 11:13:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, May 18, 2022 at 12:26:03PM +0200, Jan Beulich wrote:
> On 10.05.2022 16:30, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:42:50AM +0200, Jan Beulich wrote:
> >> @@ -837,9 +843,31 @@ static int dma_pte_clear_one(struct doma
> >>  
> >>      old = *pte;
> >>      dma_clear_pte(*pte);
> >> +    iommu_sync_cache(pte, sizeof(*pte));
> >> +
> >> +    while ( pt_update_contig_markers(&page->val,
> >> +                                     address_level_offset(addr, level),
> >> +                                     level, PTE_kind_null) &&
> >> +            ++level < min_pt_levels )
> >> +    {
> >> +        struct page_info *pg = maddr_to_page(pg_maddr);
> >> +
> >> +        unmap_vtd_domain_page(page);
> >> +
> >> +        pg_maddr = addr_to_dma_page_maddr(domain, addr, level, flush_flags,
> >> +                                          false);
> >> +        BUG_ON(pg_maddr < PAGE_SIZE);
> >> +
> >> +        page = map_vtd_domain_page(pg_maddr);
> >> +        pte = &page[address_level_offset(addr, level)];
> >> +        dma_clear_pte(*pte);
> >> +        iommu_sync_cache(pte, sizeof(*pte));
> >> +
> >> +        *flush_flags |= IOMMU_FLUSHF_all;
> >> +        iommu_queue_free_pgtable(hd, pg);
> >> +    }
> > 
> > I think I'm setting myself up for trouble, but do we need to sync the
> > cache for the lower level entries if higher level ones are to be
> > changed?
> > 
> > IOW, would it be fine to just flush the highest level modified PTE,
> > as the lower level ones won't be reachable anyway?
> 
> I definitely want to err on the safe side here. If later we can
> prove that some cache flush is unneeded, I'd be happy to see it
> go away.

Hm, so it's not only about adding more cache flushes, but moving them
inside of the locked region: previously the only cache flush was done
outside of the locked region.

I guess I can't convince myself why we would need to flush the cache
for entries that are about to be removed, and that point to pages
scheduled to be freed anyway.

> >> @@ -2182,8 +2210,21 @@ static int __must_check cf_check intel_i
> >>      }
> >>  
> >>      *pte = new;
> >> -
> >>      iommu_sync_cache(pte, sizeof(struct dma_pte));
> >> +
> >> +    /*
> >> +     * While the (ab)use of PTE_kind_table here allows to save some work in
> >> +     * the function, the main motivation for it is that it avoids a so far
> >> +     * unexplained hang during boot (while preparing Dom0) on a Westmere
> >> +     * based laptop.
> >> +     */
> >> +    pt_update_contig_markers(&page->val,
> >> +                             address_level_offset(dfn_to_daddr(dfn), level),
> >> +                             level,
> >> +                             (hd->platform_ops->page_sizes &
> >> +                              (1UL << level_to_offset_bits(level + 1))
> >> +                              ? PTE_kind_leaf : PTE_kind_table));
> > 
> > So this works because on what we believe to be affected models the
> > only supported page sizes are 4K?
> 
> Yes.
> 
> > Do we want to do the same with AMD if we don't allow 512G super pages?
> 
> Why? They don't have a similar flaw.

So the question was mostly whether we should also avoid calling
pt_update_contig_markers for 1G entries, because we won't coalesce
them into a 512G entry anyway.  IOW, avoid the overhead of updating
the contig markers when we know the resulting super-page is not
supported by ->page_sizes.

Thanks, Roger.



 

