
Re: [PATCH v4 06/21] IOMMU/x86: perform PV Dom0 mappings in batches


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 4 May 2022 13:20:00 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Wed, 04 May 2022 11:20:43 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, May 04, 2022 at 11:46:37AM +0200, Jan Beulich wrote:
> On 03.05.2022 16:49, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:34:59AM +0200, Jan Beulich wrote:
> >> For large page mappings to be easily usable (i.e. in particular without
> >> un-shattering of smaller page mappings) and for mapping operations to
> >> then also be more efficient, pass batches of Dom0 memory to iommu_map().
> >> In dom0_construct_pv() and its helpers (covering strict mode) this
> >> additionally requires establishing the type of those pages (albeit with
> >> zero type references).
> > 
> > I think it's possible I've already asked this.  Would it make sense to
> > add the IOMMU mappings in alloc_domheap_pages(), maybe by passing a
> > specific flag?
> 
> I don't think you did ask, but now that you do: This would look like a
> layering violation to me. I don't think allocation should ever have
> mapping (into the IOMMU or elsewhere) as a "side effect", no matter
> that ...

Hm, I'm certainly not familiar enough with PV itself to be able to
make a proper argument here.  I fully agree with you for translated
guests using a p2m.

For PV we currently establish/teardown IOMMU mappings in
_get_page_type(), which already looks like a layering violation to me,
so also doing this in alloc_domheap_pages() wouldn't seem that bad if
it allowed us to simplify the resulting code overall.

> > It would seem to me that doing it that way would also allow the
> > mappings to get established in blocks for domUs.
> 
> ... then this would perhaps be possible.
> 
> >> The installing of zero-ref writable types has in fact shown (observed
> >> while putting together the change) that despite the intention by the
> >> XSA-288 changes (affecting DomU-s only) for Dom0 a number of
> >> sufficiently ordinary pages (at the very least initrd and P2M ones as
> >> well as pages that are part of the initial allocation but not part of
> >> the initial mapping) still have been starting out as PGT_none, meaning
> >> that they would have gained IOMMU mappings only the first time these
> >> pages would get mapped writably. Consequently an open question is
> >> whether iommu_memory_setup() should set the pages to PGT_writable_page
> >> independent of need_iommu_pt_sync().
> > 
> > I think I'm confused, doesn't the setting of PGT_writable_page happen
> > as a result of need_iommu_pt_sync() and having those pages added to
> > the IOMMU page tables? (so they can be properly tracked and IOMMU
> > mappings are removed if the page is also removed)
> 
> In principle yes - in guest_physmap_add_page(). But this function isn't
> called for the pages I did enumerate in the remark. XSA-288 really only
> cared about getting this right for DomU-s.

Would it make sense to change guest_physmap_add_page() to be able to
pass the page_order parameter down to iommu_map(), and then use it for
the dom0 build instead of introducing iommu_memory_setup()?

I think guest_physmap_add_page() will need to be adjusted for domUs at
some point anyway, so perhaps it could also be unified with the dom0
usage?

> > If the pages are not added here (because dom0 is not running in strict
> > mode) then setting PGT_writable_page is not required?
> 
> Correct - in that case we skip fiddling with IOMMU mappings on
> transitions to/from PGT_writable_page, and hence putting this type in
> place would be benign (but improve consistency).
> 
> >> Note also that strictly speaking the iommu_iotlb_flush_all() here (as
> >> well as the pre-existing one in arch_iommu_hwdom_init()) shouldn't be
> >> needed: Actual hooking up (AMD) or enabling of translation (VT-d)
> >> occurs only afterwards anyway, so nothing can have made it into TLBs
> >> just yet.
> > 
> > Hm, indeed. I think the one in arch_iommu_hwdom_init can surely go
> > away, as we must strictly do the hwdom init before enabling the iommu
> > itself.
> 
> Why would that be? That's imo as much of an implementation detail as
> ...

Well, you want the reserved/inclusive options applied (and the
mappings created) before enabling the IOMMU, because at that point
devices have already been assigned.  So it depends more on the
combination of devices being assigned and the IOMMU being enabled,
rather than on the IOMMU being enabled alone.

> > About the one in the dom0 build I'm less convinced; it's just to be
> > on the safe side if we ever change the order of IOMMU init and
> > memory setup.
> 
> ... this. Just like we populate tables with the IOMMU already enabled
> for DomU-s, I think the same would be valid to do for Dom0.
> 
> > I would expect flushing an empty TLB to not be very expensive?
> 
> I wouldn't "expect" this - it might be this way, but it surely depends
> on whether an implementation can easily tell whether the TLB is empty,
> and whether its emptiness actually makes a difference for the latency
> of a flush operation.
> 
> >> --- a/xen/drivers/passthrough/x86/iommu.c
> >> +++ b/xen/drivers/passthrough/x86/iommu.c
> >> @@ -347,8 +347,8 @@ static unsigned int __hwdom_init hwdom_i
> >>  
> >>  void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >>  {
> >> -    unsigned long i, top, max_pfn;
> >> -    unsigned int flush_flags = 0;
> >> +    unsigned long i, top, max_pfn, start, count;
> >> +    unsigned int flush_flags = 0, start_perms = 0;
> >>  
> >>      BUG_ON(!is_hardware_domain(d));
> >>  
> >> @@ -379,9 +379,9 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>       * First Mb will get mapped in one go by pvh_populate_p2m(). Avoid
> >>       * setting up potentially conflicting mappings here.
> >>       */
> >> -    i = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
> >> +    start = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
> >>  
> >> -    for ( ; i < top; i++ )
> >> +    for ( i = start, count = 0; i < top; )
> >>      {
> >>          unsigned long pfn = pdx_to_pfn(i);
> >>          unsigned int perms = hwdom_iommu_map(d, pfn, max_pfn);
> >> @@ -390,20 +390,41 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>          if ( !perms )
> >>              rc = 0;
> >>          else if ( paging_mode_translate(d) )
> >> +        {
> >>              rc = p2m_add_identity_entry(d, pfn,
> >>                                          perms & IOMMUF_writable ? p2m_access_rw
> >>                                                                  : p2m_access_r,
> >>                                          0);
> >> +            if ( rc )
> >> +                printk(XENLOG_WARNING
> >> +                       "%pd: identity mapping of %lx failed: %d\n",
> >> +                       d, pfn, rc);
> >> +        }
> >> +        else if ( pfn != start + count || perms != start_perms )
> >> +        {
> >> +        commit:
> >> +            rc = iommu_map(d, _dfn(start), _mfn(start), count, start_perms,
> >> +                           &flush_flags);
> >> +            if ( rc )
> >> +                printk(XENLOG_WARNING
> >> +                       "%pd: IOMMU identity mapping of [%lx,%lx) failed: %d\n",
> >> +                       d, pfn, pfn + count, rc);
> >> +            SWAP(start, pfn);
> >> +            start_perms = perms;
> >> +            count = 1;
> >> +        }
> >>          else
> >> -            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << PAGE_ORDER_4K,
> >> -                           perms, &flush_flags);
> >> +        {
> >> +            ++count;
> >> +            rc = 0;
> > 
> > Seeing as we want to process this in blocks now, I wonder whether it
> > would make sense to take a different approach, and use a rangeset to
> > track which regions need to be mapped.  What gets added would be based
> > on the host e820 plus the options
> > iommu_hwdom_{strict,inclusive,reserved}.  We would then punch holes
> > based on the logic in hwdom_iommu_map() and finally we could iterate
> > over the regions afterwards using rangeset_consume_ranges().
> > 
> > Not that you strictly need to do it here, just think the end result
> > would be clearer.
> 
> The end result might indeed be, but it would be more of a change right
> here. Hence I'd prefer to leave that out of the series for now.

OK.  I think it might be nice to add a comment to that effect, mostly
because I tend to forget these things myself.
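For what it's worth, the open-coded batching in the hunk above boils
down to run-length coalescing of contiguous pfns that share the same
permissions.  A minimal standalone sketch of just that logic (all names
here are hypothetical illustrations, deliberately not Xen code):

```c
/* Standalone sketch of the run-length coalescing done by the patch's
 * hwdom loop: maximal runs of contiguous, equally-permissioned pfns
 * become one (start, count, perms) batch.  Hypothetical names only. */
#include <stddef.h>

struct batch { unsigned long start, count; unsigned int perms; };

/* Walk perms[0..n) and emit one batch per maximal run of equal,
 * non-zero permissions; returns the number of batches written. */
static size_t coalesce(const unsigned int *perms, size_t n,
                       struct batch *out)
{
    size_t nb = 0;
    unsigned long start = 0, count = 0;
    unsigned int cur = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        /* Mirrors the patch's "pfn != start + count || perms !=
         * start_perms" check; the pfn part matters there because
         * pdx_to_pfn() can produce discontiguous pfns. */
        if ( count && perms[i] == cur && i == start + count )
        {
            ++count;
            continue;
        }
        if ( count && cur )          /* flush the previous run */
            out[nb++] = (struct batch){ start, count, cur };
        start = i;
        count = 1;
        cur = perms[i];
    }
    if ( count && cur )              /* final flush, as after the loop */
        out[nb++] = (struct batch){ start, count, cur };
    return nb;
}
```

With perms = {3, 3, 0, 1, 1, 1} this yields two batches, (0, 2) and
(3, 3), i.e. two iommu_map() calls instead of five single-page ones.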
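And since the rangeset idea may well come back later, the shape of
that alternative (build the set of regions to map, punch holes for
what must not be mapped, then walk the survivors) can be illustrated
without Xen's actual rangeset API.  Everything below is a hypothetical
stand-in, not the rangeset_*() interface:

```c
/* Standalone sketch of the rangeset-style approach: start from the
 * regions to be mapped, punch holes, then consume what remains.
 * Hypothetical types/names only; not Xen's rangeset implementation. */
#include <stddef.h>

#define MAX_RANGES 16

struct range { unsigned long s, e; };            /* inclusive [s, e] */

struct ranges {
    struct range r[MAX_RANGES];
    size_t n;
};

/* Remove [s, e] from every range, possibly splitting one into two. */
static void punch_hole(struct ranges *rs, unsigned long s, unsigned long e)
{
    for ( size_t i = 0; i < rs->n; i++ )
    {
        struct range *cur = &rs->r[i];

        if ( e < cur->s || s > cur->e )          /* no overlap */
            continue;
        if ( s > cur->s && e < cur->e )          /* split in two */
        {
            rs->r[rs->n++] = (struct range){ e + 1, cur->e };
            cur->e = s - 1;
        }
        else if ( s <= cur->s && e >= cur->e )   /* fully covered: drop */
        {
            *cur = rs->r[--rs->n];               /* swap-remove */
            i--;                                 /* re-check swapped-in entry */
        }
        else if ( s <= cur->s )                  /* clip the front */
            cur->s = e + 1;
        else                                     /* clip the tail */
            cur->e = s - 1;
    }
}

/* Stand-in for consuming the surviving ranges in one pass. */
static unsigned long total_pages(const struct ranges *rs)
{
    unsigned long total = 0;

    for ( size_t i = 0; i < rs->n; i++ )
        total += rs->r[i].e - rs->r[i].s + 1;
    return total;
}
```

The consuming pass would then issue one iommu_map() per surviving
range, making each range's extent explicit up front rather than
discovering it pfn by pfn.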

Thanks, Roger.
