[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 06/21] IOMMU/x86: perform PV Dom0 mappings in batches


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 4 May 2022 15:55:45 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=nQ1GIAIcILvu1VTg7Kv7YpbGZ4BPwhFXFVKbWwZ7Haw=; b=WAHEcESXe6ASaNpZGvkYY5QrvvQ5vQwk8e0BXlFI7grDRAFebhO25fH9Fr+7PlXzE26koTurWm3FF/of/BsOo12tGT0502ijRVkhap4Tz3KTK8qiZy5XIMpnnLDeiiPR13BH9MaTu2bI8AiBo8eI+T/UHCshPJb+4U0I9s2DK/G2amaNibcrFU1dITvnn6+8sTpQ0mEmLcC61UEtyLIw7JvY8mqAyA/9Qm6OHFl5HIMMs7cRALI7OdPAkliKpiAXdNirz2GV0hszgXJTk7kRFHDMIOMl/JY+9gAfs7IeDGyRm40DVj8goWHDL6M5AKpaZXodzoMRzVBJB8+uDoWYTg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=PfQQbnww9kv/59hfsb+3CuC37PXX//RiGCrKisfPhiR7mGWmAGjawLAqIXS/spVbgUzXWnzrtSMn7zc6o8I1tMTNHh6ohRSVtpFjgPzdO+cRlHCjy9dNX7uTF670no7Bxexu1TNNHN1AcMfk1Or7FXPIn73GMXJ5zSjanCmIUximILoif1CTKnrjbFtNVuB32BdxTkWVCWcuczXnavF65m/EOX2Y7hnsGXWeLxtaloKxtzU1UCmLpHsTr0VugyTLg3vxtOQYgBVgEznxBR/meTD4NQSE7wvB8qwMcBO2vjGjVBGjNt1+InmXNW1yU6EPkoo7/elMr5JBu1kx0cnCWg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Wed, 04 May 2022 13:56:02 +0000
  • Ironport-data: A9a23:mur9R64s/zBnHLhLlEO+ewxRtELGchMFZxGqfqrLsTDasY5as4F+v mFNWmCHPPeKNjPyeNsgOY+xpENSvcfdyN5qHQBpqi0wHi5G8cbLO4+Ufxz6V8+wwmwvb67FA +E2MISowBUcFyeEzvuVGuG96yE6j8lkf5KkYAL+EnkZqTRMFWFw0HqPp8Zj2tQy2YXhX1vX0 T/Pi5a31GGNimYc3l08s8pvmDs31BglkGpF1rCWTakjUG72zxH5PrpGTU2CByKQrr1vNvy7X 47+IISRpQs1yfuP5uSNyd4XemVSKlLb0JPnZnB+A8BOiTAazsA+PzpS2FPxpi67hh3Q9+2dx umhurSyTiktB6z1x90tQgFIDhpSZYdD0a7udC3XXcy7lyUqclPK6tA3VQQcG91d/ex6R2ZT6 fYfNTYBKAiZgP67y666Te8qgdk/KM7sP8UUvXQIITPxVK56B8ycBfiao4YHgl/chegXdRraT 9AeZjd1KgzJfjVEO0sNCYJ4l+Ct7pX6W2MJ+Q/N+PFoi4TV5F11k57EHfXFQOeLe/dft0+Sh iHs+k2sV3n2M/Tak1Jp6EmEhOXCgCf6U4I6D6Cj+7hhh1j77nweDlgaWEW2pdG9i1WiQJRPJ koM4C0soKMuskuxQbHVRxSlpFaUsxhaXMBfe8Ul7Cmdx6yS5ByWblXoVRZEYd0i8cUwFToj0 wbQm8uzXGMx9rqIVXia67GY6yuoPjQYJnMDYilCShYZ597ko8c4iRenostfLZNZR+bdQVnYq w1mZgBk71nPpabnD5mGwG0=
  • Ironport-hdrordr: A9a23:7XvjpKqJ85fFrPNg0SA1UuwaV5u5L9V00zEX/kB9WHVpm5Oj+v xGzc5w6farsl0ssREb9uxo9pPwJE800aQFmbX5Wo3SJzUO2VHYVb2KiLGP/9SOIU3DH4JmpM Rdmu1FeafN5DtB/LnHCWuDYrEdKbC8mcjH5Ns2jU0dKz2CA5sQkzuRYTzrdnGeKjM2Z6bQQ/ Gnl7d6TnebCD0qR/X+IkNAc/nIptXNmp6jSRkaByQ/4A3LqT+z8rb1HzWRwx9bClp0sPwf2F mAtza8yrSosvm9xBOZ/2jP765OkN+k7tdYHsSDhuUcNz2poAe1Y4ZKXaGEoVkO0amSwWdvtO OJjwYrPsx15X+UVmapoSH10w2l6zoq42+K8y7tvVLT5ejCAB4qActIgoxUNjHD7VA7gd162K VXm0qEqpt+F3r77WvAzumNcysvulu/oHIkn+JWpWdYS5EiZLhYqpFa1F9JEa0HADnx5OkcYa VT5fnnlbdrmG6hHjDkVjEF+q3uYp1zJGbKfqE6gL3a79AM90oJjXfxx6Qk7wI9HdwGOtx5Dt //Q9VVfYF1P7ErhJ1GdZc8qOuMexvwqEH3QRSvyWqOLtB1B1v977jK3Z4S2MaGPLQ18bpaou WybLofjx95R37T
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, May 04, 2022 at 02:27:14PM +0200, Jan Beulich wrote:
> On 04.05.2022 13:20, Roger Pau Monné wrote:
> > On Wed, May 04, 2022 at 11:46:37AM +0200, Jan Beulich wrote:
> >> On 03.05.2022 16:49, Roger Pau Monné wrote:
> >>> On Mon, Apr 25, 2022 at 10:34:59AM +0200, Jan Beulich wrote:
> >>> It would seem to me that doing it that way would also allow the
> >>> mappings to get established in blocks for domUs.
> >>
> >> ... then this would perhaps be possible.
> >>
> >>>> The installing of zero-ref writable types has in fact shown (observed
> >>>> while putting together the change) that despite the intention by the
> >>>> XSA-288 changes (affecting DomU-s only) for Dom0 a number of
> >>>> sufficiently ordinary pages (at the very least initrd and P2M ones as
> >>>> well as pages that are part of the initial allocation but not part of
> >>>> the initial mapping) still have been starting out as PGT_none, meaning
> >>>> that they would have gained IOMMU mappings only the first time these
> >>>> pages would get mapped writably. Consequently an open question is
> >>>> whether iommu_memory_setup() should set the pages to PGT_writable_page
> >>>> independent of need_iommu_pt_sync().
> >>>
> >>> I think I'm confused, doesn't the setting of PGT_writable_page happen
> >>> as a result of need_iommu_pt_sync() and having those pages added to
> >>> the IOMMU page tables? (so they can be properly tracked and IOMMU
> >>> mappings are removed if thte page is also removed)
> >>
> >> In principle yes - in guest_physmap_add_page(). But this function isn't
> >> called for the pages I did enumerate in the remark. XSA-288 really only
> >> cared about getting this right for DomU-s.
> > 
> > Would it make sense to change guest_physmap_add_page() to be able to
> > pass the page_order parameter down to iommu_map(), and then use it for
> > dom0 build instead of introducing iommu_memory_setup()?
> 
> To be quite frank: This is something that I might have been willing to
> do months ago, when this series was still fresh. If I was to start
> re-doing all of this code now, it would take far more time than it
> would have taken back then. Hence I'd like to avoid a full re-work here
> unless entirely unacceptable in the way currently done (which largely
> fits with how we have been doing Dom0 setup).

Sorry, I would have really liked to be more on time with reviews of
this, but there's always something that comes up.

> Furthermore, guest_physmap_add_page() doesn't itself call iommu_map().
> What you're suggesting would require get_page_and_type() to be able to
> work on higher-order pages. I view adjustments like this as well out
> of scope for this series.

Well, my initial thinking was to do something similar to what you
currently have in iommu_memory_setup: a direct call to iommu_map and
adjust the page types manually, but I think this will only work for
dom0 because pages are fresh at that point.  For domUs we must use
get_page_and_type so any previous mapping is also removed.

> > I think guest_physmap_add_page() will need to be adjusted at some
> > point for domUs, and hence it could be unified with dom0 usage
> > also?
> 
> As an optimization - perhaps. I view it as more important to have HVM
> guests work reasonably well (which includes the performance aspect of
> setting them up).

OK, I'm fine with focusing on HVM.

> >>>> --- a/xen/drivers/passthrough/x86/iommu.c
> >>>> +++ b/xen/drivers/passthrough/x86/iommu.c
> >>>> @@ -347,8 +347,8 @@ static unsigned int __hwdom_init hwdom_i
> >>>>  
> >>>>  void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >>>>  {
> >>>> -    unsigned long i, top, max_pfn;
> >>>> -    unsigned int flush_flags = 0;
> >>>> +    unsigned long i, top, max_pfn, start, count;
> >>>> +    unsigned int flush_flags = 0, start_perms = 0;
> >>>>  
> >>>>      BUG_ON(!is_hardware_domain(d));
> >>>>  
> >>>> @@ -379,9 +379,9 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>>>       * First Mb will get mapped in one go by pvh_populate_p2m(). Avoid
> >>>>       * setting up potentially conflicting mappings here.
> >>>>       */
> >>>> -    i = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
> >>>> +    start = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
> >>>>  
> >>>> -    for ( ; i < top; i++ )
> >>>> +    for ( i = start, count = 0; i < top; )
> >>>>      {
> >>>>          unsigned long pfn = pdx_to_pfn(i);
> >>>>          unsigned int perms = hwdom_iommu_map(d, pfn, max_pfn);
> >>>> @@ -390,20 +390,41 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>>>          if ( !perms )
> >>>>              rc = 0;
> >>>>          else if ( paging_mode_translate(d) )
> >>>> +        {
> >>>>              rc = p2m_add_identity_entry(d, pfn,
> >>>>                                          perms & IOMMUF_writable ? 
> >>>> p2m_access_rw
> >>>>                                                                  : 
> >>>> p2m_access_r,
> >>>>                                          0);
> >>>> +            if ( rc )
> >>>> +                printk(XENLOG_WARNING
> >>>> +                       "%pd: identity mapping of %lx failed: %d\n",
> >>>> +                       d, pfn, rc);
> >>>> +        }
> >>>> +        else if ( pfn != start + count || perms != start_perms )
> >>>> +        {
> >>>> +        commit:
> >>>> +            rc = iommu_map(d, _dfn(start), _mfn(start), count, 
> >>>> start_perms,
> >>>> +                           &flush_flags);
> >>>> +            if ( rc )
> >>>> +                printk(XENLOG_WARNING
> >>>> +                       "%pd: IOMMU identity mapping of [%lx,%lx) 
> >>>> failed: %d\n",
> >>>> +                       d, pfn, pfn + count, rc);
> >>>> +            SWAP(start, pfn);
> >>>> +            start_perms = perms;
> >>>> +            count = 1;
> >>>> +        }
> >>>>          else
> >>>> -            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << 
> >>>> PAGE_ORDER_4K,
> >>>> -                           perms, &flush_flags);
> >>>> +        {
> >>>> +            ++count;
> >>>> +            rc = 0;
> >>>
> >>> Seeing as we want to process this in blocks now, I wonder whether it
> >>> would make sense to take a different approach, and use a rangeset to
> >>> track which regions need to be mapped.  What gets added would be based
> >>> on the host e820 plus the options
> >>> iommu_hwdom_{strict,inclusive,reserved}.  We would then punch holes
> >>> based on the logic in hwdom_iommu_map() and finally we could iterate
> >>> over the regions afterwards using rangeset_consume_ranges().
> >>>
> >>> Not that you strictly need to do it here, just think the end result
> >>> would be clearer.
> >>
> >> The end result might indeed be, but it would be more of a change right
> >> here. Hence I'd prefer to leave that out of the series for now.
> > 
> > OK.  I think it might be nice to add a comment in that regard, mostly
> > because I tend to forget myself.
> 
> Sure, I've added another post-commit-message remark.

Sorry for being confused, but are those reflected in the final commit
message, or in the code itself?

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.