
Re: [PATCH v2 07/18] IOMMU/x86: perform PV Dom0 mappings in batches


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 10 Dec 2021 10:36:28 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Fri, 10 Dec 2021 09:40:53 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Dec 03, 2021 at 01:38:48PM +0100, Jan Beulich wrote:
> On 02.12.2021 15:10, Roger Pau Monné wrote:
> > On Fri, Sep 24, 2021 at 11:47:41AM +0200, Jan Beulich wrote:
> >> @@ -689,7 +763,8 @@ int __init dom0_construct_pv(struct doma
> >>          l1tab++;
> >>  
> >>          page = mfn_to_page(_mfn(mfn));
> >> -        if ( !page->u.inuse.type_info &&
> >> +        if ( (!page->u.inuse.type_info ||
> >> +              page->u.inuse.type_info == (PGT_writable_page | PGT_validated)) &&
> > 
> > Would it be clearer to get the page for all pages that have a 0 type
> > count: !(type_info & PGT_count_mask)? Or would that interact badly
> > with page table pages?
> 
> Indeed this wouldn't work with page tables (and I recall having learned
> this the hard way): Prior to mark_pv_pt_pages_rdonly() they all have a
> type refcount of zero. Even if it weren't for this, I'd prefer not to
> relax the condition here more than necessary.

Right. Page table pages will have a type set, but a type refcount of 0.
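
To spell out the difference (illustration only, not suggested code;
flag and field names as in the tree):

    /* Condition from the patch: pages which are either entirely fresh
     * or already validated as plain writable pages. */
    bool fresh_or_writable =
        !page->u.inuse.type_info ||
        page->u.inuse.type_info == (PGT_writable_page | PGT_validated);

    /* The relaxed alternative I suggested: any page with a zero type
     * refcount.  Before mark_pv_pt_pages_rdonly() runs, page table
     * pages (PGT_l1_page_table etc.) also still have a zero count, so
     * this would wrongly match them as well. */
    bool zero_type_count = !(page->u.inuse.type_info & PGT_count_mask);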

> >> @@ -720,6 +795,17 @@ int __init dom0_construct_pv(struct doma
> >>      /* Pages that are part of page tables must be read only. */
> >>      mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages);
> >>  
> >> +    /*
> >> +     * This needs to come after all potentially excess
> >> +     * get_page_and_type(..., PGT_writable_page) invocations; see the loop a
> >> +     * few lines further up, where the effect of calling that function in an
> >> +     * earlier loop iteration may get overwritten by a later one.
> >> +     */
> >> +    if ( need_iommu_pt_sync(d) &&
> >> +         iommu_unmap(d, _dfn(PFN_DOWN(mpt_alloc) - nr_pt_pages), nr_pt_pages,
> >> +                     &flush_flags) )
> >> +        BUG();
> > 
> > Wouldn't such unmap better happen as part of changing the types of the
> > pages that become part of the guest page tables?
> 
> Not sure - it's a single call here, but would be a call per page when
> e.g. moved into mark_pv_pt_pages_rdonly().

I see. So moving it there would result in one call per page, plus all
the involved page shattering and aggregation logic. Overall it would be
less error prone, as the IOMMU unmap would happen right next to the
type change, but I'm not going to insist if you think it's not worth
it. The page table structure pages shouldn't be that many anyway?
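
For the archive, the per-page shape I was thinking of would be roughly
the below (function name and hook point are made up, just to show the
idea):

    /* Hypothetical per-page variant: unmap each page table page from
     * the IOMMU right where its type is changed, instead of one batched
     * iommu_unmap() covering [mpt_alloc - nr_pt_pages, mpt_alloc). */
    static void __init pv_pt_page_rdonly(struct domain *d,
                                         struct page_info *page,
                                         unsigned int *flush_flags)
    {
        /* ... existing type adjustment of the page goes here ... */

        /* dfn == mfn, as PV Dom0 is identity mapped in the IOMMU. */
        if ( need_iommu_pt_sync(d) &&
             iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1,
                         flush_flags) )
            BUG();
    }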

> >> @@ -840,22 +928,41 @@ int __init dom0_construct_pv(struct doma
> >>  #endif
> >>      while ( pfn < nr_pages )
> >>      {
> >> -        if ( (page = alloc_chunk(d, nr_pages - domain_tot_pages(d))) == NULL )
> >> +        count = domain_tot_pages(d);
> >> +        if ( (page = alloc_chunk(d, nr_pages - count)) == NULL )
> >>              panic("Not enough RAM for DOM0 reservation\n");
> >> +        mfn = mfn_x(page_to_mfn(page));
> >> +
> >> +        if ( need_iommu_pt_sync(d) )
> >> +        {
> >> +            rc = iommu_map(d, _dfn(mfn), _mfn(mfn), domain_tot_pages(d) - count,
> >> +                           IOMMUF_readable | IOMMUF_writable, &flush_flags);
> >> +            if ( rc )
> >> +                printk(XENLOG_ERR
> >> +                       "pre-mapping MFN %lx (PFN %lx) into IOMMU failed: %d\n",
> >> +                       mfn, pfn, rc);
> >> +        }
> >> +
> >>          while ( pfn < domain_tot_pages(d) )
> >>          {
> >> -            mfn = mfn_x(page_to_mfn(page));
> >> +            if ( !rc )
> >> +                make_pages_writable(page, 1);
> > 
> > There's quite a lot of repetition of the pattern: allocate, iommu_map,
> > set as writable. Would it be possible to abstract this into some
> > kind of helper?
> > 
> > I've realized some of the allocations use alloc_chunk while others use
> > alloc_domheap_pages, so it might require some work.
> 
> Right, I'd leave the allocation part aside for the moment. I had actually
> considered folding iommu_map() and make_pages_writable() into a common
> helper (or really renaming make_pages_writable() and folding iommu_map()
> into it). What I lacked was a reasonable, not overly long name for such a
> function.

I'm not overly good at naming, but I think we need to find a way to
fold those operations into a single helper.

I would be fine with naming it iommu_memory_{setup,add} or some such.
Marking the pages as writable is a result (or rather a requirement) of
adding them to the IOMMU. Would you be OK with one of those names?
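
FWIW, a rough sketch of what I have in mind (just folding the two
operations together; the name, and whether a mapping failure should
stay non-fatal, are of course up for discussion):

    static void __init iommu_memory_setup(struct domain *d,
                                          struct page_info *page,
                                          unsigned long nr_pages,
                                          unsigned int *flush_flags)
    {
        int rc;
        mfn_t mfn = page_to_mfn(page);

        if ( !need_iommu_pt_sync(d) )
            return;

        rc = iommu_map(d, _dfn(mfn_x(mfn)), mfn, nr_pages,
                       IOMMUF_readable | IOMMUF_writable, flush_flags);
        if ( rc )
        {
            printk(XENLOG_ERR
                   "pre-mapping MFNs [%lx,%lx) into IOMMU failed: %d\n",
                   mfn_x(mfn), mfn_x(mfn) + nr_pages, rc);
            return;
        }

        /* Only mark as writable what was successfully mapped. */
        make_pages_writable(page, nr_pages);
    }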

> >> @@ -372,16 +372,30 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>                                          perms & IOMMUF_writable ? p2m_access_rw
> >>                                                                  : p2m_access_r,
> >>                                          0);
> >> +        else if ( pfn != start + count || perms != start_perms )
> >> +        {
> >> +        commit:
> >> +            rc = iommu_map(d, _dfn(start), _mfn(start), count,
> >> +                           start_perms, &flush_flags);
> >> +            SWAP(start, pfn);
> >> +            start_perms = perms;
> >> +            count = 1;
> >> +        }
> >>          else
> >> -            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << PAGE_ORDER_4K,
> >> -                           perms, &flush_flags);
> >> +        {
> >> +            ++count;
> >> +            rc = 0;
> >> +        }
> >>  
> >>          if ( rc )
> >>              printk(XENLOG_WARNING "%pd: identity %smapping of %lx failed: %d\n",
> >>                     d, !paging_mode_translate(d) ? "IOMMU " : "", pfn, rc);
> > 
> > Would be nice to print the count (or end pfn) in case it's a range.
> 
> I can do so if you think it's worth further extra code. I can't use
> "count" here in particular, as that was updated already (in context
> above). The most reasonable change towards this would perhaps be to
> duplicate the printk() into both the "if()" and the "else if()" bodies.

Maybe. The current message gives the impression that a single pfn was
mapped and failed, but without printing the range that failed the
message won't be much help in diagnosing further issues that might
arise from the mapping failure.
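
I.e. something like duplicating the printk() into the commit path while
the range is still intact (sketch of the "else if" body only; the
existing printk() after the if/else chain would then be left covering
the single-pfn paths):

        else if ( pfn != start + count || perms != start_perms )
        {
        commit:
            rc = iommu_map(d, _dfn(start), _mfn(start), count,
                           start_perms, &flush_flags);
            if ( rc )
                /* Report the whole range before start/count get reset. */
                printk(XENLOG_WARNING
                       "%pd: identity %smapping of [%lx,%lx) failed: %d\n",
                       d, !paging_mode_translate(d) ? "IOMMU " : "",
                       start, start + count, rc);
            SWAP(start, pfn);
            start_perms = perms;
            count = 1;
        }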

> > While not something that you have to fix here, the logic here is
> > becoming overly complicated IMO. It might be easier to just put all
> > the RAM and reserved regions (or everything < 4G) into a rangeset and
> > then punch holes in it for non-guest-mappable regions, and finally use
> > rangeset_consume_ranges to iterate and map those. That's likely faster
> > than having to iterate over all pfns on the system, and easier to
> > understand from a logic PoV.
> 
> Maybe; I didn't spend much time on figuring possible ways of
> consolidating some of this.

I can give it a try after your code is merged. I think it's getting a
bit messy.
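
To sketch what I mean (very rough; I'm using rangeset_report_ranges()
here for illustration - rangeset_consume_ranges() would additionally
free entries as they get processed - and the hole punching is
indicative only):

    struct map_data {
        struct domain *d;
        unsigned int *flush_flags;
    };

    static int __hwdom_init identity_map(unsigned long s, unsigned long e,
                                         void *arg)
    {
        struct map_data *data = arg;

        /* rangeset ranges are inclusive: [s, e]. */
        return iommu_map(data->d, _dfn(s), _mfn(s), e - s + 1,
                         IOMMUF_readable | IOMMUF_writable,
                         data->flush_flags);
    }

    static void __hwdom_init hwdom_iommu_map_ranges(struct domain *d,
                                                    unsigned int *flush_flags)
    {
        struct rangeset *map = rangeset_new(NULL, NULL, 0);
        struct map_data data = { .d = d, .flush_flags = flush_flags };

        if ( !map )
            return;

        /* Everything potentially mappable below 4G... */
        rangeset_add_range(map, 0, PFN_DOWN(GB(4)) - 1);

        /* ...minus holes for regions which must not be guest mappable:
         * rangeset_remove_range(map, hole_start, hole_end); */

        /* Map what is left in (hopefully few) large chunks. */
        rangeset_report_ranges(map, 0, ~0UL, identity_map, &data);

        rangeset_destroy(map);
    }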

Thanks, Roger.
