Re: [PATCH v2 07/18] IOMMU/x86: perform PV Dom0 mappings in batches
On Fri, Dec 03, 2021 at 01:38:48PM +0100, Jan Beulich wrote:
> On 02.12.2021 15:10, Roger Pau Monné wrote:
> > On Fri, Sep 24, 2021 at 11:47:41AM +0200, Jan Beulich wrote:
> >> @@ -689,7 +763,8 @@ int __init dom0_construct_pv(struct doma
> >>              l1tab++;
> >>
> >>              page = mfn_to_page(_mfn(mfn));
> >> -            if ( !page->u.inuse.type_info &&
> >> +            if ( (!page->u.inuse.type_info ||
> >> +                  page->u.inuse.type_info == (PGT_writable_page | PGT_validated)) &&
> >
> > Would it be clearer to get page for all pages that have a 0 count:
> > !(type_info & PGT_count_mask). Or would that interact badly with page
> > table pages?
>
> Indeed this wouldn't work with page tables (and I recall having learned
> this the hard way): Prior to mark_pv_pt_pages_rdonly() they all have a
> type refcount of zero. Even if it wasn't for this, I'd prefer to not
> relax the condition here more than necessary.

Right. Page tables will have some types set but a count of 0.

> >> @@ -720,6 +795,17 @@ int __init dom0_construct_pv(struct doma
> >>      /* Pages that are part of page tables must be read only. */
> >>      mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages);
> >>
> >> +    /*
> >> +     * This needs to come after all potentially excess
> >> +     * get_page_and_type(..., PGT_writable_page) invocations; see the loop a
> >> +     * few lines further up, where the effect of calling that function in an
> >> +     * earlier loop iteration may get overwritten by a later one.
> >> +     */
> >> +    if ( need_iommu_pt_sync(d) &&
> >> +         iommu_unmap(d, _dfn(PFN_DOWN(mpt_alloc) - nr_pt_pages), nr_pt_pages,
> >> +                     &flush_flags) )
> >> +        BUG();
> >
> > Wouldn't such unmap better happen as part of changing the types of the
> > pages that become part of the guest page tables?
>
> Not sure - it's a single call here, but would be a call per page when
> e.g. moved into mark_pv_pt_pages_rdonly().

I see. So this would result in multiple calls when moved, plus all the
involved page shattering and aggregation logic. Overall it would be less
error prone, as the iommu unmap would happen next to the type change, but
I'm not going to insist if you think it's not worth it. The page table
structure pages shouldn't be that many anyway?

> >> @@ -840,22 +928,41 @@ int __init dom0_construct_pv(struct doma
> >>  #endif
> >>      while ( pfn < nr_pages )
> >>      {
> >> -        if ( (page = alloc_chunk(d, nr_pages - domain_tot_pages(d))) == NULL )
> >> +        count = domain_tot_pages(d);
> >> +        if ( (page = alloc_chunk(d, nr_pages - count)) == NULL )
> >>              panic("Not enough RAM for DOM0 reservation\n");
> >> +        mfn = mfn_x(page_to_mfn(page));
> >> +
> >> +        if ( need_iommu_pt_sync(d) )
> >> +        {
> >> +            rc = iommu_map(d, _dfn(mfn), _mfn(mfn), domain_tot_pages(d) - count,
> >> +                           IOMMUF_readable | IOMMUF_writable, &flush_flags);
> >> +            if ( rc )
> >> +                printk(XENLOG_ERR
> >> +                       "pre-mapping MFN %lx (PFN %lx) into IOMMU failed: %d\n",
> >> +                       mfn, pfn, rc);
> >> +        }
> >> +
> >>          while ( pfn < domain_tot_pages(d) )
> >>          {
> >> -            mfn = mfn_x(page_to_mfn(page));
> >> +            if ( !rc )
> >> +                make_pages_writable(page, 1);
> >
> > There's quite a lot of repetition of the pattern: allocate, iommu_map,
> > set as writable. Would it be possible to abstract this into some
> > kind of helper?
> >
> > I've realized some of the allocations use alloc_chunk while others use
> > alloc_domheap_pages, so it might require some work.
>
> Right, I'd leave the allocation part aside for the moment. I had actually
> considered to fold iommu_map() and make_pages_writable() into a common
> helper (or really rename make_pages_writable() and fold iommu_map() into
> there). What I lacked was a reasonable, not overly long name for such a
> function.

I'm not overly good at naming, but I think we need to somehow find a way
to place those together into a single helper. I would be fine with naming
this iommu_memory_{setup,add} or some such. Marking the pages as writable
is a result (or a requirement might be a better way to express it?) of
adding them to the IOMMU. Would you be OK with one of those names?
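To make this a bit more concrete, below is a rough sketch of what I have
in mind. Illustration only: the iommu_memory_setup() name, the "what" tag
for the error message and the exact parameters are just placeholders.

static void __init iommu_memory_setup(struct domain *d, const char *what,
                                      struct page_info *page, unsigned long nr,
                                      unsigned int *flush_flags)
{
    int rc;
    mfn_t mfn = page_to_mfn(page);

    if ( !need_iommu_pt_sync(d) )
        return;

    rc = iommu_map(d, _dfn(mfn_x(mfn)), mfn, nr,
                   IOMMUF_readable | IOMMUF_writable, flush_flags);
    if ( rc )
    {
        printk(XENLOG_ERR
               "pre-mapping %s MFN [%lx,%lx) into IOMMU failed: %d\n",
               what, mfn_x(mfn), mfn_x(mfn) + nr, rc);
        return;
    }

    /*
     * The write-permitting IOMMU mapping is what requires the pages to be
     * marked writable, so fold that in here rather than open-coding the
     * iommu_map() + make_pages_writable() pair at every call site.
     */
    make_pages_writable(page, nr);
}

Call sites would then shrink to a single call right after the respective
allocation, e.g. iommu_memory_setup(d, "initrd", page, nr, &flush_flags).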
> >> @@ -372,16 +372,30 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>                                         perms & IOMMUF_writable ? p2m_access_rw
> >>                                                                 : p2m_access_r,
> >>                                         0);
> >> +        else if ( pfn != start + count || perms != start_perms )
> >> +        {
> >> +        commit:
> >> +            rc = iommu_map(d, _dfn(start), _mfn(start), count,
> >> +                           start_perms, &flush_flags);
> >> +            SWAP(start, pfn);
> >> +            start_perms = perms;
> >> +            count = 1;
> >> +        }
> >>         else
> >> -            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << PAGE_ORDER_4K,
> >> -                           perms, &flush_flags);
> >> +        {
> >> +            ++count;
> >> +            rc = 0;
> >> +        }
> >>
> >>         if ( rc )
> >>             printk(XENLOG_WARNING "%pd: identity %smapping of %lx failed: %d\n",
> >>                    d, !paging_mode_translate(d) ? "IOMMU " : "", pfn, rc);
> >
> > Would be nice to print the count (or end pfn) in case it's a range.
>
> I can do so if you think it's worth further extra code. I can't use
> "count" here in particular, as that was updated already (in context
> above). The most reasonable change towards this would perhaps be to
> duplicate the printk() into both the "if()" and the "else if()" bodies.

Maybe. The current message gives the impression that a single pfn has been
added and failed, but without printing the range that failed the message
will not be that helpful in diagnosing further issues that might arise due
to the mapping failure.

> > While not something that you have to fix here, the logic here is
> > becoming overly complicated IMO. It might be easier to just put all
> > the ram and reserved regions (or everything < 4G) into a rangeset and
> > then punch holes on it for non guest mappable regions, and finally use
> > rangeset_consume_ranges to iterate and map those. That's likely faster
> > than having to iterate over all pfns on the system, and easier to
> > understand from a logic PoV.
>
> Maybe; I didn't spend much time on figuring possible ways of
> consolidating some of this.

I can give it a try after your code is merged. I think it's getting a bit
messy.

Thanks,
Roger.
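PS: for reference, the rangeset shape I have in mind is roughly the below.
Untested sketch with made-up names; I'm writing the
rangeset_consume_ranges() callback signature from memory, and the rangeset
population / hole punching is only hinted at in comments.

struct map_data {
    struct domain *d;
    unsigned int flush_flags;
};

static int __hwdom_init identity_map_range(unsigned long s, unsigned long e,
                                           void *arg, unsigned long *c)
{
    struct map_data *data = arg;
    int rc = iommu_map(data->d, _dfn(s), _mfn(s), e - s + 1,
                       IOMMUF_readable | IOMMUF_writable, &data->flush_flags);

    if ( rc )
        printk(XENLOG_WARNING
               "%pd: IOMMU identity mapping of [%lx,%lx] failed: %d\n",
               data->d, s, e, rc);

    /* Report the whole range as consumed. */
    *c = e - s + 1;

    return rc;
}

    /* Inside arch_iommu_hwdom_init() (or a helper of it): */
    struct rangeset *map = rangeset_new(d, "hwdom IOMMU mappings", 0);
    struct map_data data = { .d = d };

    if ( !map )
        return;

    /* Add all RAM and reserved regions (or simply everything < 4G) ... */
    /* rangeset_add_range(map, start, end); */

    /* ... punch holes for ranges that must not be guest accessible ... */
    /* rangeset_remove_range(map, hole_start, hole_end); */

    /* ... and map what's left, one contiguous range at a time. */
    rangeset_consume_ranges(map, identity_map_range, &data);
    rangeset_destroy(map);

    if ( data.flush_flags && iommu_iotlb_flush_all(d, data.flush_flags) )
        printk(XENLOG_WARNING "%pd: IOMMU flush failed\n", d);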