
Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the quarantine domain



On 20.11.2019 13:08, Paul Durrant wrote:
> This patch introduces a new iommu_op to facilitate a per-implementation
> quarantine set up, and then further code for x86 implementations
> (amd and vtd) to set up a read/wrote scratch page to serve as the source/
> target for all DMA whilst a device is assigned to dom_io.

A single page in the system won't do, I'm afraid. If one guest's
(prior) device is retrying writes whose data contains secrets of that
guest, another guest's (prior) device could end up reading that data
and writing it to e.g. storage, where after a guest restart it is then
available to the wrong guest.

Also nit: s/wrote/write/ .
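
To illustrate the direction I have in mind (a per-device scratch page,
allocated no later than when the device first gets handed to dom_io;
the pdev->quarantine_pg field and the hook point are invented for this
example, they're not part of the patch):

    static int quarantine_prepare_device(struct pci_dev *pdev)
    {
        if ( !pdev->quarantine_pg )      /* assumed new per-device field */
        {
            pdev->quarantine_pg = alloc_domheap_page(NULL, 0);
            if ( !pdev->quarantine_pg )
                return -ENOMEM;
            /* Scrub, so no stale data can ever be read back via DMA. */
            clear_domain_page(page_to_mfn(pdev->quarantine_pg));
        }

        return 0;
    }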

> The reason for doing this is that some hardware may continue to re-try
> DMA, despite FLR, in the event of an error. Having a scratch page mapped
> will allow pending DMA to drain and thus quiesce such buggy hardware.

Without a "sink" page mapped, this would result in IOMMU faults, as I
understand it.
What's the problem with having these faults surface and get handled,
eventually leading to the device getting bus-mastering disabled? Is
it that devices continue DMAing even when bus-mastering is off? If
so, is it even safe to pass through any such device? In any event
the description needs to be extended here.
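
For reference, the kind of quiescing I'm thinking of, i.e. clearing bus
mastering once such faults have been seen (this assumes the pci_sbdf_t
flavour of the config space accessors; older trees pass seg/bus/devfn
separately):

    static void quiesce_device(struct pci_dev *pdev)
    {
        uint16_t cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);

        /* No further DMA ought to be issued once bus mastering is off. */
        if ( cmd & PCI_COMMAND_MASTER )
            pci_conf_write16(pdev->sbdf, PCI_COMMAND,
                             cmd & ~PCI_COMMAND_MASTER);
    }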

> Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>

What about Arm? Can devices which Arm allows to be assigned to guests
also "babble" like this after de-assignment? If not, this should be
said in the description. If so, obviously that side would also want
fixing.

> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -560,6 +560,63 @@ int amd_iommu_reserve_domain_unity_map(struct domain *domain,
>      return rt;
>  }
>  
> +int amd_iommu_quarantine_init(struct domain *d)

__init
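
I.e. (assuming this only ever gets called while setting up dom_io at
boot):

    int __init amd_iommu_quarantine_init(struct domain *d)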

> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +    unsigned int level;
> +    struct amd_iommu_pte *table;
> +
> +    if ( hd->arch.root_table )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    spin_lock(&hd->arch.mapping_lock);
> +
> +    level = hd->arch.paging_mode;

With DomIO being PV in principle, this is going to be the fixed value
that PV domains get, which merely depends on RAM size at boot time
(i.e. not accounting for memory hotplug). That could easily be too
little for HVM guests, which are free to extend their GFN (and hence
DFN) space. Therefore I think you need to set the maximum possible
level here.
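
Roughly (AMD_IOMMU_MAX_PAGING_MODE is a placeholder name here for
whatever expresses the deepest paging mode the implementation supports,
not an existing constant):

    /* Cover the whole DFN space an HVM guest could ever grow into. */
    hd->arch.paging_mode = level = AMD_IOMMU_MAX_PAGING_MODE;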

> +    hd->arch.root_table = alloc_amd_iommu_pgtable();
> +    if ( !hd->arch.root_table )
> +        goto out;
> +
> +    table = __map_domain_page(hd->arch.root_table);
> +    while ( level )
> +    {
> +        struct page_info *pg;
> +        unsigned int i;
> +
> +        /*
> +         * The pgtable allocator is fine for the leaf page, as well as
> +         * page table pages.
> +         */
> +        pg = alloc_amd_iommu_pgtable();
> +        if ( !pg )
> +            break;
> +
> +        for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
> +        {
> +            struct amd_iommu_pte *pde = &table[i];
> +
> +            set_iommu_pde_present(pde, mfn_x(page_to_mfn(pg)), level - 1,
> +                                  true, true);

This would also benefit from a comment indicating that it's fine
for the leaf level as well, despite the "pde" in the name (and
its sibling set_iommu_pte_present() actually existing). Of course
you could just as well extend the comment a few lines up instead.

What you do need to do though is pre-fill the leaf page ...

> +        }
> +
> +        unmap_domain_page(table);
> +        table = __map_domain_page(pg);
> +        level--;
> +    }

... here, such that possible left over secrets can't end up
getting written to e.g. persistent storage or over a network.
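
For instance (an explicit scrub of the final, data bearing page; this
may be redundant if alloc_amd_iommu_pgtable() already zeroes its
allocation, but it makes the intent obvious):

        pg = alloc_amd_iommu_pgtable();
        if ( !pg )
            break;

        /* The level-1 iteration allocates the scratch (leaf) page itself. */
        if ( level == 1 )
            clear_domain_page(page_to_mfn(pg));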

> @@ -2683,9 +2671,68 @@ static void vtd_dump_p2m_table(struct domain *d)
>      vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw), 0, 0);
>  }
>  
> +static int intel_iommu_quarantine_init(struct domain *d)

__init again.

> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +    struct dma_pte *parent;
> +    unsigned int level = agaw_to_level(hd->arch.agaw);

Unlike in the AMD case, this is not a problem here, as all domains
(currently) get the same AGAW. I wonder though whether precautions
are possible against the "normal" domain setting getting adjusted
later without anyone recalling the need to come back here.
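
A simple guard might be enough, e.g. (width_to_agaw() and
DEFAULT_DOMAIN_ADDRESS_WIDTH being what the VT-d code uses when it
assigns a domain's AGAW):

    /* Catch the normal-domain AGAW changing without this being revisited. */
    ASSERT(hd->arch.agaw == width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH));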

> +    int rc;
> +
> +    if ( hd->arch.pgd_maddr )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    spin_lock(&hd->arch.mapping_lock);
> +
> +    hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
> +    if ( !hd->arch.pgd_maddr )
> +        goto out;
> +
> +    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.pgd_maddr);

Unnecessary cast; funnily enough you don't have one further down.
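
I.e. simply:

    parent = map_vtd_domain_page(hd->arch.pgd_maddr);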

> +    while ( level )
> +    {
> +        uint64_t maddr;
> +        unsigned int offset;
> +
> +        /*
> +         * The pgtable allocator is fine for the leaf page, as well as
> +         * page table pages.
> +         */
> +        maddr = alloc_pgtable_maddr(1, hd->node);
> +        if ( !maddr )
> +            break;
> +
> +        for ( offset = 0; offset < PTE_NUM; offset++ )
> +        {
> +            struct dma_pte *pte = &parent[offset];
> +
> +            dma_set_pte_addr(*pte, maddr);
> +            dma_set_pte_readable(*pte);
> +            dma_set_pte_writable(*pte);
> +        }
> +        iommu_flush_cache_page(parent, 1);
> +
> +        unmap_vtd_domain_page(parent);
> +        parent = map_vtd_domain_page(maddr);
> +        level--;
> +    }

The leaf page wants scrubbing here as well.
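
Along the lines of (again possibly redundant if alloc_pgtable_maddr()
already clears the page, but then at least self-documenting):

        /* Scrub the data bearing (leaf) page before making it reachable. */
        if ( level == 1 )
        {
            void *p = map_vtd_domain_page(maddr);

            memset(p, 0, PAGE_SIZE);
            iommu_flush_cache_page(p, 1);
            unmap_vtd_domain_page(p);
        }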

Jan
