
Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the quarantine domain



> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Jan
> Beulich
> Sent: 20 November 2019 13:52
> To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Kevin Tian <kevin.tian@xxxxxxxxx>;
> Roger Pau Monné <roger.pau@xxxxxxxxxx>; Wei Liu <wl@xxxxxxx>; Andrew
> Cooper <andrew.cooper3@xxxxxxxxxx>
> Subject: Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the
> quarantine domain
> 
> On 20.11.2019 13:08, Paul Durrant wrote:
> > This patch introduces a new iommu_op to facilitate a per-implementation
> > quarantine set up, and then further code for x86 implementations
> > (amd and vtd) to set up a read/wrote scratch page to serve as the source/
> > target for all DMA whilst a device is assigned to dom_io.
> 
> A single page in the system won't do, I'm afraid. If one guest's
> (prior) device is retrying reads with data containing secrets of that
> guest, another guest's (prior) device could end up writing this data
> to e.g. storage where after a guest restart it is then available to
> the wrong guest.
> 

True. I was unsure whether this was a concern in the scenarios we had to deal 
with but I'm informed it is, and in the general case it is too.

> Also nit: s/wrote/write/ .
> 

Yep. Will fix.

> > The reason for doing this is that some hardware may continue to re-try
> > DMA, despite FLR, in the event of an error. Having a scratch page mapped
> > will allow pending DMA to drain and thus quiesce such buggy hardware.
> 
> Without a "sink" page mapped, this would result in IOMMU faults aiui.
> What's the problem with having these faults surface and get handled,
> eventually leading to the device getting bus-mastering disabled? Is
> it that devices continue DMAing even when bus-mastering is off? If
> so, is it even safe to pass through any such device? In any event
> the description needs to be extended here.
> 

The devices in question ignore both FLR and BME, and some IOMMU faults are 
fatal. I believe, however, that write faults are not, and so I think a single 
read-only 'source' page will be sufficient.
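
To make that concrete, a rough sketch of the leaf-level setup I have in mind 
(illustrative only; it assumes the last two arguments of 
set_iommu_pde_present() are the write and read enables respectively, and 
'scratch_pg' stands for the single pre-zeroed source page):

    /*
     * Point every leaf entry at the same read-only scratch page: reads
     * from babbling devices drain harmlessly, while writes fault (which,
     * per the above, is not fatal for the hardware in question).
     */
    for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
        set_iommu_pde_present(&table[i], mfn_x(page_to_mfn(scratch_pg)),
                              0, false, true);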

> > Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
> 
> What about Arm? Can devices which Arm allows to assign to guests
> also "babble" like this after de-assignment? If not, this should be
> said in the description. If so, obviously that side would also want
> fixing.
> 
> > --- a/xen/drivers/passthrough/amd/iommu_map.c
> > +++ b/xen/drivers/passthrough/amd/iommu_map.c
> > @@ -560,6 +560,63 @@ int amd_iommu_reserve_domain_unity_map(struct domain *domain,
> >      return rt;
> >  }
> >
> > +int amd_iommu_quarantine_init(struct domain *d)
> 
> __init
> 

Ok.

> > +{
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +    unsigned int level;
> > +    struct amd_iommu_pte *table;
> > +
> > +    if ( hd->arch.root_table )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return 0;
> > +    }
> > +
> > +    spin_lock(&hd->arch.mapping_lock);
> > +
> > +    level = hd->arch.paging_mode;
> 
> With DomIO being PV in principle, this is going to be the
> fixed value PV domains get set, merely depending on RAM size at
> boot time (i.e. not accounting for memory hotplug). This could
> be easily too little for HVM guests, which are free to extend
> their GFN (and hence DFN) space. Therefore I think you need to
> set the maximum possible level here.
>

Ok. I'd not considered memory hotplug. I'll use a static maximum value instead, 
as VT-d does in general.
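
Concretely, something like this (a sketch only; I'm assuming 6 is the 
architectural maximum number of AMD IOMMU page table levels, and the constant 
name below is purely illustrative):

    /*
     * Use a fixed maximum paging mode rather than d's current one, so the
     * quarantine page tables cover any DFN an HVM guest might later use
     * (including after memory hotplug).
     */
    #define QUARANTINE_PAGING_MODE 6 /* illustrative name */

    level = QUARANTINE_PAGING_MODE;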

> > +    hd->arch.root_table = alloc_amd_iommu_pgtable();
> > +    if ( !hd->arch.root_table )
> > +        goto out;
> > +
> > +    table = __map_domain_page(hd->arch.root_table);
> > +    while ( level )
> > +    {
> > +        struct page_info *pg;
> > +        unsigned int i;
> > +
> > +        /*
> > +         * The pgtable allocator is fine for the leaf page, as well as
> > +         * page table pages.
> > +         */
> > +        pg = alloc_amd_iommu_pgtable();
> > +        if ( !pg )
> > +            break;
> > +
> > +        for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
> > +        {
> > +            struct amd_iommu_pte *pde = &table[i];
> > +
> > +            set_iommu_pde_present(pde, mfn_x(page_to_mfn(pg)), level - 1,
> > +                                  true, true);
> 
> This would also benefit from a comment indicating that it's fine
> for the leaf level as well, despite the "pde" in the name (and
> its sibling set_iommu_pte_present() actually existing). Of course
> you could as well extend the comment a few lines up.

The AMD IOMMU code conflates PDE and PTE all over the place, but I'll add a 
comment here to that effect.
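
Something along the lines of (draft wording only):

    /*
     * The same PDE/PTE layout, and hence the same helper, is used at
     * every level here, including the leaf level where the "PDE" is
     * really a PTE mapping the scratch page.
     */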

> 
> What you do need to do though is pre-fill the leaf page ...
> 
> > +        }
> > +
> > +        unmap_domain_page(table);
> > +        table = __map_domain_page(pg);
> > +        level--;
> > +    }
> 
> ... here, such that possible left over secrets can't end up
> getting written to e.g. persistent storage or over a network.
> 

That's actually one reason for using the pgtable allocator... It already does 
that.
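
To spell out the property being relied on (illustrative only, not the actual 
allocator code):

    #include <xen/mm.h>
    #include <xen/domain_page.h>

    /*
     * Each page table page, including the one that ends up as the leaf
     * page, is cleared at allocation time, so no stale guest data can be
     * read back (or written out) via DMA through the quarantine tables.
     */
    static struct page_info *alloc_scrubbed_pgtable(void)
    {
        struct page_info *pg = alloc_domheap_page(NULL, 0);

        if ( pg )
            clear_domain_page(page_to_mfn(pg));

        return pg;
    }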

> > @@ -2683,9 +2671,68 @@ static void vtd_dump_p2m_table(struct domain *d)
> >      vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw), 0, 0);
> >  }
> >
> > +static int intel_iommu_quarantine_init(struct domain *d)
> 
> __init again.
> 

Ok.

> > +{
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +    struct dma_pte *parent;
> > +    unsigned int level = agaw_to_level(hd->arch.agaw);
> 
> Other than for AMD this is not a problem here, as all domains
> (currently) get the same AGAW. I wonder though whether precautions
> would be possible here against the "normal" domain setting getting
> adjusted without recalling the need to come back here.
> 

I could just go for a hardcoded width value here too.
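
E.g. (sketch only, assuming the existing agaw/width helpers and the usual 
48-bit default width):

    /*
     * Derive the level from a fixed address width rather than from
     * hd->arch.agaw, so a future change to the per-domain AGAW cannot
     * silently leave the quarantine tables too shallow.
     */
    unsigned int level =
        agaw_to_level(width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH));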

> > +    int rc;
> > +
> > +    if ( hd->arch.pgd_maddr )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return 0;
> > +    }
> > +
> > +    spin_lock(&hd->arch.mapping_lock);
> > +
> > +    hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
> > +    if ( !hd->arch.pgd_maddr )
> > +        goto out;
> > +
> > +    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.pgd_maddr);
> 
> Unnecessary cast; funnily enough you don't have one further down.
> 

Sorry. Cut'n'paste.

> > +    while ( level )
> > +    {
> > +        uint64_t maddr;
> > +        unsigned int offset;
> > +
> > +        /*
> > +         * The pgtable allocator is fine for the leaf page, as well as
> > +         * page table pages.
> > +         */
> > +        maddr = alloc_pgtable_maddr(1, hd->node);
> > +        if ( !maddr )
> > +            break;
> > +
> > +        for ( offset = 0; offset < PTE_NUM; offset++ )
> > +        {
> > +            struct dma_pte *pte = &parent[offset];
> > +
> > +            dma_set_pte_addr(*pte, maddr);
> > +            dma_set_pte_readable(*pte);
> > +            dma_set_pte_writable(*pte);
> > +        }
> > +        iommu_flush_cache_page(parent, 1);
> > +
> > +        unmap_vtd_domain_page(parent);
> > +        parent = map_vtd_domain_page(maddr);
> > +        level--;
> > +    }
> 
> The leaf page wants scrubbing here as well.
> 

Already done by alloc_pgtable_maddr(). I'll note in the comment in this hunk, 
and in the AMD one, that the pages are zeroed.
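
Draft wording for the updated comment (to go in both hunks):

    /*
     * The pgtable allocator is fine for the leaf page, as well as page
     * table pages, and the pages it hands back are already zeroed, so
     * the eventual scratch page cannot leak stale data.
     */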

  Paul

> Jan
> 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel