Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the quarantine domain
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Jan
> Beulich
> Sent: 20 November 2019 13:52
> To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Kevin Tian <kevin.tian@xxxxxxxxx>;
> Roger Pau Monné <roger.pau@xxxxxxxxxx>; Wei Liu <wl@xxxxxxx>; Andrew
> Cooper <andrew.cooper3@xxxxxxxxxx>
> Subject: Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the
> quarantine domain
>
> On 20.11.2019 13:08, Paul Durrant wrote:
> > This patch introduces a new iommu_op to facilitate a per-implementation
> > quarantine set up, and then further code for x86 implementations
> > (amd and vtd) to set up a read/wrote scratch page to serve as the source/
> > target for all DMA whilst a device is assigned to dom_io.
>
> A single page in the system won't do, I'm afraid. If one guest's
> (prior) device is retrying reads with data containing secrets of that
> guest, another guest's (prior) device could end up writing this data
> to e.g. storage where after a guest restart it is then available to
> the wrong guest.
>
True. I was unsure whether this was a concern in the scenarios we had to deal
with, but I'm informed that it is, and it is in the general case too.
> Also nit: s/wrote/write/ .
>
Yep. Will fix.
> > The reason for doing this is that some hardware may continue to re-try
> > DMA, despite FLR, in the event of an error. Having a scratch page mapped
> > will allow pending DMA to drain and thus quiesce such buggy hardware.
>
> Without a "sink" page mapped, this would result in IOMMU faults aiui.
> What's the problem with having these faults surface and get handled,
> eventually leading to the device getting bus-mastering disabled? Is
> it that devices continue DMAing even when bus-mastering is off? If
> so, is it even safe to pass through any such device? In any event
> the description needs to be extended here.
>
The devices in question ignore both FLR and BME, and some IOMMU faults are
fatal. I believe, however, that write faults are not, and so I think a single
read-only 'source' page will be sufficient.
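For illustration, something along these lines for the AMD leaf mapping
(untested sketch; I'm assuming the write flag is the fourth argument, as in
the hunk below):

    /*
     * Sketch only: map the scratch page read-only, so pending reads
     * drain harmlessly while writes take (non-fatal) faults.
     */
    set_iommu_pde_present(pde, mfn_x(page_to_mfn(pg)), level - 1,
                          false /* iw */, true /* ir */);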
> > Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
>
> What about Arm? Can devices which Arm allows to assign to guests
> also "babble" like this after de-assignment? If not, this should be
> said in the description. If so, obviously that side would also want
> fixing.
>
> > --- a/xen/drivers/passthrough/amd/iommu_map.c
> > +++ b/xen/drivers/passthrough/amd/iommu_map.c
> > @@ -560,6 +560,63 @@ int amd_iommu_reserve_domain_unity_map(struct domain *domain,
> > return rt;
> > }
> >
> > +int amd_iommu_quarantine_init(struct domain *d)
>
> __init
>
Ok.
> > +{
> > + struct domain_iommu *hd = dom_iommu(d);
> > + unsigned int level;
> > + struct amd_iommu_pte *table;
> > +
> > + if ( hd->arch.root_table )
> > + {
> > + ASSERT_UNREACHABLE();
> > + return 0;
> > + }
> > +
> > + spin_lock(&hd->arch.mapping_lock);
> > +
> > + level = hd->arch.paging_mode;
>
> With DomIO being PV in principle, this is going to be the
> fixed value PV domains get set, merely depending on RAM size at
> boot time (i.e. not accounting for memory hotplug). This could
> be easily too little for HVM guests, which are free to extend
> their GFN (and hence DFN) space. Therefore I think you need to
> set the maximum possible level here.
>
Ok. I'd not considered memory hotplug. I'll use a static maximum value instead,
as VT-d does in general.
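I.e. something like this in amd_iommu_quarantine_init() (sketch only; the
names amd_iommu_get_paging_mode() and DEFAULT_DOMAIN_ADDRESS_WIDTH are
assumed for illustration, otherwise a hardcoded maximum paging mode would do):

    /* Size for the largest DFN a guest could use, not current RAM. */
    unsigned long max_gfn =
        1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT);
    unsigned int level = amd_iommu_get_paging_mode(max_gfn);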
> > + hd->arch.root_table = alloc_amd_iommu_pgtable();
> > + if ( !hd->arch.root_table )
> > + goto out;
> > +
> > + table = __map_domain_page(hd->arch.root_table);
> > + while ( level )
> > + {
> > + struct page_info *pg;
> > + unsigned int i;
> > +
> > + /*
> > + * The pgtable allocator is fine for the leaf page, as well as
> > + * page table pages.
> > + */
> > + pg = alloc_amd_iommu_pgtable();
> > + if ( !pg )
> > + break;
> > +
> > + for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
> > + {
> > + struct amd_iommu_pte *pde = &table[i];
> > +
> > + set_iommu_pde_present(pde, mfn_x(page_to_mfn(pg)), level - 1,
> > + true, true);
>
> This would also benefit from a comment indicating that it's fine
> for the leaf level as well, despite the "pde" in the name (and
> its sibling set_iommu_pte_present() actually existing). Of course
> you could as well extend the comment a few lines up.
The AMD IOMMU code conflates PDEs and PTEs all over the place, but I'll add a
comment here to that effect.
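Something like:

    /*
     * PDEs and PTEs share the same layout, so set_iommu_pde_present()
     * is equally fine for installing the leaf entries here.
     */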
>
> What you do need to do though is pre-fill the leaf page ...
>
> > + }
> > +
> > + unmap_domain_page(table);
> > + table = __map_domain_page(pg);
> > + level--;
> > + }
>
> ... here, such that possible left over secrets can't end up
> getting written to e.g. persistent storage or over a network.
>
That's actually one reason for using the pgtable allocator... It already does
that.
> > @@ -2683,9 +2671,68 @@ static void vtd_dump_p2m_table(struct domain *d)
> > vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw), 0, 0);
> > }
> >
> > +static int intel_iommu_quarantine_init(struct domain *d)
>
> __init again.
>
Ok.
> > +{
> > + struct domain_iommu *hd = dom_iommu(d);
> > + struct dma_pte *parent;
> > + unsigned int level = agaw_to_level(hd->arch.agaw);
>
> Other than for AMD this is not a problem here, as all domains
> (currently) get the same AGAW. I wonder though whether precautions
> would be possible here against the "normal" domain setting getting
> adjusted without recalling the need to come back here.
>
I could just go for a hardcoded width value here too.
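I.e. something like (sketch only; width_to_agaw() and
DEFAULT_DOMAIN_ADDRESS_WIDTH assumed usable here):

    /*
     * Fixed width, so a future change to the per-domain AGAW can't
     * invalidate the quarantine page tables.
     */
    unsigned int level =
        agaw_to_level(width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH));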
> > + int rc;
> > +
> > + if ( hd->arch.pgd_maddr )
> > + {
> > + ASSERT_UNREACHABLE();
> > + return 0;
> > + }
> > +
> > + spin_lock(&hd->arch.mapping_lock);
> > +
> > + hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
> > + if ( !hd->arch.pgd_maddr )
> > + goto out;
> > +
> > + parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.pgd_maddr);
>
> Unnecessary cast; funnily enough you don't have one further down.
>
Sorry. Cut'n'paste.
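Will drop the cast, i.e. simply:

    parent = map_vtd_domain_page(hd->arch.pgd_maddr);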
> > + while ( level )
> > + {
> > + uint64_t maddr;
> > + unsigned int offset;
> > +
> > + /*
> > + * The pgtable allocator is fine for the leaf page, as well as
> > + * page table pages.
> > + */
> > + maddr = alloc_pgtable_maddr(1, hd->node);
> > + if ( !maddr )
> > + break;
> > +
> > + for ( offset = 0; offset < PTE_NUM; offset++ )
> > + {
> > + struct dma_pte *pte = &parent[offset];
> > +
> > + dma_set_pte_addr(*pte, maddr);
> > + dma_set_pte_readable(*pte);
> > + dma_set_pte_writable(*pte);
> > + }
> > + iommu_flush_cache_page(parent, 1);
> > +
> > + unmap_vtd_domain_page(parent);
> > + parent = map_vtd_domain_page(maddr);
> > + level--;
> > + }
>
> The leaf page wants scrubbing here as well.
>
Already done by alloc_pgtable_maddr(). I'll note in the comments in this hunk
and in the AMD one that the pages are zeroed.
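Something along these lines in both hunks:

    /*
     * The pgtable allocator is fine for the leaf page, as well as
     * page table pages, and the pages come back zeroed, so no stale
     * data can leak via the quarantine mappings.
     */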
Paul
> Jan
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel