
Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the quarantine domain



On 20.11.2019 13:08, Paul Durrant wrote:
> This patch introduces a new iommu_op to facilitate a per-implementation
> quarantine set up, and then further code for x86 implementations
> (amd and vtd) to set up a read/wrote scratch page to serve as the source/
> target for all DMA whilst a device is assigned to dom_io.

A single page in the system won't do, I'm afraid. If one guest's
(prior) device is retrying writes whose data contains secrets of that
guest, another guest's (prior) device could end up reading that data
and writing it to e.g. storage, where after a guest restart it is then
available to the wrong guest.

Also nit: s/wrote/write/ .
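
To illustrate the direction I have in mind (a per-device scratch page,
allocated no later than when the device first gets handed to dom_io;
the pdev->quarantine_pg field and the hook point are invented for this
example, they're not part of the patch):

    static int quarantine_prepare_device(struct pci_dev *pdev)
    {
        if ( !pdev->quarantine_pg )      /* assumed new per-device field */
        {
            pdev->quarantine_pg = alloc_domheap_page(NULL, 0);
            if ( !pdev->quarantine_pg )
                return -ENOMEM;
            /* Scrub, so no stale data can ever be read back via DMA. */
            clear_domain_page(page_to_mfn(pdev->quarantine_pg));
        }

        return 0;
    }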

> The reason for doing this is that some hardware may continue to re-try
> DMA, despite FLR, in the event of an error. Having a scratch page mapped
> will allow pending DMA to drain and thus quiesce such buggy hardware.

Without a "sink" page mapped, this would result in IOMMU faults, as I
understand it.
What's the problem with having these faults surface and get handled,
eventually leading to the device getting bus-mastering disabled? Is
it that devices continue DMAing even when bus-mastering is off? If
so, is it even safe to pass through any such device? In any event
the description needs to be extended here.
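
For reference, the kind of quiescing I'm thinking of, i.e. clearing bus
mastering once such faults have been seen (this assumes the pci_sbdf_t
flavour of the config space accessors; older trees pass seg/bus/devfn
separately):

    static void quiesce_device(struct pci_dev *pdev)
    {
        uint16_t cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);

        /* No further DMA ought to be issued once bus mastering is off. */
        if ( cmd & PCI_COMMAND_MASTER )
            pci_conf_write16(pdev->sbdf, PCI_COMMAND,
                             cmd & ~PCI_COMMAND_MASTER);
    }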

> Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>

What about Arm? Can devices which Arm allows to be assigned to guests
also "babble" like this after de-assignment? If not, this should be
said in the description. If so, obviously that side would also want
fixing.

> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -560,6 +560,63 @@ int amd_iommu_reserve_domain_unity_map(struct domain *domain,
>      return rt;
>  }
>  
> +int amd_iommu_quarantine_init(struct domain *d)

__init
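
I.e. (assuming this only ever gets called while setting up dom_io at
boot):

    int __init amd_iommu_quarantine_init(struct domain *d)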

> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +    unsigned int level;
> +    struct amd_iommu_pte *table;
> +
> +    if ( hd->arch.root_table )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    spin_lock(&hd->arch.mapping_lock);
> +
> +    level = hd->arch.paging_mode;

With DomIO being PV in principle, this is going to be the fixed value
that PV domains get, which merely depends on RAM size at boot time
(i.e. not accounting for memory hotplug). That could easily be too
little for HVM guests, which are free to extend their GFN (and hence
DFN) space. Therefore I think you need to set the maximum possible
level here.
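
Roughly (AMD_IOMMU_MAX_PAGING_MODE is a placeholder name here for
whatever expresses the deepest paging mode the implementation supports,
not an existing constant):

    /* Cover the whole DFN space an HVM guest could ever grow into. */
    hd->arch.paging_mode = level = AMD_IOMMU_MAX_PAGING_MODE;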

> +    hd->arch.root_table = alloc_amd_iommu_pgtable();
> +    if ( !hd->arch.root_table )
> +        goto out;
> +
> +    table = __map_domain_page(hd->arch.root_table);
> +    while ( level )
> +    {
> +        struct page_info *pg;
> +        unsigned int i;
> +
> +        /*
> +         * The pgtable allocator is fine for the leaf page, as well as
> +         * page table pages.
> +         */
> +        pg = alloc_amd_iommu_pgtable();
> +        if ( !pg )
> +            break;
> +
> +        for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
> +        {
> +            struct amd_iommu_pte *pde = &table[i];
> +
> +            set_iommu_pde_present(pde, mfn_x(page_to_mfn(pg)), level - 1,
> +                                  true, true);

This would also benefit from a comment indicating that it's fine
for the leaf level as well, despite the "pde" in the name (and
its sibling set_iommu_pte_present() actually existing). Of course
you could just as well extend the comment a few lines up instead.

What you do need to do though is pre-fill the leaf page ...

> +        }
> +
> +        unmap_domain_page(table);
> +        table = __map_domain_page(pg);
> +        level--;
> +    }

... here, such that possible left over secrets can't end up
getting written to e.g. persistent storage or over a network.
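
For instance (an explicit scrub of the final, data bearing page; this
may be redundant if alloc_amd_iommu_pgtable() already zeroes its
allocation, but it makes the intent obvious):

        pg = alloc_amd_iommu_pgtable();
        if ( !pg )
            break;

        /* The level-1 iteration allocates the scratch (leaf) page itself. */
        if ( level == 1 )
            clear_domain_page(page_to_mfn(pg));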

> @@ -2683,9 +2671,68 @@ static void vtd_dump_p2m_table(struct domain *d)
>      vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw), 0, 0);
>  }
>  
> +static int intel_iommu_quarantine_init(struct domain *d)

__init again.

> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +    struct dma_pte *parent;
> +    unsigned int level = agaw_to_level(hd->arch.agaw);

Unlike in the AMD case, this is not a problem here, as all domains
(currently) get the same AGAW. I wonder though whether precautions
are possible against the "normal" domain setting getting adjusted
later without anyone recalling the need to come back here.
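
A simple guard might be enough, e.g. (width_to_agaw() and
DEFAULT_DOMAIN_ADDRESS_WIDTH being what the VT-d code uses when it
assigns a domain's AGAW):

    /* Catch the normal-domain AGAW changing without this being revisited. */
    ASSERT(hd->arch.agaw == width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH));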

> +    int rc;
> +
> +    if ( hd->arch.pgd_maddr )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    spin_lock(&hd->arch.mapping_lock);
> +
> +    hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
> +    if ( !hd->arch.pgd_maddr )
> +        goto out;
> +
> +    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.pgd_maddr);

Unnecessary cast; funnily enough you don't have one further down.
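
I.e. simply:

    parent = map_vtd_domain_page(hd->arch.pgd_maddr);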

> +    while ( level )
> +    {
> +        uint64_t maddr;
> +        unsigned int offset;
> +
> +        /*
> +         * The pgtable allocator is fine for the leaf page, as well as
> +         * page table pages.
> +         */
> +        maddr = alloc_pgtable_maddr(1, hd->node);
> +        if ( !maddr )
> +            break;
> +
> +        for ( offset = 0; offset < PTE_NUM; offset++ )
> +        {
> +            struct dma_pte *pte = &parent[offset];
> +
> +            dma_set_pte_addr(*pte, maddr);
> +            dma_set_pte_readable(*pte);
> +            dma_set_pte_writable(*pte);
> +        }
> +        iommu_flush_cache_page(parent, 1);
> +
> +        unmap_vtd_domain_page(parent);
> +        parent = map_vtd_domain_page(maddr);
> +        level--;
> +    }

The leaf page wants scrubbing here as well.
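
Along the lines of (again possibly redundant if alloc_pgtable_maddr()
already clears the page, but then at least self-documenting):

        /* Scrub the data bearing (leaf) page before making it reachable. */
        if ( level == 1 )
        {
            void *p = map_vtd_domain_page(maddr);

            memset(p, 0, PAGE_SIZE);
            iommu_flush_cache_page(p, 1);
            unmap_vtd_domain_page(p);
        }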

Jan
