Re: [Xen-devel] [PATCH v5 06/10] AMD/IOMMU: don't blindly allocate interrupt remapping tables



On 17.09.2019 15:10, Andrew Cooper wrote:
> On 06/08/2019 14:09, Jan Beulich wrote:
>> ACPI tables are free to list far more device coordinates than there are
>> actual devices. By delaying the table allocations for most cases, and
>> doing them only when an actual device is known to be present at a given
>> position, overall memory used for the tables goes down from over 500k
>> pages to just over 1k ones.
> 
> This is slightly awkward grammar.  While I don't think it is strictly
> speaking incorrect, it would be more normal to phrase as "just over 1k
> pages".

Changed, although to me the double "pages" sounds odd as well. Would
"of them" be any better than "ones"?

>> ---
>> TBD: This retains prior (but suspicious) behavior of not calling
>>      amd_iommu_set_intremap_table() for "invalid" IVRS mapping entries.
>>      Since DTE.IV=0 means un-remapped interrupts, I wonder if this needs
>>      changing.
> 
> How would an invalid entry get DTE.V set in the first place?

DTE.V gets set by amd_iommu_set_root_page_table(), which in turn gets
called from amd_iommu_setup_domain_device() alone. It's only the
latter function's callers which obtain (and possibly check) the
corresponding IVRS mapping entry. Hence to me there's enough of a
disconnect between the setting of DTE.V and that of DTE.IV.

Plus, looking at e.g. amd_iommu_add_device(), there's ample room for
not even making it to amd_iommu_set_intremap_table(), due to earlier
errors.
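
To make that disconnect more concrete, here's a minimal, self-contained
sketch (not the actual Xen code; names and fields are simplified) of how
the two bits get written on separate paths, so that skipping one setter,
whether because of an "invalid" IVRS entry or an earlier error, can
leave DTE.V set while DTE.IV stays clear:

#include <stdbool.h>
#include <stdint.h>

struct dte {
    bool v;              /* DMA translation valid */
    bool iv;             /* interrupt remapping enabled */
    uint64_t pt_root;    /* I/O page table root */
    uint64_t it_root;    /* interrupt remapping table root */
};

static void set_root_page_table(struct dte *dte, uint64_t root)
{
    dte->pt_root = root;
    dte->v = true;       /* DTE.V gets set here ... */
}

static void set_intremap_table(struct dte *dte, uint64_t root, bool valid)
{
    dte->it_root = root;
    dte->iv = valid;     /* ... while DTE.IV gets set only here */
}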

> Whatever the old behaviour may have been, we shouldn't have code which
> comes with a chance of having non-remapped interrupts by accident.

We can't make amd_iommu_set_root_page_table() set dte->iv to 1, as
it gets called only after amd_iommu_set_intremap_table() in
amd_iommu_add_device(). But we could of course make it do so when
it finds dte->it_root to be zero. Yet I wonder whether it wouldn't
be safer to have DTEs start out with the field set this way.
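
Roughly (purely illustrative, reusing the simplified struct dte from
the sketch above rather than the real amd_iommu_set_root_page_table())
that alternative would be:

static void set_root_page_table(struct dte *dte, uint64_t root)
{
    /* No interrupt remapping table installed: enable remapping anyway,
     * so interrupts can't pass through un-remapped by accident. */
    if ( !dte->it_root )
        dte->iv = true;

    dte->pt_root = root;
    dte->v = true;
}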

Along these lines I'm also not convinced it is a good idea for
amd_iommu_set_intremap_table() to have a "valid" parameter in the
first place: it's okay as long as all callers pass iommu_intremap,
but it would seem to me that, as already said, we'd want DTEs to be
set this way right when a DT gets allocated. If you agree, I'll
happily add a patch doing so to the end of this series (there's
meanwhile a patch re-arranging DT allocation anyway).
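
That allocation-time default would amount to something like this
(again only a sketch on top of the simplified struct dte above, not
the actual device table setup code):

static void init_device_table(struct dte *table, unsigned int entries)
{
    unsigned int i;

    /* Remapping enabled by default for every entry; the IRT address
     * gets filled in only once a table is actually allocated. */
    for ( i = 0; i < entries; ++i )
        table[i].iv = true;
}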

>> @@ -1266,13 +1270,46 @@ static int __init amd_iommu_setup_device
>>          if ( ivrs_mappings[bdf].valid )
>>          {
>>              void *dte;
>> +            const struct pci_dev *pdev = NULL;
>>  
>>              /* add device table entry */
>>              dte = device_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE);
>>              iommu_dte_add_device_entry(dte, &ivrs_mappings[bdf]);
>>  
>> +            if ( iommu_intremap &&
>> +                 ivrs_mappings[bdf].dte_requestor_id == bdf &&
>> +                 !ivrs_mappings[bdf].intremap_table )
>> +            {
>> +                if ( !pci_init )
>> +                    continue;
>> +                pcidevs_lock();
>> +                pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
>> +                pcidevs_unlock();
>> +            }
>> +
>> +            if ( pdev )
>> +            {
>> +                unsigned int req_id = bdf;
>> +
>> +                do {
>> +                    ivrs_mappings[req_id].intremap_table =
>> +                        amd_iommu_alloc_intremap_table(
>> +                            ivrs_mappings[bdf].iommu,
>> +                            &ivrs_mappings[req_id].intremap_inuse);
>> +                    if ( !ivrs_mappings[req_id].intremap_table )
>> +                        return -ENOMEM;
>> +
>> +                    if ( !pdev->phantom_stride )
>> +                        break;
>> +                    req_id += pdev->phantom_stride;
>> +                } while ( PCI_SLOT(req_id) == pdev->sbdf.dev );
>> +            }
>> +
>>              amd_iommu_set_intremap_table(
>> -                dte, virt_to_maddr(ivrs_mappings[bdf].intremap_table),
>> +                dte,
>> +                ivrs_mappings[bdf].intremap_table
>> +                ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
>> +                : 0,
> 
> Under what circumstances can ivrs_mappings[bdf].intremap_table be NULL,
> given the ENOMEM above?

The "ivrs_mappings[bdf].dte_requestor_id == bdf" part of the conditional
around the setting of pdev can result in allocation (and hence the
ENOMEM error path) to be bypassed.

> This case isn't very clear cut given the !pdev possibility, but...
> 
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -458,6 +459,32 @@ static int amd_iommu_add_device(u8 devfn
>>          return -ENODEV;
>>      }
>>  
>> +    ivrs_mappings = get_ivrs_mappings(pdev->seg);
>> +    bdf = PCI_BDF2(pdev->bus, devfn);
>> +    if ( !ivrs_mappings ||
>> +         !ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].valid )
>> +        return -EPERM;
>> +
>> +    if ( iommu_intremap &&
>> +         ivrs_mappings[bdf].dte_requestor_id == bdf &&
>> +         !ivrs_mappings[bdf].intremap_table )
>> +    {
>> +        ivrs_mappings[bdf].intremap_table =
>> +            amd_iommu_alloc_intremap_table(
>> +                iommu, &ivrs_mappings[bdf].intremap_inuse);
>> +        if ( !ivrs_mappings[bdf].intremap_table )
>> +            return -ENOMEM;
>> +
>> +        amd_iommu_set_intremap_table(
>> +            iommu->dev_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE),
>> +            ivrs_mappings[bdf].intremap_table
>> +            ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
>> +            : 0,
> 
> ... this case definitely cannot occur.

Oh, missing cleanup after copy-and-paste. Thanks for noticing.

Jan
