
Re: [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables


  • To: 'Jan Beulich' <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
  • Date: Mon, 23 Sep 2019 15:27:53 +0000
  • Accept-language: en-GB, en-US
  • Authentication-results: esa3.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none; spf=None smtp.pra=Paul.Durrant@xxxxxxxxxx; spf=Pass smtp.mailfrom=Paul.Durrant@xxxxxxxxxx; spf=None smtp.helo=postmaster@xxxxxxxxxxxxxxx
  • Cc: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
  • Delivery-date: Mon, 23 Sep 2019 15:28:01 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHVbu1YWxkPJrEqQkWka8jBzdYkyqc5aPjg
  • Thread-topic: [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Jan Beulich
> Sent: 19 September 2019 14:22
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>; Suravee Suthikulpanit 
> <suravee.suthikulpanit@xxxxxxx>
> Subject: [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables
> 
> ACPI tables are free to list far more device coordinates than there are
> actual devices. By delaying the table allocations for most cases, and
> doing them only when an actual device is known to be present at a given
> position, overall memory used for the tables goes down from over 500k
> pages to just over 1k (on my system having such ACPI table contents).
> 
> Tables continue to get allocated right away for special entries
> (IO-APIC, HPET) as well as for alias IDs. While in the former case
> that's simply because there may not be any device at a given position,
> in the latter case this is to avoid having to introduce ref-counting of
> table usage.
> 
> The change involves invoking
> iterate_ivrs_mappings(amd_iommu_setup_device_table) a second time,
> because the function now wants to be able to find PCI devices, which
> isn't possible yet when IOMMU setup happens very early during x2APIC
> mode setup. In this context amd_iommu_init_interrupt() gets renamed as
> well.
> 
> The logic adjusting a DTE's interrupt remapping attributes also gets
> changed, such that the lack of an IRT would result in interrupts (should
> any occur) being target aborted rather than left un-remapped.
> 
> Note that for now phantom functions get separate IRTs allocated, as was
> the case before.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

Reviewed-by: Paul Durrant <paul.durrant@xxxxxxxxxx>

> ---
> v6: Acquire IOMMU lock in code added to amd_iommu_add_device(). Drop a
>     pointless use of the conditional operator.
> v5: New.
> ---
> TBD: This retains prior (but suspicious) behavior of not calling
>      amd_iommu_set_intremap_table() for "invalid" IVRS mapping entries.
>      Since DTE.IV=0 means un-remapped interrupts, I wonder if this needs
>      changing.
> 
> Backporting note: This depends on b5fbe81196 "iommu / x86: move call to
> scan_pci_devices() out of vendor code"!
> 
> ---
>  xen/drivers/passthrough/amd/iommu_acpi.c      |   65 +++++++++++++----------
>  xen/drivers/passthrough/amd/iommu_init.c      |   73 ++++++++++++++++++++------
>  xen/drivers/passthrough/amd/iommu_intr.c      |    4 -
>  xen/drivers/passthrough/amd/iommu_map.c       |    5 +
>  xen/drivers/passthrough/amd/pci_amd_iommu.c   |   43 ++++++++++++++-
>  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |    2
>  6 files changed, 143 insertions(+), 49 deletions(-)
> 
> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> @@ -53,7 +53,8 @@ union acpi_ivhd_device {
>  };
> 
>  static void __init add_ivrs_mapping_entry(
> -    u16 bdf, u16 alias_id, u8 flags, struct amd_iommu *iommu)
> +    uint16_t bdf, uint16_t alias_id, uint8_t flags, bool alloc_irt,
> +    struct amd_iommu *iommu)
>  {
>      struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->seg);
> 
> @@ -69,27 +70,32 @@ static void __init add_ivrs_mapping_entr
>      if ( iommu->bdf == bdf )
>          return;
> 
> -    if ( !ivrs_mappings[alias_id].intremap_table )
> +    /* Allocate interrupt remapping table if needed. */
> +    if ( iommu_intremap && !ivrs_mappings[alias_id].intremap_table )
>      {
> -         /* allocate per-device interrupt remapping table */
> -         if ( amd_iommu_perdev_intremap )
> -             ivrs_mappings[alias_id].intremap_table =
> -                 amd_iommu_alloc_intremap_table(
> -                     iommu,
> -                     &ivrs_mappings[alias_id].intremap_inuse);
> -         else
> -         {
> -             if ( shared_intremap_table == NULL  )
> -                 shared_intremap_table = amd_iommu_alloc_intremap_table(
> -                     iommu,
> -                     &shared_intremap_inuse);
> -             ivrs_mappings[alias_id].intremap_table = shared_intremap_table;
> -             ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
> -         }
> -
> -         if ( !ivrs_mappings[alias_id].intremap_table )
> -             panic("No memory for %04x:%02x:%02x.%u's IRT\n", iommu->seg,
> -                   PCI_BUS(alias_id), PCI_SLOT(alias_id), PCI_FUNC(alias_id));
> +        if ( !amd_iommu_perdev_intremap )
> +        {
> +            if ( !shared_intremap_table )
> +                shared_intremap_table = amd_iommu_alloc_intremap_table(
> +                    iommu, &shared_intremap_inuse);
> +
> +            if ( !shared_intremap_table )
> +                panic("No memory for shared IRT\n");
> +
> +            ivrs_mappings[alias_id].intremap_table = shared_intremap_table;
> +            ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
> +        }
> +        else if ( alloc_irt )
> +        {
> +            ivrs_mappings[alias_id].intremap_table =
> +                amd_iommu_alloc_intremap_table(
> +                    iommu, &ivrs_mappings[alias_id].intremap_inuse);
> +
> +            if ( !ivrs_mappings[alias_id].intremap_table )
> +                panic("No memory for %04x:%02x:%02x.%u's IRT\n",
> +                      iommu->seg, PCI_BUS(alias_id), PCI_SLOT(alias_id),
> +                      PCI_FUNC(alias_id));
> +        }
>      }
> 
>      ivrs_mappings[alias_id].valid = true;
> @@ -433,7 +439,8 @@ static u16 __init parse_ivhd_device_sele
>          return 0;
>      }
> 
> -    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, iommu);
> +    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, false,
> +                           iommu);
> 
>      return sizeof(*select);
>  }
> @@ -479,7 +486,7 @@ static u16 __init parse_ivhd_device_rang
> 
>      for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
>          add_ivrs_mapping_entry(bdf, bdf, range->start.header.data_setting,
> -                               iommu);
> +                               false, iommu);
> 
>      return dev_length;
>  }
> @@ -513,7 +520,8 @@ static u16 __init parse_ivhd_device_alia
> 
>      AMD_IOMMU_DEBUG(" Dev_Id Alias: %#x\n", alias_id);
> 
> -    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, iommu);
> +    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, true,
> +                           iommu);
> 
>      return dev_length;
>  }
> @@ -568,7 +576,7 @@ static u16 __init parse_ivhd_device_alia
> 
>      for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
>          add_ivrs_mapping_entry(bdf, alias_id, range->alias.header.data_setting,
> -                               iommu);
> +                               true, iommu);
> 
>      return dev_length;
>  }
> @@ -593,7 +601,7 @@ static u16 __init parse_ivhd_device_exte
>          return 0;
>      }
> 
> -    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting, iommu);
> +    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting, false, iommu);
> 
>      return dev_length;
>  }
> @@ -640,7 +648,7 @@ static u16 __init parse_ivhd_device_exte
> 
>      for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
>          add_ivrs_mapping_entry(bdf, bdf, range->extended.header.data_setting,
> -                               iommu);
> +                               false, iommu);
> 
>      return dev_length;
>  }
> @@ -733,7 +741,8 @@ static u16 __init parse_ivhd_device_spec
> +    AMD_IOMMU_DEBUG("IVHD Special: %04x:%02x:%02x.%u variety %#x handle %#x\n",
>                      seg, PCI_BUS(bdf), PCI_SLOT(bdf), PCI_FUNC(bdf),
>                      special->variety, special->handle);
> -    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, iommu);
> +    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, true,
> +                           iommu);
> 
>      switch ( special->variety )
>      {
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -30,6 +30,7 @@
>  #include <xen/delay.h>
> 
>  static int __initdata nr_amd_iommus;
> +static bool __initdata pci_init;
> 
>  static void do_amd_iommu_irq(unsigned long data);
>  static DECLARE_SOFTIRQ_TASKLET(amd_iommu_irq_tasklet, do_amd_iommu_irq, 0);
> @@ -1244,17 +1245,20 @@ static int __init amd_iommu_setup_device
> 
>      BUG_ON( (ivrs_bdf_entries == 0) );
> 
> -    /* allocate 'device table' on a 4K boundary */
> -    device_table.alloc_size = PAGE_SIZE <<
> -                              get_order_from_bytes(
> -                              PAGE_ALIGN(ivrs_bdf_entries *
> -                              IOMMU_DEV_TABLE_ENTRY_SIZE));
> -    device_table.entries = device_table.alloc_size /
> -                           IOMMU_DEV_TABLE_ENTRY_SIZE;
> -
> -    device_table.buffer = allocate_buffer(device_table.alloc_size,
> -                                          "Device Table");
> -    if  ( device_table.buffer == NULL )
> +    if ( !device_table.buffer )
> +    {
> +        /* allocate 'device table' on a 4K boundary */
> +        device_table.alloc_size = PAGE_SIZE <<
> +                                  get_order_from_bytes(
> +                                  PAGE_ALIGN(ivrs_bdf_entries *
> +                                  IOMMU_DEV_TABLE_ENTRY_SIZE));
> +        device_table.entries = device_table.alloc_size /
> +                               IOMMU_DEV_TABLE_ENTRY_SIZE;
> +
> +        device_table.buffer = allocate_buffer(device_table.alloc_size,
> +                                              "Device Table");
> +    }
> +    if ( !device_table.buffer )
>          return -ENOMEM;
> 
>      /* Add device table entries */
> @@ -1263,13 +1267,46 @@ static int __init amd_iommu_setup_device
>          if ( ivrs_mappings[bdf].valid )
>          {
>              void *dte;
> +            const struct pci_dev *pdev = NULL;
> 
>              /* add device table entry */
>              dte = device_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE);
>              iommu_dte_add_device_entry(dte, &ivrs_mappings[bdf]);
> 
> +            if ( iommu_intremap &&
> +                 ivrs_mappings[bdf].dte_requestor_id == bdf &&
> +                 !ivrs_mappings[bdf].intremap_table )
> +            {
> +                if ( !pci_init )
> +                    continue;
> +                pcidevs_lock();
> +                pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
> +                pcidevs_unlock();
> +            }
> +
> +            if ( pdev )
> +            {
> +                unsigned int req_id = bdf;
> +
> +                do {
> +                    ivrs_mappings[req_id].intremap_table =
> +                        amd_iommu_alloc_intremap_table(
> +                            ivrs_mappings[bdf].iommu,
> +                            &ivrs_mappings[req_id].intremap_inuse);
> +                    if ( !ivrs_mappings[req_id].intremap_table )
> +                        return -ENOMEM;
> +
> +                    if ( !pdev->phantom_stride )
> +                        break;
> +                    req_id += pdev->phantom_stride;
> +                } while ( PCI_SLOT(req_id) == pdev->sbdf.dev );
> +            }
> +
>              amd_iommu_set_intremap_table(
> -                dte, virt_to_maddr(ivrs_mappings[bdf].intremap_table),
> +                dte,
> +                ivrs_mappings[bdf].intremap_table
> +                ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
> +                : 0,
>                  iommu_intremap);
>          }
>      }
> @@ -1402,7 +1439,8 @@ int __init amd_iommu_init(bool xt)
>      if ( rc )
>          goto error_out;
> 
> -    /* allocate and initialize a global device table shared by all iommus */
> +    /* Allocate and initialize device table(s). */
> +    pci_init = !xt;
>      rc = iterate_ivrs_mappings(amd_iommu_setup_device_table);
>      if ( rc )
>          goto error_out;
> @@ -1422,7 +1460,7 @@ int __init amd_iommu_init(bool xt)
>          /*
>           * Setting up of the IOMMU interrupts cannot occur yet at the (very
>           * early) time we get here when enabling x2APIC mode. Suppress it
> -         * here, and do it explicitly in amd_iommu_init_interrupt().
> +         * here, and do it explicitly in amd_iommu_init_late().
>           */
>          rc = amd_iommu_init_one(iommu, !xt);
>          if ( rc )
> @@ -1436,11 +1474,16 @@ error_out:
>      return rc;
>  }
> 
> -int __init amd_iommu_init_interrupt(void)
> +int __init amd_iommu_init_late(void)
>  {
>      struct amd_iommu *iommu;
>      int rc = 0;
> 
> +    /* Further initialize the device table(s). */
> +    pci_init = true;
> +    if ( iommu_intremap )
> +        rc = iterate_ivrs_mappings(amd_iommu_setup_device_table);
> +
>      for_each_amd_iommu ( iommu )
>      {
>          struct irq_desc *desc;
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -789,7 +789,7 @@ void amd_iommu_read_msi_from_ire(
>      }
>  }
> 
> -int __init amd_iommu_free_intremap_table(
> +int amd_iommu_free_intremap_table(
>      const struct amd_iommu *iommu, struct ivrs_mappings *ivrs_mapping)
>  {
>      void **tblp;
> @@ -814,7 +814,7 @@ int __init amd_iommu_free_intremap_table
>      return 0;
>  }
> 
> -void *__init amd_iommu_alloc_intremap_table(
> +void *amd_iommu_alloc_intremap_table(
>      const struct amd_iommu *iommu, unsigned long **inuse_map)
>  {
>      unsigned int order = intremap_table_order(iommu);
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -116,8 +116,9 @@ void __init amd_iommu_set_intremap_table
>      struct amd_iommu_dte *dte, uint64_t intremap_ptr, bool valid)
>  {
>      dte->it_root = intremap_ptr >> 6;
> -    dte->int_tab_len = IOMMU_INTREMAP_ORDER;
> -    dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
> +    dte->int_tab_len = intremap_ptr ? IOMMU_INTREMAP_ORDER : 0;
> +    dte->int_ctl = intremap_ptr ? IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED
> +                                : IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
>      dte->ig = false; /* unmapped interrupts result in i/o page faults */
>      dte->iv = valid;
>  }
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -164,7 +164,7 @@ static int __init iov_detect(void)
>      if ( !iommu_enable && !iommu_intremap )
>          return 0;
> 
> -    if ( (init_done ? amd_iommu_init_interrupt()
> +    if ( (init_done ? amd_iommu_init_late()
>                      : amd_iommu_init(false)) != 0 )
>      {
>          printk("AMD-Vi: Error initialization\n");
> @@ -428,6 +428,7 @@ static int amd_iommu_add_device(u8 devfn
>  {
>      struct amd_iommu *iommu;
>      u16 bdf;
> +    struct ivrs_mappings *ivrs_mappings;
> 
>      if ( !pdev->domain )
>          return -EINVAL;
> @@ -457,6 +458,36 @@ static int amd_iommu_add_device(u8 devfn
>          return -ENODEV;
>      }
> 
> +    ivrs_mappings = get_ivrs_mappings(pdev->seg);
> +    bdf = PCI_BDF2(pdev->bus, devfn);
> +    if ( !ivrs_mappings ||
> +         !ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].valid )
> +        return -EPERM;
> +
> +    if ( iommu_intremap &&
> +         ivrs_mappings[bdf].dte_requestor_id == bdf &&
> +         !ivrs_mappings[bdf].intremap_table )
> +    {
> +        unsigned long flags;
> +
> +        ivrs_mappings[bdf].intremap_table =
> +            amd_iommu_alloc_intremap_table(
> +                iommu, &ivrs_mappings[bdf].intremap_inuse);
> +        if ( !ivrs_mappings[bdf].intremap_table )
> +            return -ENOMEM;
> +
> +        spin_lock_irqsave(&iommu->lock, flags);
> +
> +        amd_iommu_set_intremap_table(
> +            iommu->dev_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE),
> +            virt_to_maddr(ivrs_mappings[bdf].intremap_table),
> +            iommu_intremap);
> +
> +        amd_iommu_flush_device(iommu, bdf);
> +
> +        spin_unlock_irqrestore(&iommu->lock, flags);
> +    }
> +
>      amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev);
>      return 0;
>  }
> @@ -465,6 +496,8 @@ static int amd_iommu_remove_device(u8 de
>  {
>      struct amd_iommu *iommu;
>      u16 bdf;
> +    struct ivrs_mappings *ivrs_mappings;
> +
>      if ( !pdev->domain )
>          return -EINVAL;
> 
> @@ -480,6 +513,14 @@ static int amd_iommu_remove_device(u8 de
>      }
> 
>      amd_iommu_disable_domain_device(pdev->domain, iommu, devfn, pdev);
> +
> +    ivrs_mappings = get_ivrs_mappings(pdev->seg);
> +    bdf = PCI_BDF2(pdev->bus, devfn);
> +    if ( amd_iommu_perdev_intremap &&
> +         ivrs_mappings[bdf].dte_requestor_id == bdf &&
> +         ivrs_mappings[bdf].intremap_table )
> +        amd_iommu_free_intremap_table(iommu, &ivrs_mappings[bdf]);
> +
>      return 0;
>  }
> 
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> @@ -50,7 +50,7 @@ void get_iommu_features(struct amd_iommu
>  /* amd-iommu-init functions */
>  int amd_iommu_prepare(bool xt);
>  int amd_iommu_init(bool xt);
> -int amd_iommu_init_interrupt(void);
> +int amd_iommu_init_late(void);
>  int amd_iommu_update_ivrs_mapping_acpi(void);
>  int iov_adjust_irq_affinities(void);
> 
> 
> 
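For reference, the phantom-function walk in the iommu_init.c hunk above (stepping req_id by pdev->phantom_stride until the slot number changes) can be sketched as a standalone program. The helper name and BDF layout macro below are illustrative stand-ins, not Xen's actual code:

```c
#include <assert.h>

/* BDF layout, as in PCI: bus[15:8] | dev[7:3] | fn[2:0]. */
#define PCI_SLOT(bdf) (((bdf) >> 3) & 0x1f)

/* Collect the requestor IDs covered by a device and its phantom
 * functions, mirroring the do/while loop in
 * amd_iommu_setup_device_table(): step by the phantom stride until
 * the walk leaves the device's slot. */
static int collect_phantom_req_ids(unsigned int bdf, unsigned int stride,
                                   unsigned int *out, int max)
{
    unsigned int req_id = bdf;
    int n = 0;

    do {
        if ( n < max )
            out[n++] = req_id;
        if ( !stride )
            break;
        req_id += stride;
    } while ( PCI_SLOT(req_id) == PCI_SLOT(bdf) );

    return n;
}
```

With a device at 00:01.0 (bdf 0x08) and a phantom stride of 2, this yields requestor IDs 0x08, 0x0a, 0x0c and 0x0e, stopping at 0x10 (slot 2); the patch allocates a separate IRT for each of them, as noted in the commit message.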
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxx
> https://lists.xenproject.org/mailman/listinfo/xen-devel
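The behavioral change in the iommu_map.c hunk (a DTE with no IRT now requests target abort instead of leaving interrupts un-remapped) can likewise be illustrated standalone. The struct and constant values here are simplified stand-ins for Xen's struct amd_iommu_dte and IOMMU_DEV_TABLE_INT_CONTROL_* definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative IntCtl encodings; the real values come from the AMD
 * IOMMU spec via Xen's headers. */
#define INT_CTL_ABORTED     0  /* un-remapped interrupts target abort */
#define INT_CTL_TRANSLATED  2  /* interrupts remapped via the IRT */
#define INTREMAP_ORDER      0xB

struct dte_sketch {
    uint64_t it_root;
    unsigned int int_tab_len;
    unsigned int int_ctl;
    bool iv;
};

/* Mirrors the patched amd_iommu_set_intremap_table(): a zero
 * intremap_ptr (no IRT allocated) now yields target abort and a
 * zero table length, rather than unconditionally "translated". */
static void set_intremap_table(struct dte_sketch *dte, uint64_t intremap_ptr,
                               bool valid)
{
    dte->it_root = intremap_ptr >> 6;
    dte->int_tab_len = intremap_ptr ? INTREMAP_ORDER : 0;
    dte->int_ctl = intremap_ptr ? INT_CTL_TRANSLATED : INT_CTL_ABORTED;
    dte->iv = valid;
}
```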