[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.



>>> On 23.12.15 at 09:25, <quan.xu@xxxxxxxxx> wrote:
> --- a/xen/drivers/passthrough/vtd/qinval.c
> +++ b/xen/drivers/passthrough/vtd/qinval.c
> @@ -190,9 +190,19 @@ static int queue_invalidate_wait(struct iommu *iommu,
>  static int invalidate_sync(struct iommu *iommu)
>  {
>      struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
> +    int rc;
>  
>      if ( qi_ctrl->qinval_maddr )
> -        return queue_invalidate_wait(iommu, 0, 1, 1);
> +    {
> +        rc = queue_invalidate_wait(iommu, 0, 1, 1);
> +        if ( rc )
> +        {
> +            if ( rc == -ETIMEDOUT )
> +                panic("Queue invalidate wait descriptor was timeout.\n");

Unless I'm overlooking something this doesn't replace and existing
panic(), and I think I had indicated quite clearly that this series
shouldn't add any new ones, unless the alternative of crashing
the owning domain is completely unfeasible.

> +                panic("Queue invalidate wait descriptor was timeout.\n");
> +            return rc;
> +        }
> +    }
> +
>      return 0;
>  }

Please avoid introducing multiple return points when this can be
trivially avoided.

> @@ -229,6 +239,63 @@ int qinval_device_iotlb(struct iommu *iommu,
>      return 0;
>  }
>  
> +static void dev_invalidate_iotlb_timeout(struct iommu *iommu, u16 did,
> +                                         u16 seg, u8 bus, u8 devfn)
> +{
> +    struct domain *d;
> +    struct pci_dev *pdev;
> +
> +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> +    ASSERT(d);

Don't you need to acquire some lock in order to safely assert this?
Also note that unused slots hold zero, i.e. there's at least a
theoretical risk of problems here when you don't also look at
iommu->domid_bitmap.

> +    for_each_pdev(d, pdev)
> +    {
> +        if ( (pdev->seg == seg) &&
> +             (pdev->bus == bus) &&
> +             (pdev->devfn == devfn) )
> +        {
> +            if ( pdev->domain )

If the device is on the domain's list, can this reasonably be false?
I.e. don't you rather mean ASSERT() here?

> +            {
> +                list_del(&pdev->domain_list);

This should happen under pcidevs_lock - you need to either acquire
it or ASSERT() it being held.

> +                pdev->domain = NULL;
> +            }
> +
> +            if ( pci_hide_device(bus, devfn) )
> +            {
> +                printk(XENLOG_ERR
> +                       "IOMMU hide device %04x:%02x:%02x error.",
> +                       seg, bus, devfn);
> +                break;
> +            }
> +
> +            break;
> +        }

Redundant breaks.

> +    }
> +
> +    if ( !is_hardware_domain(d) )
> +        domain_crash(d);

I wonder whether the device hiding shouldn't also be avoided if
the device is owned by hwdom.

> @@ -349,9 +416,18 @@ static int flush_iotlb_qi(
>          queue_invalidate_iotlb(iommu,
>                                 type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
>                                 dw, did, size_order, 0, addr);
> +
> +        /*
> +         * Synchronize with hardware for invalidation request descriptors
> +         * submitted before Device-TLB invalidate descriptor.
> +         */
> +        rc = invalidate_sync(iommu);
> +        if ( rc )
> +             return rc;
> +
>          if ( flush_dev_iotlb )
>              ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
> -        rc = invalidate_sync(iommu);
> +
>          if ( !ret )
>              ret = rc;
>      }

This change is because of ...?

> --- a/xen/drivers/passthrough/vtd/x86/ats.c
> +++ b/xen/drivers/passthrough/vtd/x86/ats.c
> @@ -162,6 +162,19 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
>              return -EOPNOTSUPP;
>          }
>  
> +        /*
> +         * Synchronize with hardware for Device-TLB invalidate
> +         * descriptor.
> +         */
> +        ret = dev_invalidate_iotlb_sync(iommu, did, pdev->seg,
> +                                        pdev->bus, pdev->devfn);
> +        if ( ret )
> +        {
> +            dprintk(XENLOG_WARNING VTDPREFIX,
> +                    "Device-TLB flush timeout.\n");

Are you aware that dprintk() compiles to nothing in non-debug builds?
Plus logging the error code and device coordinates might turn out
helpful in such cases. But first of all - if there was a timeout, we'd
make it here only for Dom0. Perhaps the printing should move into
an else to the domain_crash()? And if there was another error, the
message would be outright wrong.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.