[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH RFC] pci/ats: do not allow broken devices to be assigned


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 24 Feb 2022 16:33:10 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=5ZrrsloNWEExtgxPvaYwMkJWXAdO/XFAyldvEfQrlso=; b=HyHtJxbo2AZhZshKs+xsmgokCf/AHwfGtR/hTKN8TZJ+44Zwxjcpq5AXIYPoiCyTMEvnyZQmxYH2VJVQIL9vrjXjjLelA5Ebg31kytrxZkfDrm1zikWnIrJWxd7VlpupvU0K2M/j5TpTZPUbeqXcJXMVRNgJb0g0uH17+Ux0orVZh6Oelpnb3C+sf1eJa2D/vIJIaCJdfGiXw+vNqFaNvHCCdgWURdqB3dTwB/TN/NJpd6mxQMe1o2Io3j7orCeuq/3P9ljPE7xrT4mvjJW3eMTscgI0WJr3Mz+Q2EA1mPHmExxd2KpdyjCvpKebFJA2Mx1ASbWzffnYvQa2NaLWMQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Zdhe+Bx3volnoZRlaLnkCg56S6VMwEi0rUC7K3si7+icakx6X28fpdZcAcDbNSEflbZU9FXuT0QBONn/Nuk2Hijrl/4kMvSpXG/VWs99mqAzEonpp5GuS6wktmfRiLWwi8zV6Fj8bHOS0zgOm2fSYACjO2VQ0ZSO0ptuMOfNWpceM7bzwrmmg5Rz62RijcuapbD1GPVuREiZOCtzozc9Y4clAC6F1m3ieAwF9/pKe0B6Pf55xVqtCZS13aRmaraiEAOkBeCx7aVtOprCn02qZyOMZ0tkpH2vaVwGvmezDFSz8ouoClHWPAI9FJbOaN4YPPWr4PLALuMK6ZlnH7Etaw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Oleksandr Andrushchenko <andr2000@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 24 Feb 2022 15:33:22 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 24.02.2022 15:53, Roger Pau Monné wrote:
> On Thu, Feb 24, 2022 at 01:58:31PM +0100, Jan Beulich wrote:
>> On 24.02.2022 13:43, Roger Pau Monne wrote:
>>> TBD: it's unclear whether we still need the pcidevs_lock in
>>> iommu_dev_iotlb_flush_timeout. The caller of
>>> iommu_dev_iotlb_flush_timeout is already bogus as it iterates over a
>>> list of pdevs without holding the pcidevs_lock.
>>
>> Analysis of whether / where recursive uses are needed should imo
>> include cases where the lock ought to be held, but currently isn't
>> (like apparently this case).
> 
> Well, I'm not opposed to that. I think just aiming to get the current
> usages analyzed will already be hard.

While I fully agree, the decision to drop recursive locking would better
not put roadblocks in the way of adding locking where it is currently
missing.

>>> @@ -1487,6 +1487,11 @@ static int assign_device(struct domain *d, u16 seg, 
>>> u8 bus, u8 devfn, u32 flag)
>>>      ASSERT(pdev && (pdev->domain == hardware_domain ||
>>>                      pdev->domain == dom_io));
>>>  
>>> +    /* Do not allow broken devices to be assigned. */
>>> +    rc = -EBADF;
>>> +    if ( pdev->broken )
>>> +        goto done;
>>
>> I think this wants exceptions for Dom0 and DomIO. An admin may be
>> able to fix things in Dom0, e.g. by updating device firmware.
> 
> Doesn't a device get assigned to DomIO (or Dom0 if not using quarantine
> mode), and then when deassigned from DomIO gets assigned to Dom0?
> 
> So there's no usage of assign_device in the path that (re)assigns a
> device used by a guest into Dom0?

Well, this assumes all tool stacks behave like the xl one presently
does. Which I think is the wrong way round: Making a device assignable
should be "deassign from Dom0" (implicitly making it land in DomIO),
while removing a device from the set of assignable ones should be
"assign to Dom0" (implicitly deassigning from DomIO). That way (I
think I said so elsewhere earlier on) libxl would also avoid the need
to actually use DOMID_IO explicitly. It using DOMID_SELF like done in
various other places would seem more clen to me.

Paul - I think it was you who decided to make it work the way it
currently does. Do you have any insights into your thought process
back then which you could share?

> Or would you rather imply that pdev->broken should get cleared at some
> point (ie: when the device is assigned back to Dom0)?

I did ponder this for a while when writing the earlier reply. But I
decided against, for it being a functional change: _pci_hide_device()
currently is also sticky. But yes, in principle if a misbehaving
device _can_ be fixed (e.g. by updating its firmware), then I think
there needs to be a way to clear the "broken" flag.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.