[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH] iommu: make no-quarantine mean no-quarantine


  • To: "paul@xxxxxxx" <paul@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • From: Scott Davis <scott.davis@xxxxxxxxxx>
  • Date: Thu, 29 Apr 2021 21:04:58 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=starlab.io; dmarc=pass action=none header.from=starlab.io; dkim=pass header.d=starlab.io; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=2EFYX8tfPuM4fWgtWAa2+fUh+gll+yHoQFhgZvMWcHY=; b=il/UMDEuGIPqpYkQOVF9KHYkNaDxFIaaS3W7UdohTW1gnmUN4b3Yue7oqxWpjf18wPNoNIidanTnOrZP2tiFJwonjbYuAifb1MC2OArx5/QYFLY3QbJsaLS2HeQY621bEDFE5GChJOIeMlhk/9nrCP7IeHjMPD6RSUaEodb9SJcp1mO9lK8xY3Q20bP7TevTRUP6fkdnzZqP51DsTU8R2D7HErb6Tr+FIXkii9OccqZNZJ5OOYDNHsOIT18vflETyMzEUKVrDofXPe5xy2aMwwfbb3NQzq3ntLXUIfdzzTIfXUZ1z6SprxiO40B22MlRCvg/BBVOAIJ5MP0if8id/g==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=RUi/siLzaXcCWb8d7npST+6L1mcsP/LRY8JNjT7Y2cTbFuz3/G0tx3FEP+4fdM3sF3sDxbYXn+vfomIhbiime/sEm3B+n3b3vlz2GddC4w29gfiOD/EPo08jAuQGxsdYUP7qDg36HXMdLbsYFJl79k3oW2MPWrv8Sj7NIjDtJG39gJ+th240E7qE54LDphY2uFFnM2BEF6rVwyOTYasHv5Q4f1M5xrTP3vUvlEvJ8RuNFB9VUzko0pqVXGwOPFAHZ5EbHZo8FcJDlwt9MKSrbvd0dOz1ZTilV9bIt2OREt/gPwBjUnJ0YaZj8g46ky+JDaK4MBvZB58nvhGd0I1yuQ==
  • Authentication-results: xen.org; dkim=none (message not signed) header.d=none;xen.org; dmarc=none action=none header.from=starlab.io;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 29 Apr 2021 21:05:07 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHXOsEjPlXZorMQi0Kt8384oCpmdqrH72EAgAC5poCAAM1cAIAAEeQAgAI1xoA=
  • Thread-topic: [RFC PATCH] iommu: make no-quarantine mean no-quarantine

On 4/28/21, 3:20 AM, Paul Durrant wrote:
>> Following the extension to the command line option I'm putting in place
>> in "IOMMU: make DMA containment of quarantined devices optional" (which
>> I still need to get around to address review feedback for and resubmit),
>> I'd be inclined to suggest "iommu=quarantine=always" or
>> "iommu=quarantine=on-assign". Unless of course we'd prefer to have the
>> caller of the assignment operation have full control over the behavior
>> here anyway (in which case a command line option control simply is not
>> necessary).
>
> I'm still not entirely sure why not quarantining on is a problem, other
> than it triggering an as-yet undiagnosed issue in QEMU, but I agree that
> that the expectation of 'no-quarantine' meaning just that (i.e. the old
> dom0->domU and domU->dom0 transitions are re-instated) is reasonable. Do
> we really want yet more command line options?

Regarding the problem in QEMU, I traced the crash trigger down to a
write to the IQ tail register during the mapping operation into dom_io
(backtrace below). Along the way I noticed that, since a non-present
entry was being flushed, flush_context_qi only performs this
invalidation on an IOMMU with caching mode enabled (i.e. a software
IOMMU). Therefore this issue is probably only hittable when nesting.
Disabling caching mode on the QEMU vIOMMU was enough to prevent the
crash and give me a working system.

(gdb) si
0xffff82d04025b68b  72  in qinval.c
   0xffff82d04025b687 <qinval_update_qtail+43>: ... shl    $0x4,%r12
=> 0xffff82d04025b68b <qinval_update_qtail+47>: ... mov    %r12,0x88(%rax)
(gdb) bt
#0  0xffff82d04025b68b in qinval_update_qtail (...) at qinval.c:72
#1  0xffff82d04025baa7 in queue_invalidate_context_sync (...) at qinval.c:101
#2  flush_context_qi (...) at qinval.c:341
#3  0xffff82d040259125 in iommu_flush_context_device (...) at iommu.c:400
#4  domain_context_mapping_one (...) at iommu.c:1436
#5  0xffff82d040259351 in domain_context_mapping (...) at iommu.c:1510
#6  0xffff82d040259d20 in reassign_device_ownership (...) at iommu.c:2412
#7  0xffff82d040259f19 in intel_iommu_assign_device (...) at iommu.c:2476
#8  0xffff82d040267154 in assign_device (...) at pci.c:1545
#9  iommu_do_pci_domctl (...) at pci.c:1732
#10 0xffff82d040264de3 in iommu_do_domctl (...) at iommu.c:539
#11 0xffff82d040322ca5 in arch_do_domctl (...) at domctl.c:1496
#12 0xffff82d040205a19 in do_domctl (...) at domctl.c:956
#13 0xffff82d040319476 in pv_hypercall (...) at hypercall.c:155
#14 0xffff82d040390432 in lstar_enter () at entry.S:271
#15 0x0000000000000000 in ?? ()

As a result of the above, I no longer have a need to patch Xen to work
around the problem. Though I do want to test against newer versions of
QEMU (currently on 4.2.1) to see if it still exists.

So unless there's interest among Xen developers for this patch, I will
probably shelve it for now. Especially since it looks like Jan has some
ongoing work in this area that I had not previously discovered. If there
is interest, I just need a resolution on whether iommu=quarantine should
be left as a boolean or expanded to support always, never,
deassign-only, and (why not) assign-only.

Thanks,
Scott


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.