Re: [Xen-devel] [PATCH v3] IOMMU: make DMA containment of quarantined devices optional
> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: 10 March 2020 10:27
> To: Tian, Kevin <kevin.tian@xxxxxxxxx>; Paul Durrant <paul@xxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Subject: Re: [PATCH v3] IOMMU: make DMA containment of quarantined devices
> optional
>
> On 10.03.2020 04:43, Tian, Kevin wrote:
> >> From: Jan Beulich <jbeulich@xxxxxxxx>
> >> Sent: Monday, March 9, 2020 7:09 PM
> >>
> >> I'm happy to take better suggestions to replace the "full" command line
> >> option and Kconfig prompt tokens. I don't think though that "fault" and
> >> "write-fault" are really suitable there.
> >
> > I think we may just allow both r/w access to scratch page for such bogus
> > device, which may make 'full' more reasonable since we now fully
> > contain in-fly DMAs. I'm not sure about the value of keeping write-fault
> > alone for such devices (just because one observed his specific device only
> > has problem with read-fault).
>
> Well, a fundamental problem I have here is that I still don't know
> the _exact_ conditions for the observed hangs. I consider it unlikely
> for IOMMU read faults to cause hangs, but for write faults to be
> "fine".

AFAIK it's because the writes are posted and so any faults are just
ignored, whereas a read fault being synchronous causes the device's
state machine to lock up. It really is observed behaviour.

> It would seem more likely to me that e.g. a non-present
> context entry might cause issues. If that was the case, we wouldn't
> need to handle reads and writes differently; we could instead install
> an all zero top level page table. And we'd still get all faults that
> are supposed to surface. But perhaps Paul did try this back then, and
> it turned out to not be an option.
>

The only info I had was that faults on DMA reads had to be avoided
completely. I did not have access to the h/w in question at the time.
I may be able to get it now.
  Paul

> The choice of letting writes continue to fault was based on (a) this
> having been tested to work on the affected system(s) and (b) also
> letting writes go to a scratch page requiring a per-device scratch
> page (and associated page tables) rather than a system-wide one, as
> devices coming from different domains would otherwise be able to
> observe data written to memory by respectively "foreign" devices
> (and hence domains).
>
> But this is all guesswork without the firmware writers of affected
> systems giving us at least some hints.
>
> > alternatively I also thought about whether whitelisting the problematic
> > devices through another option (e.g. nofault=b:d:f) could provide more
> > value. In concept any IOMMU page table (dom0, dom_io or domU)
> > for such bogus device should not include invalid entry, even when
> > quarantine is not specified. However I'm not sure whether it's worthy of
> > going so far...
>
> Indeed. Question though is whether this bad behavior is device specific
> (rather than e.g. system dependent). Plus - as per above - question
> also is whether it's really leaf (or intermediate) page table entry
> presence which actually matters here. If it was, I agree we shouldn't
> have any non-present entries anywhere in the page table trees.
>
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel