[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3] IOMMU: make DMA containment of quarantined devices optional



> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: 10 March 2020 15:05
> To: paul@xxxxxxx
> Cc: 'Tian, Kevin' <kevin.tian@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx; 
> 'Andrew Cooper'
> <andrew.cooper3@xxxxxxxxxx>
> Subject: Re: [PATCH v3] IOMMU: make DMA containment of quarantined devices 
> optional
> 
> On 10.03.2020 13:30, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@xxxxxxxx>
> >> Sent: 10 March 2020 10:27
> >> To: Tian, Kevin <kevin.tian@xxxxxxxxx>; Paul Durrant <paul@xxxxxxx>
> >> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Andrew Cooper 
> >> <andrew.cooper3@xxxxxxxxxx>
> >> Subject: Re: [PATCH v3] IOMMU: make DMA containment of quarantined devices 
> >> optional
> >>
> >> On 10.03.2020 04:43, Tian, Kevin wrote:
> >>>> From: Jan Beulich <jbeulich@xxxxxxxx>
> >>>> Sent: Monday, March 9, 2020 7:09 PM
> >>>>
> >>>> I'm happy to take better suggestions to replace the "full" command line
> >>>> option and Kconfig prompt tokens. I don't think though that "fault" and
> >>>> "write-fault" are really suitable there.
> >>>
> >>> I think we may just allow both r/w access to scratch page for such bogus
> >>> device, which may make 'full' more reasonable since we now fully
> >>> contain in-fly DMAs. I'm not sure about the value of keeping write-fault
> >>> alone for such devices (just because one observed his specific device only
> >>> has problem with read-fault).
> >>
> >> Well, a fundamental problem I have here is that I still don't know
> >> the _exact_ conditions for the observed hangs. I consider it unlikely
> >> for IOMMU read faults to cause hangs, but for write faults to be
> >> "fine".
> >
> > AFAIK it's because the writes are posted and so any faults are just 
> > ignored, whereas a read fault
> being synchronous causes the device's state machine to lock up. It really is 
> observed behaviour.
> >
> >> It would seem more likely to me that e.g. a non-present
> >> context entry might cause issues. If that was the case, we wouldn't
> >> need to handle reads and writes differently; we could instead install
> >> an all zero top level page table. And we'd still get all faults that
> >> are supposed to surface. But perhaps Paul did try this back then, and
> >> it turned out to not be an option.
> >>
> >
> > The only info I had was that faults on DMA reads had to avoided
> > completely. I did not have access to the h/w in question at the
> > time. I may be able to get it now.
> 
> I see. The implication then is, as Kevin said, that we mustn't run
> guests with _any_ IOMMU PTEs having their "read" bits clear.
> Anything that's "not present" now would need directing to a scratch
> page. I then further wonder what effect reads to addresses beyond
> AGAW would have. It may be impossible to arrange for sufficiently
> secure pass-through with such a device, at which point - again as
> said by Kevin - there may be little point in the scratch page
> based quarantining.
> 

Well, I can't say there's little point in it as it does fix a host lock-up.

You say "as Kevin said, that we mustn't run guests with _any_ IOMMU PTEs having 
their "read" bits clear"... I can't find that. I did find where he said "In 
concept any IOMMU page table (dom0, dom_io or domU) for such bogus device 
should not include invalid entry", but that's a different thing. However, is a 
really saying that things will break if any of the PTEs has their present bit 
clear?

  Paul



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.