
Re: Serious AMD-Vi(?) issue



On Fri, May 10, 2024 at 09:09:54PM -0700, Elliott Mitchell wrote:
> On Thu, Apr 18, 2024 at 09:33:31PM -0700, Elliott Mitchell wrote:
> > 
> > I suspect this is a case of there is some step which is missing from
> > Xen's IOMMU handling.  Perhaps something which Linux does during an early
> > DMA setup stage, but the current Xen implementation does lazily?
> > Alternatively some flag setting or missing step?
> > 
> > I should be able to do another test approach in a few weeks, but I would
> > love if something could be found sooner.
> 
> Turned out to be disturbingly easy to get the first entry when it
> happened.  Didn't even need `dbench`, it simply showed once the OS was
> fully loaded.  I did get some additional data points.
> 
> Appears this requires an AMD IOMMUv2.  A test system with known
> functioning AMD IOMMUv1 didn't display the issue at all.
> 
> (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr fffffffdf8000000 flags 0x8 I

I would expect the address field to contain more information about the
fault, but I can't find anything in the AMD-Vi specification beyond the
statement that it contains the DVA, which makes no sense when the
fault is caused by an interrupt.

> (XEN) DDDD:bb:dd.f root @ 83b5f5 (3 levels) dfn=fffffffdf8000
> (XEN)   L3[1f7] = 0 np

Attempting to print the page table walk for an interrupt remapping
fault is useless; we should likely avoid that when the I flag is set.
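
For reference, the dfn and the L3 index in the quoted walk follow
directly from the reported address. A minimal sketch of that arithmetic,
assuming 4 KiB pages and 9 index bits per table level (the helper names
are made up for illustration and are not Xen functions):

```python
PAGE_SHIFT = 12       # assumption: 4 KiB pages
PTE_INDEX_BITS = 9    # assumption: 512 entries per page table level


def dfn_from_addr(addr: int) -> int:
    """Device frame number: the faulting address with the page offset dropped."""
    return addr >> PAGE_SHIFT


def level_index(dfn: int, level: int) -> int:
    """Index into the page table at the given level (1 = leaf level)."""
    shift = PTE_INDEX_BITS * (level - 1)
    return (dfn >> shift) & ((1 << PTE_INDEX_BITS) - 1)


addr = 0xfffffffdf8000000          # address from the IO_PAGE_FAULT line
dfn = dfn_from_addr(addr)
print(f"dfn={dfn:x}")              # matches dfn=fffffffdf8000 in the log
print(f"L3[{level_index(dfn, 3):x}]")  # matches L3[1f7] in the log
```

So the walk is simply looking up the reported address as if it were a
DMA address, which is why it tells us nothing for an interrupt fault.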

> 
> I find it surprising this required "iommu=debug" to get this level of
> detail.  This amount of output seems more appropriate for "verbose".

"verbose" should also print this information.

> 
> I strongly prefer to provide snippets.  There is a fair bit of output,
> I'm unsure which portion is most pertinent.

I've already voiced my concern that I think what you are doing is not
fair.  We are debugging this out of interest, and hence your refusing
to provide all the information just hampers our ability to debug, and
makes us spend more time than required just thinking about which
snippets we need to ask for.

I will ask again: what is there in the Xen or the Linux dmesgs that you
are so worried about leaking?  Please provide a specific example.

Why do you mask the device SBDF in the above snippet?  I would really
like to understand what's so privacy-relevant in a PCI SBDF number.

Does booting with `iommu=no-intremap` lead to any issues being
reported?

Regards, Roger.



 

