Re: [Xen-devel] [xen-unstable] Commit 2ca9fbd739b8a72b16dd790d0fff7b75f5488fb8 AMD IOMMU: allocate IRTE entries instead of using a static mapping, makes dom0 boot process stall several times.



Friday, August 16, 2013, 3:15:50 PM, you wrote:

>>>> On 16.08.13 at 12:44, Sander Eikelenboom <linux@xxxxxxxxxxxxxx> wrote:
>> Friday, August 16, 2013, 11:18:56 AM, you wrote:
>>>>>> On 16.08.13 at 10:40, Sander Eikelenboom <linux@xxxxxxxxxxxxxx> wrote:
>>>> Hmm only the "no-cpuidle" is needed (cpufreq=xen can stay) to make the
>>>> stalls disappear,
>>>> but makes me wonder how that is related to the commit the bisection found ..
>>>> machine has been running with cpuidle enabled for ages ..
>> 
>>> That's odd indeed. If you're up to do a little bit of debugging here,
>>> why don't you log the sequence of interrupts arriving both with
>>> and without said commit. This might end up being a lot of data, so
>>> you may want to filter out uninteresting stuff and/or log only to
>>> a memory buffer which then gets dumped upon some debug key
>>> press.
>> 
>> Hmm making said debug patch is getting probably a bit out of my league ..
>> since the generated interrupts will probably outpace flushing to the console.
>> 
>> And i'm not sure in what things you are actually interested around the irq
>> flow (probably the hpet msi ones ?).

> No, much more the ones from the devices that you say the drivers
> of which cause the stalls while initializing. The question mainly is
> whether the distribution of interrupts between CPUs changed in a
> way that made the system more susceptible to missing wakeups
> via HPET MSIs.

> Along that lines was also the question regarding interrupt counts
> for the devices in question, which if I'm not mistaken you didn't
> answer yet.

Hi Jan,

Got things running again, first on bare-metal Linux.
It seems I was having 2 separate problems:

1) The southbridge ioapic(6) isn't in the IVRS tables. The Linux kernel has had
   a patch "iommu/amd: Add ioapic and hpet ivrs override"
   (https://lists.linux-foundation.org/pipermail/iommu/2013-April/005506.html)

   That patch makes it possible to override the incorrect IVRS tables on the
   command line.
   Using ivrs_ioapic[6]=00:14.0 ivrs_ioapic[7]=00:00.1 ivrs_hpet[0]=00:14.0
   made it boot correctly and enable the iommu and interrupt remapping.

   That patch would probably be a good candidate to port to Xen too,
   considering the iommu massacre on at least boards with a 890fx chipset.
   (Or make it a quirk for these earlier chipsets, since the BDFs for the
   northbridge and southbridge ioapic and hpet seem to be known fixed values,
   from what i read in earlier mailing list posts and code comments. Perhaps
   Suravee could comment on that, he seems to have tested the patch for Linux
   as well:
   https://lists.linux-foundation.org/pipermail/iommu/2013-April/005528.html)

2) After that my sata controller gave read errors, but i found it was still in
   "ide" mode instead of "ahci" in the bios. A separate but perhaps related
   issue (probably something with the enabling of multiple msi's, which the
   driver cannot handle in "ide" mode; will sort that out later).


After i got things working on bare-metal Linux, i adjusted Xen and hardcoded it
to add a mapping for ioapic[6]=00:14.0.
(The entries for ivrs_ioapic[7] and hpet[0] are actually correct in the bios
tables, so they don't need correction for me at the moment.)
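
For illustration, a minimal sketch of how override tokens like the ones above
could be decoded into a device id. This is not the Linux or Xen code:
parse_ivrs_override is a hypothetical helper, and it assumes the bracketed
index is decimal, that bus/device/function are hex, and that the device id
uses the usual PCI packing (bus << 8 | device << 3 | function).

```python
import re

# Hypothetical helper for illustration only; not taken from Linux or Xen.
# Assumed token shape: ivrs_ioapic[<id>]=<bus>:<dev>.<fn>  (or ivrs_hpet[...])
_OVERRIDE_RE = re.compile(
    r"^ivrs_(ioapic|hpet)\[(\d+)\]="
    r"([0-9a-fA-F]{1,2}):([0-9a-fA-F]{1,2})\.([0-7])$"
)


def parse_ivrs_override(token):
    """Return (kind, unit_id, bdf) for one override token.

    kind    -- "ioapic" or "hpet"
    unit_id -- the index given in brackets (assumed decimal)
    bdf     -- 16-bit device id: bus << 8 | device << 3 | function
    """
    m = _OVERRIDE_RE.match(token)
    if m is None:
        raise ValueError("malformed override: %r" % token)

    kind = m.group(1)
    unit_id = int(m.group(2))
    bus = int(m.group(3), 16)
    dev = int(m.group(4), 16)
    fn = int(m.group(5), 16)

    # A PCI device number is 5 bits wide, so anything above 0x1f is invalid.
    if dev > 0x1F:
        raise ValueError("device number out of range: %r" % token)

    return kind, unit_id, (bus << 8) | (dev << 3) | fn


if __name__ == "__main__":
    for tok in ("ivrs_ioapic[6]=00:14.0",
                "ivrs_ioapic[7]=00:00.1",
                "ivrs_hpet[0]=00:14.0"):
        kind, unit_id, bdf = parse_ivrs_override(tok)
        print("%s id %d -> bdf 0x%04x" % (kind, unit_id, bdf))
```

With that packing, 00:14.0 comes out as 0x00a0 and 00:00.1 as 0x0001.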

And hey presto .. no stalls ..

So i think porting the override patch to Xen (or making it a quirk that ignores
the IVRS table for special devices on certain chipsets) could solve a lot of
the reported iommu problems for AMD systems.

--
Sander

> Jan



