Re: [Xen-devel] [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present

On 05/06/13 10:54, Jan Beulich wrote:
>>>> On 05.06.13 at 11:43, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 05/06/13 09:20, Jan Beulich wrote:
>>> But anyway - I continue to be unconvinced that this case can't be
>>> easily enough dealt with by the admin adding "ioapic_ack=old" to the
>>> command line.
>> That describes half the errata workarounds we do.  Why does this deserve
>> different treatment?
> Because (and I don't think you said anything to the contrary so far)
> unlike those other workarounds, this one only addresses a
> portion of the effects of said erratum.

I am working on the customer's report that using ioapic_ack=old fixes
all of their unstable behaviour.  I accept that this does not guarantee
the fix is complete, but it is certainly a good indication.

From the errata documentation:

The following interrupt and virtual wire message register settings
should not be changed in such a manner that would cause an active
interrupt or virtual wire message source to go from unmasked to masked
without first quiescing the interrupts and/or virtual wire messages they
affect in the

Dev[B,A]:0x48, bit 14 [INTx_PACKET_EN]
Dev[B,A]:1x04, bit 2 [MASEN]
Dev[B,A]:1x44, bit 1 [IOAEN]
Dev[B,A]:1x44, bit 0 [OSVISBAR]
Any IOAPIC Redirection Register, bit 16, Interrupt Mask [IM]
Any IOAPIC Redirection Register, bit 15, Trigger Mode [TM]
Any IOAPIC Redirection Register, bit 13, Polarity [POL]
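To make the RTE half of the erratum concrete, here is a minimal C sketch that checks whether a write to the low 32 bits of an RTE would touch any of the three bits listed above (IM, TM, POL). The macro and function names are hypothetical and are not Xen's; the check is deliberately conservative and flags any change to those bits, with a separate predicate for the unmasked-to-masked transition the erratum calls out.

```c
#include <stdbool.h>
#include <stdint.h>

/* RTE bit positions, as listed in the erratum text (low dword only). */
#define RTE_POLARITY_BIT  (1u << 13) /* POL: interrupt input pin polarity */
#define RTE_TRIGGER_BIT   (1u << 15) /* TM: 0 = edge, 1 = level */
#define RTE_MASK_BIT      (1u << 16) /* IM: 1 = masked */

/*
 * Hypothetical helper: true if replacing old_lo with new_lo changes any
 * of the bits the erratum says must not be altered while the source is
 * active.  Conservative: flags every change, not only masking ones.
 */
static bool rte_write_hits_erratum(uint32_t old_lo, uint32_t new_lo)
{
    const uint32_t sensitive =
        RTE_POLARITY_BIT | RTE_TRIGGER_BIT | RTE_MASK_BIT;

    return ((old_lo ^ new_lo) & sensitive) != 0;
}

/* Hypothetical helper: true only for the unmasked -> masked transition. */
static bool rte_write_masks(uint32_t old_lo, uint32_t new_lo)
{
    return !(old_lo & RTE_MASK_BIT) && (new_lo & RTE_MASK_BIT);
}
```

For example, masking an active, unmasked RTE with vector 0x30 (writing `0x30 | RTE_MASK_BIT` over `0x30`) trips both predicates, while changing only the vector field trips neither.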

Without a HyperTransport specification to hand, I can't really comment
on items 1, 3 and 4, other than to say that I expect Xen neither knows
nor cares about them.  Xen might possibly play with the bus-master bit
(item 2), but without an IOMMU on the affected system, I can't spot any
code paths which would.  It does occur to me, however, that these
devices are further candidates for "stuff not even dom0 should be
permitted to play with".

After boot, I am not aware of anything which would modify the polarity
bit of an RTE.  The trigger mode bit might be modified if a
level-triggered interrupt gets delivered as an edge interrupt.  As I
remember, this was a workaround for a bug in ancient IO-APICs anyway,
and would not normally be expected to happen.

The only bit which Xen regularly modifies is the mask bit, and only in
the default case of ioapic_ack=new.
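Following that reasoning, a policy along the lines of this patch could be sketched as below: an explicit `ioapic_ack=` on the command line always wins, and otherwise the old ack method is chosen if an affected tunnel is present, because only the new method toggles the RTE mask bit in normal operation. This is an illustrative sketch, not the patch itself. The struct, enum and function names are hypothetical, and the device IDs (0x7450 for the 8131 bridge, 0x7458 for the 8132 bridge) are assumptions taken from Linux's `pci_ids.h` that should be checked against the PCI ID database.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum ioapic_ack_mode { IOAPIC_ACK_NEW, IOAPIC_ACK_OLD };

/* Hypothetical record of one PCI function's vendor/device IDs. */
struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

#define PCI_VENDOR_AMD          0x1022u
#define AMD_8131_BRIDGE_DEVICE  0x7450u /* assumed ID, verify */
#define AMD_8132_BRIDGE_DEVICE  0x7458u /* assumed ID, verify */

static bool is_amd_813x_tunnel(const struct pci_id *id)
{
    return id->vendor == PCI_VENDOR_AMD &&
           (id->device == AMD_8131_BRIDGE_DEVICE ||
            id->device == AMD_8132_BRIDGE_DEVICE);
}

/*
 * forced/forced_mode model an explicit "ioapic_ack=" option.  Without
 * one, use the old method (which leaves the RTE mask bit alone) if any
 * affected tunnel is found, else keep the default new method.
 */
static enum ioapic_ack_mode choose_ack_mode(bool forced,
                                            enum ioapic_ack_mode forced_mode,
                                            const struct pci_id *devs,
                                            size_t ndevs)
{
    if (forced)
        return forced_mode;

    for (size_t i = 0; i < ndevs; i++)
        if (is_amd_813x_tunnel(&devs[i]))
            return IOAPIC_ACK_OLD;

    return IOAPIC_ACK_NEW;
}
```

Letting the command-line setting override the detection keeps the admin's existing workaround (and its opposite) available, which addresses the concern that the automatic switch only covers part of the erratum.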


>> It can certainly be worked around by using ioapic_ack=old on the
>> command line, and that is how we verified the 'fix'.  But in the
>> meantime it took a normally technically-savvy customer 2 months in
>> highest-level support (i.e. my colleagues and I) before we got to the
>> bottom of the issue.
> I appreciate you having found the root cause.
> Jan

