[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2 V2] iommu/amd: Workaround for erratum 787

>>> On 10.06.13 at 15:53, Suravee Suthikulanit <suravee.suthikulpanit@xxxxxxx>
> On 6/10/2013 5:53 AM, Jan Beulich wrote:
>>>>> On 10.06.13 at 12:40, Tim Deegan <tim@xxxxxxx> wrote:
>>> At 10:47 +0100 on 10 Jun (1370861263), Jan Beulich wrote:
>>>>>>> On 10.06.13 at 11:35, Tim Deegan <tim@xxxxxxx> wrote:
>>>>> At 00:05 -0500 on 10 Jun (1370822751), suravee.suthikulpanit@xxxxxxx 
>>>>> wrote:
>>>>>> From: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
>>>>>> The IOMMU interrupt handling in bottom half must clear the PPR log
>>> interrupt
>>>>>> and event log interrupt bits to re-enable the interrupt. This is done by
>>>>>> writing 1 to the memory mapped register to clear the bit. Due to hardware
>>>>> bug,
>>>>>> if the driver tries to clear this bit while the IOMMU hardware also 
>>>>>> setting
>>>>>> this bit, the conflict will result with the bit being set. If the 
>>>>>> interrupt
>>>>>> handling code does not make sure to clear this bit, subsequent changes in
>>>>> the
>>>>>> event/PPR logs will no longer generating interrupts, and would result if
>>>>>> buffer overflow. After clearing the bits, the driver must read back
>>>>>> the register to verify.
>>>>> Is there a risk of livelock here?  That is, if some device is causing a
>>>>> lot of IOMMU faults, a CPU could get stuck in this loop re-enabling
>>>>> interrupts as fast as they are raised.
>>>>> The solution suggested in the erratum seems better: if the bit is set
>>>>> after clearing, process the interrupts again (i.e. run/schedule the
>>>>> top-half handler).  That way the bottom-half handler will definitely
>>>>> terminate and the system can make some progress.
>>>> That's what's being done really: The actual interrupt handler disables
>>>> the interrupt sources, and the tasklet re-enables them (or at least is
>>>> supposed to do so - patch 1 isn't really correct in the respect).
>>> Oh I see, the idea is that we use the control register to mask
>>> interrupts instead of relying on the status register?  That seems
>>> better.  But doesn't this IOMMU already mask further interrupts when the
>>> pending bit in the status register is set?  I can't see any wording
>>> about that in the IOMMU doc but the erratum implies it.  Suravee, do you
>>> know if this is the case?
>> Actually, the documentation has a subtle but perhaps important
>> difference int the wording here: For EventLogInt and ComWaitInt
>> is read "An interrupt is generated when <status bit> = 1b and MMIO
>> Offset 0018h[<control bit>] = 1b", where as for PPRInt and GAInt
>> it says "An interrupt is generated when <status bit> changes from 0b
>> to 1b and MMIO Offset 0018h[<control bit>] = 1b".
>> So I'd like to be one the safe side and assume further interrupts can
>> be generated in all cases - see also the emulation behavior in
>> iommu_guest.c which - afaict - always raises an interrupt, not just on
>> a 0 -> 1 transition.
> The behavior should be that the interrupt will be generated if the "xxxInt"
> bit is 0.  Once generated, it will be set to "1" by the hardware.  If this
> bit is 1, events will be added to the log but interrupt will not be 
> generated.

"Should be" isn't enough here, even more so given the mentioned
wording differences in your documentation. We need to know how
actual (past, current, and future) hardware behaves.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.