[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 2/2 V2] iommu/amd: Workaround for erratum 787
>>> On 10.06.13 at 17:11, Suravee Suthikulanit <suravee.suthikulpanit@xxxxxxx> wrote: > On 6/10/2013 8:59 AM, Jan Beulich wrote: >>>>> On 10.06.13 at 15:53, Suravee Suthikulanit <suravee.suthikulpanit@xxxxxxx> >> wrote: >>> On 6/10/2013 5:53 AM, Jan Beulich wrote: >>>>>>> On 10.06.13 at 12:40, Tim Deegan <tim@xxxxxxx> wrote: >>>>> At 10:47 +0100 on 10 Jun (1370861263), Jan Beulich wrote: >>>>>>>>> On 10.06.13 at 11:35, Tim Deegan <tim@xxxxxxx> wrote: >>>>>>> At 00:05 -0500 on 10 Jun (1370822751), suravee.suthikulpanit@xxxxxxx >>>>>>> wrote: >>>>>>>> From: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx> >>>>>>>> >>>>>>>> The IOMMU interrupt handling in bottom half must clear the PPR log >>>>> interrupt >>>>>>>> and event log interrupt bits to re-enable the interrupt. This is done >>>>>>>> by >>>>>>>> writing 1 to the memory mapped register to clear the bit. Due to >>>>>>>> hardware >>>>>>> bug, >>>>>>>> if the driver tries to clear this bit while the IOMMU hardware also >>>>>>>> setting >>>>>>>> this bit, the conflict will result with the bit being set. If the >>>>>>>> interrupt >>>>>>>> handling code does not make sure to clear this bit, subsequent changes >>>>>>>> in >>>>>>> the >>>>>>>> event/PPR logs will no longer generating interrupts, and would result >>>>>>>> if >>>>>>>> buffer overflow. After clearing the bits, the driver must read back >>>>>>>> the register to verify. >>>>>>> Is there a risk of livelock here? That is, if some device is causing a >>>>>>> lot of IOMMU faults, a CPU could get stuck in this loop re-enabling >>>>>>> interrupts as fast as they are raised. >>>>>>> >>>>>>> The solution suggested in the erratum seems better: if the bit is set >>>>>>> after clearing, process the interrupts again (i.e. run/schedule the >>>>>>> top-half handler). That way the bottom-half handler will definitely >>>>>>> terminate and the system can make some progress. >>>>>> That's what's being done really: The actual interrupt handler disables >>>>>> the interrupt sources, and the tasklet re-enables them (or at least is >>>>>> supposed to do so - patch 1 isn't really correct in the respect). >>>>> Oh I see, the idea is that we use the control register to mask >>>>> interrupts instead of relying on the status register? That seems >>>>> better. But doesn't this IOMMU already mask further interrupts when the >>>>> pending bit in the status register is set? I can't see any wording >>>>> about that in the IOMMU doc but the erratum implies it. Suravee, do you >>>>> know if this is the case? >>>> Actually, the documentation has a subtle but perhaps important >>>> difference int the wording here: For EventLogInt and ComWaitInt >>>> is read "An interrupt is generated when <status bit> = 1b and MMIO >>>> Offset 0018h[<control bit>] = 1b", where as for PPRInt and GAInt >>>> it says "An interrupt is generated when <status bit> changes from 0b >>>> to 1b and MMIO Offset 0018h[<control bit>] = 1b". >>>> >>>> So I'd like to be one the safe side and assume further interrupts can >>>> be generated in all cases - see also the emulation behavior in >>>> iommu_guest.c which - afaict - always raises an interrupt, not just on >>>> a 0 -> 1 transition. >>> The behavior should be that the interrupt will be generated if the "xxxInt" >>> bit is 0. Once generated, it will be set to "1" by the hardware. If this >>> bit is 1, events will be added to the log but interrupt will not be >>> generated. >> "Should be" isn't enough here, even more so given the mentioned >> wording differences in your documentation. We need to know how >> actual (past, current, and future) hardware behaves. >> > I am sure this is what the behavior of the hardware. Besides, only the > hardware can set this bit. > I have also tested by not clearing the bit, and basically I did not see > additional interrupts. Which would allow us to simplify patch 1 quite a bit - there's then no need to clear the two interrupt-enable bits in the interrupt handler, and iommu_check_*_log() then also wouldn't need to set them again. We would just need to find the right point in time when to clear the corresponding status flag. And - as written in a response to Tim already - probably some parts of what we discussed for patch 2 really would need to move to patch 1. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |