[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2 V2] iommu/amd: Workaround for erratum 787



>>> On 10.06.13 at 14:43, Tim Deegan <tim@xxxxxxx> wrote:
> At 11:53 +0100 on 10 Jun (1370865209), Jan Beulich wrote:
>> > Oh I see, the idea is that we use the control register to mask
>> > interrupts instead of relying on the status register?  That seems
>> > better.  But doesn't this IOMMU already mask further interrupts when the
>> > pending bit in the status register is set?  I can't see any wording
>> > about that in the IOMMU doc but the erratum implies it.  Suravee, do you
>> > know if this is the case?
>> 
>> Actually, the documentation has a subtle but perhaps important
>> difference int the wording here: For EventLogInt and ComWaitInt
>> is read "An interrupt is generated when <status bit> = 1b and MMIO
>> Offset 0018h[<control bit>] = 1b", where as for PPRInt and GAInt
>> it says "An interrupt is generated when <status bit> changes from 0b
>> to 1b and MMIO Offset 0018h[<control bit>] = 1b".
>> 
>> So I'd like to be one the safe side and assume further interrupts can
>> be generated in all cases.
> 
> Fair enough.
> 
>  - see also the emulation behavior in
>> iommu_guest.c which - afaict - always raises an interrupt, not just on
>> a 0 -> 1 transition.
> 
> Well, if it were a documented beahviour, we ought to change that. :)

I'll leave that to Suravee though, pending clarification on whether
the difference in wording is actually meaningful.

>> > In either case, the while () loop worries me; I think it would be better
>> > to schedule the tasklet again if we see the bit is set; a 'while
>> > (pending is set) { clear pending bit; }' loop might never exit the
>> > tasklet at all.
>> 
>> That could only be due to a hardware bug worse than the one we're
>> working around here, and I don't think is worth dealing with.
> 
> Well, there's runaway guest hardware.  If we reschedule the tasklet then
> pci_check_disable_device() will eventually catch and suppress the
> misbehaviour; if we spin here it won't get a chance.  I guess the
> argument is that it will eventually overflow the log buffer and stop
> setting the log-interrupt bit -- that needs at least a comment.
> 
> I was also a bit worried about the erratum-787 event getting delayed
> (since there's no interrupt to cause us to actually process it), but I
> just realised that's a more general problem: we ought to be resetting
> the 'pending' bits _before_ scanning the log, or any new entries that
> arrive between the log scan and the RW1C write won't be seen until the
> _next_ log entry causes an interrupt.
> 
> How about:
>  write-to-clear status.pending
>  process the log
>  if (status.pending)
>     reschedule the tasklet
>  else
>     unmask the interrupt.
> 
> Since we have to do the extra read anyway for erratum 787, we might as
> well save ourselves the bother of taking an interrupt in the other case.

Yes, that's a very reasonable approach. Will be a v4 then here soon
too, except that the locking there looks bogus too (and hence may
need fixing along the way)...

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.