
Re: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock

>>> On 16.08.11 at 17:06, Tim Deegan <tim@xxxxxxx> wrote:
> At 08:03 +0100 on 16 Aug (1313481813), Jan Beulich wrote:
>> >>> On 15.08.11 at 11:26, Tim Deegan <tim@xxxxxxx> wrote:
>> > At 15:48 +0100 on 12 Aug (1313164084), Jan Beulich wrote:
>> >> >>> On 12.08.11 at 16:09, Tim Deegan <tim@xxxxxxx> wrote:
>> >> > At 14:53 +0100 on 12 Aug (1313160824), Jan Beulich wrote:
>> >> >> > This issue is resolved in changeset 23762:537ed3b74b3f of
>> >> >> > xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.
>> >> >> 
>> >> >> Do you really think this helps much? Direct control of the device means
>> >> >> it could also (perhaps on a second vCPU) constantly re-enable the bus
>> >> >> mastering bit. 
>> >> > 
>> >> > That path goes through qemu/pciback, so at least lets Xen schedule the
>> >> > dom0 tools.
>> >> 
>> >> Are you sure? If (as said) the guest uses a second vCPU for doing the
>> >> config space accesses, I can't see how this would save the pCPU the
>> >> fault storm is occurring on.
>> > 
>> > Hmmm.  Yes, I see what you mean.
>> Actually, a second vCPU may not even be needed: Since the "fault"
>> really is an external interrupt, if that one gets handled on a pCPU other
>> than the one the guest's vCPU is running on, it could execute such a
>> loop even in that case.
>> As to yesterday's softirq-based handling thoughts - perhaps the clearing
>> of the bus master bit on the device should still be done in the actual IRQ
>> handler, while the processing of the fault records could be moved out to
>> a softirq.
> Hmmm.  I like the idea of using a softirq but in fact by the time we've
> figured out which BDF to silence we've pretty much done handling the
> fault.

Ugly, but yes, indeed.
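For the record, a minimal sketch of what the IRQ-time part could look like -
the accessor names and BDF plumbing below are illustrative stand-ins, not
Xen's actual interfaces; real code would go through the hypervisor's PCI
config accessors:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch only: quiet an offending device from the fault IRQ
 * handler by clearing its bus-master bit, so it can issue no further DMA
 * and the fault storm stops at the source. pci_conf_read16/write16 are
 * hypothetical stand-ins for the hypervisor's config-space accessors;
 * here they operate on a simulated command register. */

#define PCI_COMMAND        0x04    /* command register offset (PCI spec) */
#define PCI_COMMAND_MASTER 0x0004  /* bus-master enable bit */

/* Simulated config space for one device (real code talks to hardware). */
static uint16_t fake_command = 0x0007; /* IO | MEM | MASTER enabled */

static uint16_t pci_conf_read16(unsigned bus, unsigned dev, unsigned fn,
                                unsigned reg)
{
    (void)bus; (void)dev; (void)fn; (void)reg;
    return fake_command;
}

static void pci_conf_write16(unsigned bus, unsigned dev, unsigned fn,
                             unsigned reg, uint16_t val)
{
    (void)bus; (void)dev; (void)fn; (void)reg;
    fake_command = val;
}

/* Clear bus mastering if it is set; idempotent, so safe to call from the
 * IRQ handler even if the guest keeps re-enabling the bit. */
static void quiet_device(unsigned bus, unsigned dev, unsigned fn)
{
    uint16_t cmd = pci_conf_read16(bus, dev, fn, PCI_COMMAND);

    if (cmd & PCI_COMMAND_MASTER)
        pci_conf_write16(bus, dev, fn, PCI_COMMAND,
                         cmd & ~PCI_COMMAND_MASTER);
}
```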

> Reading the VTd docs it looks like we can just ack the IOMMU fault
> interrupt and it won't send any more until we clear the log, so we can
> leave the whole business to a softirq.  Delaying that might cause the
> log to overflow, but that's not necessarily the end of the world.
> Looks like we can do the same on AMD by disabling interrupt generation
> in the main handler and reenabling it in the softirq.
> Is there any situation where we really care terribly about the I/O fault
> logs overflowing?

As long as older entries don't get overwritten, I don't think that's
going to be problematic - all the more so given that we basically shut
off the offending device(s) anyway.
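To make the split concrete, here is a rough sketch of the scheme discussed
above - mask further fault-interrupt generation in the hard IRQ handler,
then drain the fault log and re-enable generation from a softirq. All names
are made up for illustration; the real VT-d/AMD register accesses and Xen's
softirq machinery are elided behind simple flags:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of deferring IOMMU fault-log processing to a softirq.
 * A real implementation would ack/mask the IOMMU's fault interrupt and
 * raise a Xen softirq; here, flags and a ring buffer stand in for both. */

#define LOG_SIZE 8
static unsigned fault_log[LOG_SIZE];       /* pending fault records */
static unsigned log_head, log_tail;        /* producer / consumer indices */
static bool irq_generation_enabled = true; /* IOMMU fault-IRQ enable bit */
static bool softirq_pending;
static unsigned faults_processed;

/* Hard IRQ handler: do the minimum. Mask further interrupt generation so
 * a fault storm cannot monopolise this pCPU, and defer the real work. */
static void iommu_fault_irq(void)
{
    irq_generation_enabled = false;
    softirq_pending = true;
}

/* Softirq: drain the fault log at schedulable priority, then re-enable
 * interrupt generation. If the log overflowed meanwhile, entries were
 * lost - tolerable, since the offending device gets shut off anyway. */
static void iommu_fault_softirq(void)
{
    while (log_tail != log_head) {
        unsigned rec = fault_log[log_tail % LOG_SIZE];

        (void)rec; /* real code: extract the BDF, quiet that device */
        log_tail++;
        faults_processed++;
    }
    softirq_pending = false;
    irq_generation_enabled = true;
}
```

The point of the design is that the expensive, unbounded work (walking the
log, identifying BDFs) happens where the scheduler can interleave other
work, while the IRQ handler itself stays O(1).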


Xen-devel mailing list