
Re: [Xen-devel] IO APIC interrupt stuck with irr=1 (was: Re: [Xen-users] xen hypervisor does not like my Dom0 LVM partition: I/O Errors)





On Thu, Dec 1, 2016 at 7:21 AM Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>> On 01.12.16 at 11:29, <roger.pau@xxxxxxxxxx> wrote:
>> (XEN) Enabling APIC mode:  Flat.  Using 2 I/O APICs
>> (XEN) [VT-D]  RMRR address range bf7da000..bf7d9fff not in reserved memory; need "iommu_inclusive_mapping=1"?
>> (XEN) [VT-D]  RMRR (bf7da000, bf7d9fff) is incorrect
>> (XEN) Failed to parse ACPI DMAR.  Disabling VT-d.

Do things work better with this worked around (as suggested by the
message)?

>> (XEN)     IRQ 20 Vec 41:
>> (XEN)       Apic 0x00, Pin 20: vec=29 delivery=LoPri dest=L status=1 polarity=1 irr=1 trig=L mask=0 dest_id:8
>
> So this IO APIC vector seems to be stuck with irr=1, I've assumed that Xen would
> ack the interrupt if a certain timeout has passed and the guest has not done it,
> but I could be mistaken.
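For reference, `irr=1` in the dump above is the Remote IRR bit of the pin's I/O APIC redirection table entry, not the LAPIC IRR discussed below. For a level-triggered pin it is set when a LAPIC accepts the interrupt and cleared when the matching EOI arrives, so a Remote IRR that never clears points at a missing EOI. The Python sketch below decodes a raw 64-bit entry using the field layout from the Intel 82093AA I/O APIC datasheet. It assumes Xen's `status=` field is the delivery-status bit; `decode_rte` is a hypothetical helper, and the raw value in the usage note was reconstructed by hand from the printed fields, not read from the affected machine. The header `Vec 41` (decimal) and `vec=29` (hex) appear to be the same vector, 0x29.

```python
# Field layout of a 64-bit I/O APIC redirection table entry, following the
# Intel 82093AA I/O APIC datasheet (0b011 and 0b110 are reserved modes).
DELIVERY_MODES = {0b000: "Fixed", 0b001: "LoPri", 0b010: "SMI",
                  0b100: "NMI", 0b101: "INIT", 0b111: "ExtINT"}


def decode_rte(rte: int) -> dict:
    """Split a raw redirection table entry into the fields Xen prints."""
    return {
        "vector": rte & 0xFF,                                           # bits 7:0
        "delivery": DELIVERY_MODES.get((rte >> 8) & 0x7, "reserved"),   # bits 10:8
        "dest_logical": bool((rte >> 11) & 1),   # bit 11: 1 = logical destination
        "status": (rte >> 12) & 1,               # bit 12: delivery status (send pending)
        "polarity": (rte >> 13) & 1,             # bit 13: 1 = active low
        "remote_irr": (rte >> 14) & 1,           # bit 14: level IRQ accepted, awaiting EOI
        "level": bool((rte >> 15) & 1),          # bit 15: 1 = level triggered
        "mask": (rte >> 16) & 1,                 # bit 16: 1 = masked
        "dest_id": (rte >> 56) & 0xFF,           # bits 63:56: destination field
    }
```

`decode_rte(0x080000000000F929)` yields vector 0x29, `LoPri` delivery, logical destination, active low, level triggered, unmasked, Remote IRR set and destination 8, matching the quoted line.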

Interrupts in IRR can't be acked; they first need to propagate to
ISR (in LAPIC terms). Hence we'd need to know the state of the
corresponding ISR bit (and for completeness also the IRR one) in
the LAPIC. I would suspect that the ISR bit is also set, and _that_
would then indicate we may have missed issuing an EOI. But it
might also be that some interrupt at a higher priority (larger
vector number) is not disappearing from ISR, effectively masking
the relatively low-numbered vector here.
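The masking effect described here can be made concrete. Below is a minimal sketch, assuming the simplified rule from the Intel SDM's local APIC chapter. A vector's priority class is bits 7:4. The processor priority class is the larger of the TPR class and the class of the highest in-service vector. The highest pending vector is dispatched only if its class is strictly greater. IRR and ISR are modeled as Python sets of vector numbers, and `ppr` and `deliverable` are hypothetical helpers, not Xen code.

```python
from __future__ import annotations


def priority_class(vector: int) -> int:
    """LAPIC priority class: bits 7:4 of a vector (or of TPR/PPR)."""
    return (vector >> 4) & 0xF


def ppr(tpr: int, isr: set[int]) -> int:
    """Processor priority: TPR, raised to the class of the highest in-service vector."""
    isrv = max(isr, default=0)
    if priority_class(tpr) >= priority_class(isrv):
        return tpr & 0xFF
    return isrv & 0xF0


def deliverable(irr: set[int], isr: set[int], tpr: int = 0) -> int | None:
    """Vector the LAPIC would dispatch next, or None if all pending ones are blocked."""
    if not irr:
        return None
    irrv = max(irr)  # highest-priority pending vector; lower ones wait behind it
    if priority_class(irrv) > priority_class(ppr(tpr, isr)):
        return irrv
    return None
```

With nothing in service, `deliverable({0x29}, set())` returns 0x29. If 0x29 itself is still in ISR (a missed EOI), or any vector of class 2 or higher such as 0xD1 is stuck there, it returns `None`, and the pending 0x29 stays in IRR for as long as that ISR bit remains set.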

Also, are you positive about the IRR bit here being _permanently_
set, rather than just at the point this one sample was taken?
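One way to answer this is to compare several dumps taken some time apart. The sketch below assumes each capture is plain text (for example saved from `xl dmesg`) with every pin on a single, unwrapped line in the format quoted above; `remote_irr_by_pin` and `stuck_pins` are hypothetical helpers. A pin is reported only if it shows `irr=1` in every sample, since a single sample cannot distinguish a stuck interrupt from one that is merely in flight.

```python
from __future__ import annotations

import re

# One pin line of the I/O APIC dump, e.g.
# "Apic 0x00, Pin 20: vec=29 delivery=LoPri ... irr=1 trig=L ..."
# ".*?" cannot cross a newline, so each match stays within one pin line.
PIN_RE = re.compile(r"Apic 0x([0-9a-fA-F]+), Pin (\d+):.*?\birr=(\d)")


def remote_irr_by_pin(dump: str) -> dict[tuple[int, int], int]:
    """Map (apic, pin) to its irr= value for one captured dump."""
    result: dict[tuple[int, int], int] = {}
    for m in PIN_RE.finditer(dump):
        result[(int(m.group(1), 16), int(m.group(2)))] = int(m.group(3))
    return result


def stuck_pins(dumps: list[str]) -> set[tuple[int, int]]:
    """Pins showing irr=1 in every sample (empty set if there are no samples)."""
    stuck: set[tuple[int, int]] | None = None
    for dump in dumps:
        high = {key for key, irr in remote_irr_by_pin(dump).items() if irr == 1}
        stuck = high if stuck is None else stuck & high
    return stuck or set()
```

Intersecting the per-sample sets keeps the check conservative: a pin whose Remote IRR clears even once between captures is not reported.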

> I've also seen similar issues on some boxes; this seems
> to always happen on boxes with more than one IO APIC. In the past I've solved it
> by setting ioapic_ack=old, but that doesn't seem to work in his case.

Not so here (not for many years, at least), so I wonder whether it
matters which Dom0 kernel is in use.

>> And here are the messages to prove there was a lost interrupt:
>>
>> 11/30/16 5:09 PM    jaga-Desktop    kernel    [10056.569371] ata2: lost interrupt (Status 0x58)
>> 11/30/16 5:09 PM    jaga-Desktop    kernel    [10056.569402] ata3: lost interrupt (Status 0x58)
>> 11/30/16 6:00 PM    jaga-Desktop    kernel    [    0.187813] DMAR-IR: This system BIOS has enabled interrupt remapping
>> 11/30/16 6:00 PM    jaga-Desktop    kernel    [    0.187813] interrupt remapping is being disabled.  Please

These two messages are suspicious: The kernel should keep its hands
off any IOMMU things when running under Xen.

Jan

iommu_inclusive_mapping=1 does not help. Booting to a different kernel makes no difference. 

Jeff   
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
