
Re: NetBSD dom0 PVH: hardware interrupts stalls



On 27.11.20 14:40, Jan Beulich wrote:
On 27.11.2020 14:31, Manuel Bouyer wrote:
On Fri, Nov 27, 2020 at 02:18:54PM +0100, Jan Beulich wrote:
On 27.11.2020 14:13, Manuel Bouyer wrote:
On Fri, Nov 27, 2020 at 12:29:35PM +0100, Jan Beulich wrote:
On 27.11.2020 11:59, Roger Pau Monné wrote:
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -187,6 +187,10 @@ void hvm_gsi_assert(struct domain *d, unsigned int gsi)
       * to know if the GSI is pending or not.
       */
      spin_lock(&d->arch.hvm.irq_lock);
+    if ( gsi == TRACK_IRQ )
+        debugtrace_printk("hvm_gsi_assert irq %u trig %u assert count %u\n",
+                          gsi, trig, hvm_irq->gsi_assert_count[gsi]);

This produces

81961 hvm_gsi_assert irq 34 trig 1 assert count 1

Since the logging occurs ahead of the call to assert_gsi(), it
means we don't signal anything to Dom0, because according to our
records there's still an IRQ in flight. Unfortunately we only
see the tail of the trace, so it's not possible to tell how / when
we got into this state.

Manuel - is this the only patch you have in place? Or did you keep
any prior ones? Iirc there once was one where Roger also suppressed
some de-assert call.

Yes, I have some of the previous patches (otherwise Xen panics).
Attached are the diffs I currently have.

I think you want to delete the hunk dropping the call to
hvm_gsi_deassert() from pt_irq_time_out(). Iirc it was that
addition which changed the behavior to just a single IRQ ever
making it into Dom0. And it ought to be only the change to
msix_write() which is needed to avoid the panic.

Yes, I did keep the hvm_gsi_deassert() patch because I expected it
to make things easier, as it allows interacting with Xen without changing
interrupt states.

Right, but then we'd need to see the beginning of the trace,
rather than it starting at (in this case) about 95,000. Yet ...

I removed it, here's a new trace

http://www-soc.lip6.fr/~bouyer/xen-log12.txt

... hmm, odd - no change at all:

95572 hvm_gsi_assert irq 34 trig 1 assert count 1

I was sort of expecting that this might be where we fail to
set the assert count back to zero. Will need further
thinking, if nothing else than how to turn down the verbosity
without hiding crucial information. Or maybe Roger has got
some idea ...

Set debugtrace buffer size to something huge?

Panic when the buffer is full?

It should be noted that the debugtrace buffer is printed in case of a
panic.


Juergen


