
Re: NetBSD dom0 PVH: hardware interrupts stalls


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 19 Nov 2020 15:19:15 +0100
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 19 Nov 2020 14:20:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Nov 18, 2020 at 03:59:44PM +0100, Jan Beulich wrote:
> On 18.11.2020 15:39, Roger Pau Monné wrote:
> > On Wed, Nov 18, 2020 at 01:14:03PM +0100, Manuel Bouyer wrote:
> >> I did some more instrumentation from the NetBSD kernel, including dumping
> >> the ioapic2 pin 2 register.
> >>
> >> At the time of the command timeout, the register value is 0x0000a067,
> >> which, if I understand it properly, means that there's no interrupt
> >> pending (bit IOAPIC_REDLO_RIRR, 0x00004000, is not set).
> >> From the NetBSD ddb, I can dump this register multiple times, waiting
> >> several seconds between dumps, etc., and it doesn't change.
> >> Now if I call ioapic_dump_raw() from the debugger, which triggers some
> >> XEN printfs:
> >> db{0}> call ioapic_dump_raw
> >> Register dump of ioapic0
> >> [ 203.5489060] 00 08000000 00170011 08000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 3
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 4
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 5
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 6
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 7
> >>  00000000
> >> [ 203.5489060] 08(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 8
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 9
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel a
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel b
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel c
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel d
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel e
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel f
> >>  00000000
> >> [ 203.5489060] 10 00010000 00000000 00010000 00000000 00010000 00000000 00010000 00000000
> >> [...]
> >> [ 203.5489060] Register dump of ioapic2
> >> [ 203.5489060] 00 0a000000 00070011 0a000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 3
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 4
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 5
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 6
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 7
> >>  00000000
> >> [ 203.5489060] 08(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 8
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 9
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel a
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel b
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel c
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel d
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel e
> >>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel f
> >>  00000000
> >> [ 203.5489060] 10 00010000 00000000 00010000 00000000 0000e067 00000000 00010000 00000000
> >>
> >> Then the register switches to 0000e067, with the IOAPIC_REDLO_RIRR bit set.
> >> From here, if I continue from ddb, the dom0 boots.
> >>
> >> I can get the same effect by just typing ^A^A^A, so my guess is that it's
> >> not accessing the ioapic's registers that changes the IOAPIC_REDLO_RIRR bit,
> >> but the XEN printf. Also, from NetBSD, using a dump function which
> >> doesn't access undefined registers (and so doesn't trigger XEN printfs)
> >> doesn't change the IOAPIC_REDLO_RIRR bit either.
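The following standalone C sketch decodes the values in the dump above. It uses the redirection table entry (RTE) bit layout from the Intel 82093AA IO-APIC datasheet. The helper names (ioapic_nr_pins, rte_lo_index, only_remote_irr_changed, rte_print) are made up for illustration. They are not NetBSD or Xen functions.

The sketch shows three things:

- 0x0000a067 and 0x0000e067 both describe an unmasked, level-triggered, active-low, fixed-delivery vector 0x67. They differ only in the remote IRR bit (IOAPIC_REDLO_RIRR in the quoted text).
- ioapic2's version register (0x00070011) reports 8 pins.
- Pin 2's low word sits at IOREGSEL index 0x14, which is the fifth word of the "10" row.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Low word of an IO-APIC redirection table entry (RTE), following the
 * Intel 82093AA IO-APIC datasheet layout. */
#define RTE_VECTOR     0x000000ffu /* bits 0-7: vector */
#define RTE_DELMOD     0x00000700u /* bits 8-10: delivery mode (0 = fixed) */
#define RTE_DESTMOD    0x00000800u /* bit 11: 1 = logical destination */
#define RTE_DELIVS     0x00001000u /* bit 12: 1 = send pending */
#define RTE_POLARITY   0x00002000u /* bit 13: 1 = active low */
#define RTE_REMOTE_IRR 0x00004000u /* bit 14: remote IRR (level mode only) */
#define RTE_TRIGGER    0x00008000u /* bit 15: 1 = level triggered */
#define RTE_MASK       0x00010000u /* bit 16: 1 = masked */

/* Number of pins: bits 16-23 of the version register (index 0x01) hold
 * the index of the last redirection entry, so add one. */
static unsigned ioapic_nr_pins(uint32_t version_reg)
{
    return ((version_reg >> 16) & 0xffu) + 1;
}

/* IOREGSEL index of a pin's RTE low word: the table starts at 0x10 and
 * each entry occupies two 32-bit registers (low word, then high word). */
static unsigned rte_lo_index(unsigned pin, unsigned nr_pins)
{
    assert(pin < nr_pins);
    return 0x10u + 2u * pin;
}

/* True when two snapshots of the same RTE differ only in remote IRR. */
static bool only_remote_irr_changed(uint32_t before, uint32_t after)
{
    return (before ^ after) == RTE_REMOTE_IRR;
}

/* Print a human-readable decoding of an RTE low word. */
static void rte_print(uint32_t lo)
{
    printf("%08" PRIx32 ": vector %#x, delmode %u, %s, %s, %s, "
           "remote IRR %u, %s%s\n",
           lo, (unsigned)(lo & RTE_VECTOR),
           (unsigned)((lo & RTE_DELMOD) >> 8),
           (lo & RTE_DESTMOD) ? "logical" : "physical",
           (lo & RTE_POLARITY) ? "active low" : "active high",
           (lo & RTE_TRIGGER) ? "level" : "edge",
           (lo & RTE_REMOTE_IRR) ? 1u : 0u,
           (lo & RTE_MASK) ? "masked" : "unmasked",
           (lo & RTE_DELIVS) ? ", send pending" : "");
}
```

Calling rte_print() on 0x0000a067 and on 0x0000e067 produces two lines that differ only in the "remote IRR" field. The other pins in the row (0x00010000) decode as masked.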
> > 
> > I'm thinking about further ways to debug this. I see that all active
> > IO-APIC pins are routed to vCPU0, but does it make a difference if you
> > boot with dom0_max_vcpus=1 on the Xen command line? (This would limit
> > the NetBSD dom0 to a single vCPU.)
> 
> I too have been pondering possible approaches. One thing I thought might
> help is to accompany all places setting remote_irr (and calling
> vioapic_deliver()) with a conditional log message, turning on the
> condition immediately before the first "undefined ioregsel" gets logged.
> (And turn it off again once the last RTE was read in sequence, just to
> avoid spamming the console.) From Manuel's description above, there has
> to be something that sets the bit and causes the delivery _without_ any
> active action by the guest (i.e. neither EOI nor RTE write) and
> _without_ any new instance of the IRQ appearing. I have some vague hope
> that knowing how the system ends up making progress again may
> also help us understand how it got stuck.
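The conditional logging Jan describes can be sketched as a small "trace window". Logging is switched on when register 0x03 is read, which is where the first "undefined ioregsel" message appears. It is switched off after the high word of the last RTE, at IOREGSEL index 0x10 + 2 * nr_pins - 1, has been read.

This is a minimal standalone C illustration under those assumptions. It is not the contents of conditional_print.patch or of Xen's vioapic.c. The names rirr_trace, rirr_logged, trace_window_update and RIRR_TRACE are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical trace window; not taken from any Xen patch. */
static bool rirr_trace;      /* true while the guest is dumping the RTEs */
static unsigned rirr_logged; /* messages emitted while the window was open */

/*
 * Hypothetically called on every emulated register read, before any
 * "undefined ioregsel" message is printed. Opens the window on index
 * 0x03 and closes it once the high word of the last RTE has been read.
 */
static void trace_window_update(unsigned ioregsel, unsigned nr_pins)
{
    /* IOREGSEL is 8 bits wide, so at most 120 RTEs fit after 0x10. */
    assert(nr_pins >= 1 && nr_pins <= 120);

    if ( ioregsel == 0x03 )
        rirr_trace = true;
    else if ( ioregsel == 0x10 + 2 * nr_pins - 1 )
        rirr_trace = false;
}

/*
 * Meant to sit at every site that sets remote IRR or calls the delivery
 * path; it prints only while the window is open. Needs >= 1 argument.
 */
#define RIRR_TRACE(fmt, ...)                            \
    do {                                                \
        if ( rirr_trace )                               \
        {                                               \
            printf("RIRR: " fmt "\n", __VA_ARGS__);     \
            rirr_logged++;                              \
        }                                               \
    } while ( 0 )
```

In an actual hypervisor change, the flag would more likely be per emulated IO-APIC than a single global, and the message would go through printk() rather than printf().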

I've got two different debug patches for you to try. I'm attaching both
to this email, but I think we should start with Jan's suggestion
(conditional_print.patch). That patch only prints extra messages
between the existing "ioregsel 3" ... "ioregsel f" debug messages, so
AFAICT you will have to trigger it from NetBSD by using ioapic_dump_raw.

The other patch (verbose_intr.patch) adds some messages related to
interrupts, but I'm afraid it's likely to interfere with the issue we
are trying to debug, since it adds a lot of extra printks (and will
likely flood your console).

Thanks, Roger.

Attachment: conditional_print.patch
Description: Text document

Attachment: verbose_intr.patch
Description: Text document


 

