
Re: NetBSD dom0 PVH: hardware interrupts stalls


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 18 Nov 2020 15:59:44 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 18 Nov 2020 15:00:14 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 18.11.2020 15:39, Roger Pau Monné wrote:
> On Wed, Nov 18, 2020 at 01:14:03PM +0100, Manuel Bouyer wrote:
>> I did some more instrumentation from the NetBSD kernel, including dumping
>> the ioapic2 pin2 register.
>>
>> At the time of the command timeout, the register value is 0x0000a067,
>> which, if I understand it properly, means that there's no interrupt
>> pending (bit IOAPIC_REDLO_RIRR, 0x00004000, is not set).
>> From the NetBSD ddb, I can dump this register multiple times, waiting
>> several seconds, etc.; it doesn't change.
>> Now if I call ioapic_dump_raw() from the debugger, which triggers some
>> XEN printf:
>> db{0}> call ioapic_dump_raw^M
>> Register dump of ioapic0^M
>> [ 203.5489060] 00 08000000 00170011 08000000(XEN) vioapic.c:124:d0v0 
>> apic_mem_readl:undefined ioregsel 3
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 4
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 5
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 6
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 7
>>  00000000^M
>> [ 203.5489060] 08(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 8
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 9
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel a
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel b
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel c
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel d
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel e
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel f
>>  00000000^M
>> [ 203.5489060] 10 00010000 00000000 00010000 00000000 00010000 00000000 
>> 00010000 00000000^M
>> [...]
>> [ 203.5489060] Register dump of ioapic2^M
>> [ 203.5489060] 00 0a000000 00070011 0a000000(XEN) vioapic.c:124:d0v0 
>> apic_mem_readl:undefined ioregsel 3
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 4
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 5
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 6
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 7
>>  00000000^M
>> [ 203.5489060] 08(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 8
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel 9
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel a
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel b
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel c
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel d
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel e
>>  00000000(XEN) vioapic.c:124:d0v0 apic_mem_readl:undefined ioregsel f
>>  00000000^M
>> [ 203.5489060] 10 00010000 00000000 00010000 00000000 0000e067 00000000 
>> 00010000 00000000^M
>>
>> then the register switches to 0000e067, with the IOAPIC_REDLO_RIRR bit set.
>> From here, if I continue from ddb, the dom0 boots.
>>
>> I can get the same effect by just doing ^A^A^A, so my guess is that it's
>> not accessing the ioapic's registers which changes the IOAPIC_REDLO_RIRR
>> bit, but the XEN printf. Also, from NetBSD, using a dump function which
>> doesn't access undefined registers - and so doesn't trigger XEN printfs -
>> doesn't change the IOAPIC_REDLO_RIRR bit either.
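
For readers following the register values quoted above, here is a minimal
sketch of how the low 32 bits of a redirection table entry decode. It
assumes the standard IO-APIC RTE bit layout. The RTE_* macros and rte_*
helpers are illustrative names, not taken from NetBSD or Xen; only the
remote IRR mask (0x00004000, IOAPIC_REDLO_RIRR) is named in the report.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Low 32 bits of an IO-APIC redirection table entry (illustrative names). */
#define RTE_VECTOR_MASK  0x000000ffu
#define RTE_ACTIVE_LOW   0x00002000u /* bit 13: pin polarity */
#define RTE_REMOTE_IRR   0x00004000u /* bit 14: same mask as IOAPIC_REDLO_RIRR */
#define RTE_LEVEL        0x00008000u /* bit 15: trigger mode */
#define RTE_MASKED       0x00010000u /* bit 16: interrupt mask */

/* Compile-time checks against the two values seen in the dump. */
static_assert((0x0000a067u & RTE_REMOTE_IRR) == 0, "a067: remote IRR clear");
static_assert((0x0000e067u & RTE_REMOTE_IRR) != 0, "e067: remote IRR set");

static inline unsigned int rte_vector(uint32_t lo)
{
    return lo & RTE_VECTOR_MASK;
}

static inline int rte_remote_irr(uint32_t lo)
{
    return (lo & RTE_REMOTE_IRR) != 0;
}

static inline int rte_level_triggered(uint32_t lo)
{
    return (lo & RTE_LEVEL) != 0;
}

static inline int rte_active_low(uint32_t lo)
{
    return (lo & RTE_ACTIVE_LOW) != 0;
}

static inline int rte_masked(uint32_t lo)
{
    return (lo & RTE_MASKED) != 0;
}

/* Print the decoded fields of one RTE low word. */
static inline void rte_print(uint32_t lo)
{
    printf("%08x: vector=0x%02x %s %s %s remote_irr=%d\n",
           (unsigned int)lo, rte_vector(lo),
           rte_level_triggered(lo) ? "level" : "edge",
           rte_active_low(lo) ? "active-low" : "active-high",
           rte_masked(lo) ? "masked" : "unmasked",
           rte_remote_irr(lo));
}
```

Under that layout, 0x0000a067 and 0x0000e067 both decode to vector 0x67,
level-triggered, active-low and unmasked; they differ only in the remote
IRR bit. The 0x00010000 entries in the dump are masked pins.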
> 
> I'm thinking about further ways to debug this. I see that all active
> IO-APIC pins are routed to vCPU0, but does it make a difference if you
> boot with dom0_max_vcpus=1 on the Xen command line? (thus limiting
> NetBSD dom0 to a single CPU)

I too have been pondering possible approaches. One thing I thought might
help is to accompany all places setting remote_irr (and calling
vioapic_deliver()) with a conditional log message, turning on the
condition immediately before the first "undefined ioregsel" gets logged.
(And turn it off again once the last RTE was read in sequence, just to
avoid spamming the console.) From Manuel's description above, there has
to be something that sets the bit and causes the delivery _without_ any
active action by the guest (i.e. neither EOI nor RTE write) and
_without_ any new instance of the IRQ appearing. I have some vague hope
that knowing how we end up making the system make progress again may
also help understand how it got stuck.
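
To make the idea a little more concrete, a rough standalone sketch of
such a gated log follows. None of it is existing Xen code: the names
irr_trace_note_read() and irr_trace_set(), the counter, and the use of
fprintf() are placeholders, and the choice of selector 3 as the "on"
trigger simply mirrors the first "undefined ioregsel" line in the dump
above. The assumption is that the read hook would be called from the
register read path before the warning is printed, and that the other
hook would sit next to every place that sets remote_irr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define IOREG_FIRST_UNDEFINED 0x03u /* first selector logged as undefined */
#define IOREG_RTE_BASE        0x10u /* RTEs: two 32-bit registers per pin */

static bool irr_trace_enabled;      /* hypothetical on/off condition */
static unsigned int irr_trace_hits; /* number of messages emitted */

/*
 * Hook for the register read path. Turns tracing on right before the
 * first "undefined ioregsel" message would be logged, and off again
 * once the last RTE word of this IO-APIC has been read.
 */
static void irr_trace_note_read(unsigned int ioregsel, unsigned int nr_pins)
{
    unsigned int last_rte_reg;

    assert(nr_pins > 0);
    last_rte_reg = IOREG_RTE_BASE + 2 * nr_pins - 1;

    if (ioregsel == IOREG_FIRST_UNDEFINED)
        irr_trace_enabled = true;
    else if (ioregsel == last_rte_reg)
        irr_trace_enabled = false;
}

/*
 * Hook to place next to each site that sets remote_irr. The caller
 * passes a short tag identifying the site, so the log shows which
 * path set the bit.
 */
static void irr_trace_set(const char *site, unsigned int pin)
{
    if (!irr_trace_enabled)
        return;

    irr_trace_hits++;
    fprintf(stderr, "remote_irr set: site=%s pin=%u\n", site, pin);
}
```

With the 8 pins that ioapic2's version register (00070011) indicates, the
last RTE word is selector 0x1f, so the window would cover exactly the
span of the dump in which the bit was seen to flip.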

Jan




 

