
Re: NetBSD dom0 PVH: hardware interrupts stalls


  • To: Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 23 Nov 2020 10:57:13 +0100
  • Cc: Jan Beulich <jbeulich@xxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 23 Nov 2020 09:57:46 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Nov 20, 2020 at 11:38:24AM +0100, Manuel Bouyer wrote:
> On Fri, Nov 20, 2020 at 11:00:05AM +0100, Jan Beulich wrote:
> > On 20.11.2020 10:27, Manuel Bouyer wrote:
> > > On Fri, Nov 20, 2020 at 09:59:57AM +0100, Jan Beulich wrote:
> > >> Well, anything coming through the LAPIC needs ack-ing (except for
> > >> the spurious interrupt of course), or else ISR won't get updated
> > >> and further interrupts at this or lower priority can't be serviced
> > >> (delivered) anymore. This includes interrupts originally coming
> > >> through the IO-APIC. But the same constraint / requirement exists
> > >> on baremetal.
> > > 
> > > OK, so even if I didn't see where this happens, it's happening.
> > > Is that what Xen uses as the ACK from the dom0 for an IOAPIC
> > > interrupt, or is it something else (at the IOAPIC level)?
> > 
> > It's the traditional LAPIC based EOI mechanism that Xen intercepts
> > (as necessary) on the guest side and then translates (via
> > surprisingly many layers of calls) into the necessary EOI /
> > unmask / whatever at the hardware level. Our vIO-APIC
> > implementation so far doesn't support IO-APIC based EOI at all
> > (which is reflected in the IO-APIC version ID being 0x11).
> 
> OK.
> I finally found where the EOI occurs (it's within a macro so a simple grep
> didn't show it).
> 
> When interrupts are not masked (e.g. via cli), the ioapic handler is called.
> From here, 2 paths can happen:
> a) the software interrupt priority level (called spl in BSD world) allows the
>   driver's handler to run. In this case it's called, then the interrupt
>   is unmasked at ioapic level, and EOI'd at lapic.
> b) the software interrupt priority level doesn't allow this driver's
>   handler to run. In this case, the interrupt is marked as pending in
>   software, explicitly masked at the ioapic level and EOI'd at lapic.
>   Later, when the spl is lowered, the driver's interrupt handler is run,
>   then the interrupt is unmasked at ioapic level, and EOI'd at lapic
>   (this is the same path as a)). Here we may EOI the lapic twice, the
>   second time when there's no hardware interrupt pending any more.
> 
> I suspect it's case b) which causes the problem with Xen.

Case b) should be handled fine AFAICT. If there's no interrupt pending
in the lapic ISR, the EOI is just a no-op. If there's somehow another
vector pending in the ISR, you might actually be EOIing the wrong
vector, and thus this would be a bug in NetBSD. I don't know enough
about the NetBSD interrupt model to know whether this second EOI is
just unnecessary or whether it could cause issues.
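
To illustrate the two outcomes, here is a rough toy model of the lapic
ISR, not Xen or NetBSD code. It only relies on the architectural rule
that an EOI clears the highest-priority (highest-numbered) bit set in
the ISR. The class name, the function names and the vector numbers
0x60 and 0x50 are made up for the example.

```python
class LapicModel:
    """Toy model of the local APIC in-service register (ISR)."""

    def __init__(self):
        self.isr = set()  # vectors currently marked in service

    def deliver(self, vector):
        """Interrupt accepted by the CPU: its ISR bit gets set."""
        self.isr.add(vector)

    def eoi(self):
        """Clear the highest-numbered ISR bit, as a real EOI write does.

        Returns the vector that was cleared, or None if the ISR was empty.
        """
        if not self.isr:
            return None
        vector = max(self.isr)
        self.isr.remove(vector)
        return vector


def second_eoi_with_empty_isr():
    """Case b) with nothing else in service: the extra EOI is a no-op."""
    lapic = LapicModel()
    lapic.deliver(0x60)   # hypothetical device vector
    first = lapic.eoi()   # EOI issued when the interrupt is deferred
    second = lapic.eoi()  # EOI issued again after the deferred handler ran
    return first, second


def second_eoi_with_other_vector():
    """Case b) while another vector is in service: the wrong one is cleared."""
    lapic = LapicModel()
    lapic.deliver(0x60)
    first = lapic.eoi()
    lapic.deliver(0x50)   # hypothetical unrelated vector, still being handled
    second = lapic.eoi()  # extra EOI clears 0x50 instead of being a no-op
    return first, second, lapic.isr
```

In the first function the second EOI returns None, i.e. nothing
changes. In the second one it clears 0x50 while that handler is still
running, which is the "EOIing the wrong vector" situation.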

Can you confirm that disabling this second, unneeded EOI actually
solves the problem?

Thanks, Roger.



 

