
Re: Event delivery and "domain blocking" on PVHv2


  • To: Martin Lucina <martin@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 19 Jun 2020 19:42:13 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, mirageos-devel@xxxxxxxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 19 Jun 2020 17:42:31 +0000
  • List-id: Developer list for MirageOS <mirageos-devel.lists.xenproject.org>

On Fri, Jun 19, 2020 at 06:54:26PM +0200, Roger Pau Monné wrote:
> On Fri, Jun 19, 2020 at 06:41:21PM +0200, Martin Lucina wrote:
> > On 2020-06-19 13:21, Roger Pau Monné wrote:
> > > On Fri, Jun 19, 2020 at 12:28:50PM +0200, Martin Lucina wrote:
> > > > On 2020-06-18 13:46, Roger Pau Monné wrote:
> > > > > On Thu, Jun 18, 2020 at 12:13:30PM +0200, Martin Lucina wrote:
> > > > > > At this point I don't really have a clear idea of how to progress,
> > > > > > comparing my implementation side-by-side with the original PV
> > > > > > Mini-OS-based
> > > > > > implementation doesn't show up any differences I can see.
> > > > > >
> > > > > > AFAICT the OCaml code I've also not changed in any material way, and
> > > > > > that
> > > > > > has been running in production on PV for years, so I'd be inclined
> > > > > > to think
> > > > > > the problem is in my reimplementation of the C parts, but where...?
> > > > >
> > > > > A good start would be to print the ISR and IRR lapic registers when
> > > > > blocked, to assert there are no pending vectors there.
> > > > >
> > > > > Can you apply the following patch to your Xen, rebuild and check the
> > > > > output of the 'l' debug key?
> > > > >
> > > > > Also add the output of the 'v' key.
> > > > 
> > > > Had to fight the Xen Debian packages a bit as I wanted to patch the
> > > > exact
> > > > same Xen (there are some failures when building on a system that has
> > > > Xen
> > > > installed due to following symlinks when fixing shebangs).
> > > > 
> > > > Here you go, when stuck during netfront setup, after allocating its
> > > > event
> > > > channel, presumably waiting on Xenstore:
> > > > 
> > > > 'e':
> > > > 
> > > > (XEN) Event channel information for domain 3:
> > > > (XEN) Polling vCPUs: {}
> > > > (XEN)     port [p/m/s]
> > > > (XEN)        1 [1/0/1]: s=3 n=0 x=0 d=0 p=33
> > > > (XEN)        2 [1/1/1]: s=3 n=0 x=0 d=0 p=34
> > > > (XEN)        3 [1/0/1]: s=5 n=0 x=0 v=0
> > > > (XEN)        4 [0/1/1]: s=2 n=0 x=0 d=0
> > > > 
> > > > 'l':
> > > > 
> > > > (XEN) d3v0 IRR:
> > > > ffff8301732dc200b
> > > > (XEN) d3v0 ISR:
> > > > ffff8301732dc100b
> > > 
> > > Which version of Xen is this? AFAICT it doesn't have the support to
> > > print a bitmap.
> > 
> > That in Debian 10 (stable):
> > 
> > ii  xen-hypervisor-4.11-amd64            4.11.3+24-g14b62ab3e5-1~deb10u1.2
> > amd64        Xen Hypervisor on AMD64
> > 
> > xen_major              : 4
> > xen_minor              : 11
> > xen_extra              : .4-pre
> > xen_version            : 4.11.4-pre
> > 
> > > 
> > > Do you think you could also pick commit
> > > 8cd9500958d818e3deabdd0d4164ea6fe1623d7c [0] and rebuild? (and print
> > > the info again).
> > 
> > Done, here you go:
> > 
> > (XEN) Event channel information for domain 3:
> > (XEN) Polling vCPUs: {}
> > (XEN)     port [p/m/s]
> > (XEN)        1 [1/0/1]: s=3 n=0 x=0 d=0 p=33
> > (XEN)        2 [1/1/1]: s=3 n=0 x=0 d=0 p=34
> > (XEN)        3 [1/0/1]: s=5 n=0 x=0 v=0
> > (XEN)        4 [0/1/1]: s=3 n=0 x=0 d=0 p=35
> > 
> > 
> > (XEN) d3v0 IRR:
> > 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> > (XEN) d3v0 ISR:
> > 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 
> So there's nothing pending on the lapic. Can you assert that you will
> always execute evtchn_demux_pending after you have received an event
> channel interrupt (ie: executed solo5__xen_evtchn_vector_handler)?
> 
> I think this would be simpler if you moved evtchn_demux_pending into
> solo5__xen_evtchn_vector_handler? As there would be less asynchronous
> processing, and thus likely less races?

Having thought about this, I think this model of not demuxing in
solo5__xen_evtchn_vector_handler is always racy, as it's not possible
to guarantee that evtchn_demux_pending will always be called after
solo5__xen_evtchn_vector_handler.

Ie: if you receive an interrupt just before going to sleep (after the
sti and before the hlt), you will execute
solo5__xen_evtchn_vector_handler and EOI the vector, but then
evtchn_demux_pending will never get called, and thus the interrupts
will stay pending indefinitely.
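
To make the sequence concrete, here is a minimal sketch that replays it
in plain C. It is a toy model, not Solo5 or Xen code: struct vcpu_model,
send_event, demux_pending and both handler variants are hypothetical
names made up for illustration. It also assumes that a new vector is
only injected when the per-vCPU "upcall pending" flag goes from clear to
set, so treat it as a way to reason about the ordering rather than a
description of the real implementation.

```c
#include <stdbool.h>

/* Hypothetical model of one vCPU's event state; not Solo5 or Xen code. */
struct vcpu_model {
    bool upcall_pending;    /* models the per-vCPU "upcall pending" flag */
    unsigned long pending;  /* models the per-port pending bits */
    int processed;          /* number of events consumed by the demux step */
};

/*
 * Models the hypervisor raising an event on a port.
 * Assumption: a vector is injected only when upcall_pending goes from
 * clear to set. Returns true if an interrupt is injected (which would
 * also wake a halted vCPU).
 */
static bool send_event(struct vcpu_model *v, unsigned port)
{
    v->pending |= 1UL << port;
    if (v->upcall_pending)
        return false;
    v->upcall_pending = true;
    return true;
}

/* Models the demux step: clear the flag, then consume pending ports. */
static void demux_pending(struct vcpu_model *v)
{
    v->upcall_pending = false;
    while (v->pending) {
        v->pending &= v->pending - 1;  /* clear the lowest set bit */
        v->processed++;
    }
}

/* Handler variant 1: EOI only, demux is left to the main loop. */
static void handler_eoi_only(struct vcpu_model *v) { (void)v; }

/* Handler variant 2: demux inside the handler itself. */
static void handler_with_demux(struct vcpu_model *v) { demux_pending(v); }

/*
 * Replays the sequence: the main loop demuxes and finds nothing, an
 * event arrives in the window before hlt, the handler runs, the vCPU
 * halts, and then a second event arrives. Returns true if that second
 * event injects an interrupt, i.e. if the vCPU would wake up.
 */
static bool second_event_wakes(void (*handler)(struct vcpu_model *))
{
    struct vcpu_model v = { false, 0, 0 };

    demux_pending(&v);          /* main loop: nothing to do, go to sleep */
    if (send_event(&v, 3))      /* event lands between sti and hlt */
        handler(&v);
    /* hlt would execute here; the vCPU is now blocked */
    return send_event(&v, 4);   /* does a later event inject a vector? */
}
```

Under these assumptions, second_event_wakes(handler_eoi_only) reports
that no further vector is injected, while
second_event_wakes(handler_with_demux) reports that the vCPU is woken,
which is why I'd suggest demuxing from the handler.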

Roger.



 

