
Re: Event delivery and "domain blocking" on PVHv2


  • To: Martin Lucina <martin@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 19 Jun 2020 18:54:26 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, mirageos-devel@xxxxxxxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 19 Jun 2020 16:54:41 +0000
  • List-id: Developer list for MirageOS <mirageos-devel.lists.xenproject.org>

On Fri, Jun 19, 2020 at 06:41:21PM +0200, Martin Lucina wrote:
> On 2020-06-19 13:21, Roger Pau Monné wrote:
> > On Fri, Jun 19, 2020 at 12:28:50PM +0200, Martin Lucina wrote:
> > > On 2020-06-18 13:46, Roger Pau Monné wrote:
> > > > On Thu, Jun 18, 2020 at 12:13:30PM +0200, Martin Lucina wrote:
> > > > > At this point I don't really have a clear idea of how to progress;
> > > > > comparing my implementation side-by-side with the original PV
> > > > > Mini-OS-based implementation doesn't show any differences I can see.
> > > > >
> > > > > AFAICT I've also not changed the OCaml code in any material way, and
> > > > > that has been running in production on PV for years, so I'd be
> > > > > inclined to think the problem is in my reimplementation of the C
> > > > > parts, but where...?
> > > >
> > > > A good start would be to print the ISR and IRR lapic registers when
> > > > blocked, to assert there are no pending vectors there.
> > > >
> > > > Can you apply the following patch to your Xen, rebuild and check the
> > > > output of the 'l' debug key?
> > > >
> > > > Also add the output of the 'v' key.
> > > 
> > > Had to fight the Xen Debian packages a bit as I wanted to patch the
> > > exact same Xen (there are some failures when building on a system
> > > that has Xen installed due to following symlinks when fixing shebangs).
> > > 
> > > Here you go, when stuck during netfront setup, after allocating its
> > > event channel, presumably waiting on Xenstore:
> > > 
> > > 'e':
> > > 
> > > (XEN) Event channel information for domain 3:
> > > (XEN) Polling vCPUs: {}
> > > (XEN)     port [p/m/s]
> > > (XEN)        1 [1/0/1]: s=3 n=0 x=0 d=0 p=33
> > > (XEN)        2 [1/1/1]: s=3 n=0 x=0 d=0 p=34
> > > (XEN)        3 [1/0/1]: s=5 n=0 x=0 v=0
> > > (XEN)        4 [0/1/1]: s=2 n=0 x=0 d=0
> > > 
> > > 'l':
> > > 
> > > (XEN) d3v0 IRR:
> > > ffff8301732dc200b
> > > (XEN) d3v0 ISR:
> > > ffff8301732dc100b
> > 
> > Which version of Xen is this? AFAICT it doesn't have support for
> > printing a bitmap.
> 
> That's the Xen in Debian 10 (stable):
> 
> ii  xen-hypervisor-4.11-amd64  4.11.3+24-g14b62ab3e5-1~deb10u1.2  amd64  Xen Hypervisor on AMD64
> 
> xen_major              : 4
> xen_minor              : 11
> xen_extra              : .4-pre
> xen_version            : 4.11.4-pre
> 
> > 
> > Do you think you could also pick commit
> > 8cd9500958d818e3deabdd0d4164ea6fe1623d7c [0] and rebuild? (and print
> > the info again).
> 
> Done, here you go:
> 
> (XEN) Event channel information for domain 3:
> (XEN) Polling vCPUs: {}
> (XEN)     port [p/m/s]
> (XEN)        1 [1/0/1]: s=3 n=0 x=0 d=0 p=33
> (XEN)        2 [1/1/1]: s=3 n=0 x=0 d=0 p=34
> (XEN)        3 [1/0/1]: s=5 n=0 x=0 v=0
> (XEN)        4 [0/1/1]: s=3 n=0 x=0 d=0 p=35
> 
> 
> (XEN) d3v0 IRR:
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> (XEN) d3v0 ISR:
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000

So there's nothing pending on the lapic. Can you assert that you will
always execute evtchn_demux_pending after you have received an event
channel interrupt (i.e. executed solo5__xen_evtchn_vector_handler)?
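
Something like the following might be a quick way to check that
invariant (only a sketch: the counters and the pre-block check are
hypothetical names made up for illustration, not existing Solo5
identifiers, and the check assumes interrupts are already disabled on
the block path):

#include <assert.h>
#include <stdint.h>

/* Illustrative instrumentation, assuming the handler and the demux
 * step stay separate as they are today. */
static volatile uint64_t evtchn_upcalls;   /* bumped by the handler */
static uint64_t evtchn_upcalls_demuxed;    /* snapshot at last demux */

void solo5__xen_evtchn_vector_handler(void)
{
    evtchn_upcalls++;
    /* ... existing handler body (ack the vector, flag pending work) ... */
}

void evtchn_demux_pending(void)
{
    /* Snapshot first: an upcall racing with the demux will simply
     * re-trigger the check, which errs on the safe side. */
    evtchn_upcalls_demuxed = evtchn_upcalls;
    /* ... existing demux body ... */
}

/* Call just before blocking the vCPU: if an upcall was delivered but
 * never followed by a demux pass, an event can stay latched in the
 * shared info page while the guest sleeps forever. */
static void evtchn_assert_demuxed(void)
{
    assert(evtchn_upcalls_demuxed == evtchn_upcalls);
}

If the assert ever fires, you know an upcall was delivered without a
matching demux pass before the guest went to sleep.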

I think this would be simpler if you moved evtchn_demux_pending into
solo5__xen_evtchn_vector_handler, as there would be less asynchronous
processing and thus likely fewer races.
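
For reference, the merged handler could look roughly like this (again
only a sketch, assuming the 64-bit 2-level event channel ABI and the
Xen public headers; shared_info, evtchn_handlers[] and lapic_eoi() are
stand-ins for whatever your bindings actually use):

#include <stddef.h>
#include <stdint.h>
#include <xen/xen.h>   /* struct shared_info, struct vcpu_info */

typedef void (*evtchn_handler_fn)(uint32_t port);

extern struct shared_info *shared_info;  /* mapped shared info page */
extern evtchn_handler_fn evtchn_handlers[sizeof(xen_ulong_t) * 8 * 64];
extern void lapic_eoi(void);

void solo5__xen_evtchn_vector_handler(void)
{
    struct vcpu_info *vi = &shared_info->vcpu_info[0];

    /* Keep scanning until no more events are signalled, so an event
     * that fires while we are demuxing is not lost. */
    while (__atomic_exchange_n(&vi->evtchn_upcall_pending, 0,
                               __ATOMIC_ACQ_REL)) {
        xen_ulong_t sel = __atomic_exchange_n(&vi->evtchn_pending_sel,
                                              0, __ATOMIC_ACQ_REL);
        while (sel != 0) {
            unsigned int grp = __builtin_ctzll(sel);
            sel &= sel - 1;

            xen_ulong_t pend = shared_info->evtchn_pending[grp] &
                               ~shared_info->evtchn_mask[grp];
            while (pend != 0) {
                unsigned int bit = __builtin_ctzll(pend);
                pend &= pend - 1;

                /* 64 ports per word on x86-64. */
                uint32_t port = (grp * 64U) + bit;
                /* clear_evtchn(port) before running the handler. */
                __atomic_fetch_and(&shared_info->evtchn_pending[grp],
                                   ~((xen_ulong_t)1 << bit),
                                   __ATOMIC_ACQ_REL);
                if (evtchn_handlers[port] != NULL)
                    evtchn_handlers[port](port);
            }
        }
    }

    lapic_eoi();   /* ack the vector on the local APIC */
}

That way the pending bits are consumed inside the interrupt handler
itself, so there is no window where the vector has been acked but the
demux is still waiting to run when the guest decides to block.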

Roger.
