[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Event delivery and "domain blocking" on PVHv2

  • To: <xen-devel@xxxxxxxxxxxxxxxxxxxx>, <mirageos-devel@xxxxxxxxxxxxxxxxxxxx>, <martin@xxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Mon, 15 Jun 2020 17:58:36 +0100
  • Authentication-results: esa1.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Delivery-date: Mon, 15 Jun 2020 16:58:50 +0000
  • Ironport-sdr: EXtZe8DLeKVW4NdhUXHJmGzS59nbanjNfuh7yl5Z6MYZ/Kz9FQVS9HiRIleskPNxIm72kqgted AOGB8s90YLICbqr6FjalVEdRHyP4F79zci3LGleFoo9/t2Lo7GEq6+DUYDLS5pyyGl3KAXJpAK uthQbNTH8WWi950siBHikKGEDc4Zoaq7mP3/iaobTRbg+UD9ootzT0VIvv52jWVx97Zn+heRu5 iLZ9UkXvVuxmwXpEdzTlR9bW+gDx2sAAlUxKh3Qc3DHDLyn0YJhOC05c+LXpLpUyQuSb55w/TF pjg=
  • List-id: Developer list for MirageOS <mirageos-devel.lists.xenproject.org>

On 15/06/2020 15:25, Martin Lucina wrote:
> Hi,
> puzzle time: In my continuing explorations of the PVHv2 ABIs for the
> new MirageOS Xen stack, I've run into some issues with what looks like
> missed deliveries of events on event channels.
> While a simple unikernel that only uses the Xen console and
> effectively does for (1..5) { printf("foo"); sleep(1); } works fine,
> once I plug in the existing OCaml Xenstore and Netfront code, the
> behaviour I see is that the unikernel hangs in random places, blocking
> as if an event that should have been delivered has been missed.

You can see what is going on, event channel wise, with the 'e'
debug-key.  This will highlight cases such as the event channel being
masked and pending, which is a common guest bug ending up in this state.

> <snip>
> Given that I've essentially re-written the low-level event channel C
> code, I'd like to verify that the mechanisms I'm using for event
> delivery are indeed the right thing to do on PVHv2.
> For event delivery, I'm registering the upcall with Xen as follows:
>     uint64_t val = 32ULL;
>     val |= (uint64_t)HVM_PARAM_CALLBACK_TYPE_VECTOR << 56;
>     int rc = hypercall_hvm_set_param(HVM_PARAM_CALLBACK_IRQ, val);
>     assert(rc == 0);
> i.e. upcalls are to be delivered via IDT vector.

Don't use HVM_PARAM_CALLBACK_TYPE_VECTOR.  It is conceptually broken, as
it bypasses all queueing and IRR logic in the LAPIC.

At some point, I'm going to have to figure out a compatible way to deal
with all the guests still using this mechanism, because it is
incompatible with using hardware accelerated APIC support in
IvyBridge/Zen+ hardware.

Use HVMOP_set_evtchn_upcall_vector instead, which does the same thing,
but actually behaves like a real vector as far as the rest of the LAPIC
is concerned.

> Questions:
> 1. Being based on the Solo5 virtio code, the low-level setup code is
> doing the "usual" i8259 PIC setup, to remap the PIC IRQs to vectors 32
> and above. Should I be doing this initialisation for Xen PVH at all?
> I'm not interested in using the PIC for anything, and all interrupts
> will be delivered via Xen event channels.

PVH guests don't get a PIC by default.  Xen will just be swallowing all
your setup and doing nothing with it.

"plain" PVH guests also don't get an IO-APIC by default.  Unless you're
wanting to get PVH dom0 support working, (or PCI Passthrough in the
future), don't worry about the IO-APIC either.

> 2. Related to the above, the IRQ handler code is ACKing the interrupt
> after the handler runs. Should I be doing that? Does ACKing "IRQ" 0 on
> the PIC have any interactions with Xen's view of event
> channels/pending upcalls?

There's no PIC to begin with, but even then, talking to the PIC/IO-APIC
would only be correct for type INTX/GSI.

TYPE_VECTOR shouldn't have an ack at the LAPIC (it is this properly
which makes it incompatible with hardware acceleration), while
HVMOP_set_evtchn_upcall_vector should be acked at the LAPIC.




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.