
Re: Event delivery and "domain blocking" on PVHv2



On Monday, 15.06.2020 at 17:58, Andrew Cooper wrote:
> On 15/06/2020 15:25, Martin Lucina wrote:
> > Hi,
> >
> > puzzle time: In my continuing explorations of the PVHv2 ABIs for the
> > new MirageOS Xen stack, I've run into some issues with what looks like
> > missed deliveries of events on event channels.
> >
> > While a simple unikernel that only uses the Xen console and
> > effectively does for (1..5) { printf("foo"); sleep(1); } works fine,
> > once I plug in the existing OCaml Xenstore and Netfront code, the
> > behaviour I see is that the unikernel hangs in random places, blocking
> > as if an event that should have been delivered has been missed.
> 
> You can see what is going on, event channel wise, with the 'e'
> debug-key.  This will highlight cases such as the event channel being
> masked and pending, which is a common guest bug ending up in this state.

Ok, based on your and Roger's suggestions I've made some changes:

1. I've dropped all the legacy PIC initialisation code from the Solo5
parts, written some basic APIC initialisation code and switched to using
HVMOP_set_evtchn_upcall_vector for upcall registration, along with setting
HVM_PARAM_CALLBACK_IRQ to 1, as suggested by Roger and as done by Xen itself
when running as a guest. Commit at [1], nothing controversial there; a rough
sketch of the registration sequence follows below.

2. I've re-worked the "bottom half" of the event channel upcall handler to
mask the event when marking it as pending in the OCaml-side "shadow"
structure, and unmask it immediately before an OCaml-side handler would be
run, i.e. when doing a "test and clear port" operation on the OCaml-side
structure.
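
For clarity, the registration in (1) boils down to something like the sketch
below. This is a minimal illustration rather than the committed code:
hypercall_hvm_op() stands in for whatever HVMOP hypercall wrapper the
bindings use, the header paths are the standard Xen public ones, and the
vector number is arbitrary (any free vector above 32, delivered to vCPU 0):

#include <assert.h>
#include <stdint.h>

#include <xen/xen.h>            /* DOMID_SELF */
#include <xen/hvm/hvm_op.h>     /* HVMOP_set_evtchn_upcall_vector, HVMOP_set_param */
#include <xen/hvm/params.h>     /* HVM_PARAM_CALLBACK_IRQ */

#define EVTCHN_UPCALL_VECTOR 0x30   /* arbitrary free vector, for illustration */

/* Stand-in for the bindings' HVMOP hypercall wrapper. */
extern long hypercall_hvm_op(unsigned int op, void *arg);

static void evtchn_register_upcall(void)
{
    long rc;

    /* Ask Xen to deliver event channel upcalls as a plain APIC vector on
     * vCPU 0, rather than via the legacy GSI/PCI INTx callback. */
    struct xen_hvm_evtchn_upcall_vector uv = {
        .vcpu   = 0,
        .vector = EVTCHN_UPCALL_VECTOR
    };
    rc = hypercall_hvm_op(HVMOP_set_evtchn_upcall_vector, &uv);
    assert(rc == 0);

    /* Set HVM_PARAM_CALLBACK_IRQ to 1 so the domain is treated as having a
     * callback registered; the same trick Xen itself uses when running as
     * a guest. */
    struct xen_hvm_param cb = {
        .domid = DOMID_SELF,
        .index = HVM_PARAM_CALLBACK_IRQ,
        .value = 1
    };
    rc = hypercall_hvm_op(HVMOP_set_param, &cb);
    assert(rc == 0);
}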

However, none of this seems to have fundamentally changed the behaviour.
The unikernel still gets "stuck" at random points during netfront set-up.
Now that I've extended the grant table size, I *sometimes* get as far as a
fully functioning netfront and packets will flow, but things will eventually
freeze up (pretty much immediately if I do something like ping -f).

When the unikernel is in the "frozen" state, the domain is blocked (so we
think we're waiting for events), and the "e" debug key shows the relevant
event channels (Xenstore or netfront) as pending but unmasked:

Example - when hung during netfront bring-up:

(XEN) Event channel information for domain 27:
(XEN) Polling vCPUs: {}
(XEN)     port [p/m/s]
(XEN)        1 [1/0/1]: s=3 n=0 x=0 d=0 p=33
(XEN)        2 [1/1/1]: s=3 n=0 x=0 d=0 p=34
(XEN)        3 [1/0/1]: s=5 n=0 x=0 v=0

(1 is Xenstore, 2 is console which we don't care about, 3 is VIRQ_TIMER).

When hung after hammering with "ping -f":

(XEN) Event channel information for domain 28:
(XEN) Polling vCPUs: {}
(XEN)     port [p/m/s]
(XEN)        1 [0/0/1]: s=3 n=0 x=0 d=0 p=33
(XEN)        2 [1/1/1]: s=3 n=0 x=0 d=0 p=34
(XEN)        3 [1/0/1]: s=5 n=0 x=0 v=0
(XEN)        4 [1/0/1]: s=3 n=0 x=0 d=0 p=35

(as above, 4 is netfront)

Some more questions about the "bottom half" of the event channel upcall,
called periodically by OCaml outside of interrupt context:

static bool evtchn_demux_pending(void)
{
    struct shared_info *s = SHARED_INFO();
    struct vcpu_info *vi = VCPU0_INFO();
    bool some_pending = false;

    vi->evtchn_upcall_pending = 0;
>>>> Based on Roger's suggestion, this is now only done here and no longer
>>>> in the "top half" in interrupt context, which now only ACKs the vector
>>>> on the APIC.

    /*
     * Demux events received from Xen.
     *
     * pending_l1 is the "outer" per-VCPU selector (evtchn_pending_sel).
     * pending_l2 is the "inner" system-wide word (evtchn_pending[l1i]).
     */
    xen_ulong_t pending_l1, pending_l2;
    atomic_sync_xchg(&vi->evtchn_pending_sel, 0, &pending_l1);
    while (pending_l1 != 0) {
        xen_ulong_t l1i = ffs(pending_l1);
        pending_l1 &= ~(1UL << l1i);

        /*
         * Masking pending_l2 with ~evtchn_mask[l1i] is necessary to consider
         * only events that are unmasked *right now*; otherwise races like
         * the following can occur:
         *
         *     1. X is generated, upcall scheduled by Xen.
         *     2. X is masked.
         *     3. Upcall is delivered.
         *     4. X fires despite now being masked.
         */
        while ((pending_l2 =
                    (s->evtchn_pending[l1i] & ~s->evtchn_mask[l1i])) != 0) {
            xen_ulong_t l2i = ffs(pending_l2);
            pending_l2 &= ~(1UL << l2i);
>>>> Do I need the above? It doesn't make a difference and seems redundant
>>>> since pending_l2 is always reassigned in the loop, but Mini-OS and
>>>> others are doing it...?

            evtchn_port_t port = (l1i * (sizeof(xen_ulong_t) * 8)) + l2i;
            /*
             * Mark as pending on the OCaml side and mask the event until
             * just before OCaml gets around to handling it. Also, clear
             * the pending bit on the Xen side.
             */
            evtchn_callback_ml[port] = 1;
            atomic_sync_bts(l2i, &s->evtchn_mask[l1i]);
>>>> Mask added at Roger's suggestion, not in the original Mini-OS PV-based
>>>> implementation.
            atomic_sync_btc(l2i, &s->evtchn_pending[l1i]);
            some_pending = true;
        }
    }
    return some_pending;
}
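
For reference, the atomic_sync_* helpers used above are meant to be plain
sequentially-consistent exchange / bit-set / bit-clear operations on a single
xen_ulong_t word; roughly the following, expressed with the GCC/Clang
__atomic builtins (a sketch only, the actual definitions in the bindings may
differ, e.g. by using inline asm or returning the old bit value):

static inline void atomic_sync_xchg(xen_ulong_t *ptr, xen_ulong_t val,
                                    xen_ulong_t *old)
{
    /* Atomically swap *ptr with val, returning the previous value in *old. */
    *old = __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST);
}

static inline void atomic_sync_bts(xen_ulong_t bit, xen_ulong_t *word)
{
    /* Atomically set bit 'bit' of *word. */
    __atomic_fetch_or(word, (xen_ulong_t)1 << bit, __ATOMIC_SEQ_CST);
}

static inline void atomic_sync_btc(xen_ulong_t bit, xen_ulong_t *word)
{
    /* Atomically clear bit 'bit' of *word. */
    __atomic_fetch_and(word, ~((xen_ulong_t)1 << bit), __ATOMIC_SEQ_CST);
}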

The OCaml code essentially calls the above periodically, and, if
it returns true, then calls into the following "test and clear" operation
for all ports:

static inline bool evtchn_test_and_clear(evtchn_port_t port)
{
    assert(port < EVTCHN_2L_NR_CHANNELS);
    if (evtchn_callback_ml[port] > 0) {
        evtchn_callback_ml[port] = 0;
        evtchn_unmask(port);
>>>> Unmask added at Roger's suggestion, not in the original Mini-OS
>>>> PV-based implementation. I'm not entirely happy about this, since
>>>> it'll probably result in a lot more hypercalls when the event channel
>>>> is busy?
        return true;
    }
    else {
        return false;
    }
}
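
For completeness, evtchn_unmask() here is assumed to boil down to a straight
EVTCHNOP_unmask hypercall, which is where the extra hypercalls mentioned
above would come from. A minimal sketch, with hypercall_evtchn_op() standing
in for the actual event_channel_op wrapper:

#include <xen/event_channel.h>  /* EVTCHNOP_unmask, struct evtchn_unmask */

/* Stand-in for the bindings' event_channel_op hypercall wrapper. */
extern long hypercall_evtchn_op(unsigned int cmd, void *arg);

static void evtchn_unmask(evtchn_port_t port)
{
    struct evtchn_unmask op = { .port = port };

    /* EVTCHNOP_unmask clears the mask bit and, if the port was pending
     * while masked, re-asserts the upcall so the event is not lost. */
    (void)hypercall_evtchn_op(EVTCHNOP_unmask, &op);
}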

If evtchn_test_and_clear() returns true for a port, then that port's event
handler will get run immediately afterwards, and before any further call to
evtchn_demux_pending().
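
Put differently, ignoring the OCaml specifics, the overall dispatch amounts
to roughly the control flow below. evtchn_handle() is a hypothetical stand-in
for the OCaml-side handler dispatch; this is a sketch of the sequencing only,
not actual code:

/* Hypothetical stand-in for the OCaml-side handler dispatch. */
extern void evtchn_handle(evtchn_port_t port);

static void evtchn_poll_once(void)
{
    /* Demux anything Xen has flagged since the last poll ... */
    if (!evtchn_demux_pending())
        return;

    /* ... then run the handler for every port marked pending on the OCaml
     * side; evtchn_test_and_clear() unmasks each port just before its
     * handler runs. */
    for (evtchn_port_t port = 0; port < EVTCHN_2L_NR_CHANNELS; port++) {
        if (evtchn_test_and_clear(port))
            evtchn_handle(port);    /* runs before the next demux */
    }
}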

At this point I don't really have a clear idea of how to progress;
comparing my implementation side-by-side with the original PV Mini-OS-based
implementation doesn't show up any differences that I can see.

AFAICT I've also not changed the OCaml code in any material way, and that
code has been running in production on PV for years, so I'd be inclined to
think the problem is in my reimplementation of the C parts, but where...?

Martin

[1] 
https://github.com/mato/solo5/commit/42afe3a2177b9caf63d0df435391ae03a6684ac8



 

