I am following up to share
some experience from when I was experimenting with
the original Mirage switch. I also have a few design
suggestions, though they may not be valid.
I think the problem mentioned in the
original mail stems from the way packets are
handled by a Xen unikernel. If you check
https://github.com/mirage/mirage-net-xen/blob/master/lib/netif.ml
on line 300, for each new packet arriving on a
VIF the code starts a thread and ignores its
result. As a consequence, if you start sending lots
of packets to a unikernel and the main processing
pipeline cannot keep up with the rate, new packets
keep being allocated and delegated to handling
threads, but those threads cannot complete fast
enough and create a huge backlog, which at some
point exhausts memory.
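Roughly, the pattern looks like this (a simplified
sketch, not the actual netif.ml code; receive_packet
and handle just stand in for the ring read and the
user callback):

    open Lwt.Infix

    (* Each arriving packet spawns a fire-and-forget
       handler; nothing bounds the number of pending
       handlers when packets arrive faster than
       [handle] can complete them. *)
    let rec rx_loop ~receive_packet ~handle =
      receive_packet () >>= fun buf ->
      Lwt.async (fun () -> handle buf);  (* result ignored *)
      rx_loop ~receive_packet ~handle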
From a design point of view, I think the best
approach to solve this problem is to create a
fixed-size pool of threads: if the pool has no
free thread, the packet is dropped. Alternatively,
the driver could be redesigned to apply
backpressure to the netback and force packets to
be dropped in Dom0 instead of in the unikernel,
which would also reduce the CPU load a bit.
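Something along these lines could work for the
fixed-size pool; this is only a sketch, and
max_in_flight, dispatch and handle_packet are names
I made up, not anything in mirage-net-xen:

    (* Bound the number of in-flight packet handlers;
       drop packets once the bound is reached. *)
    let max_in_flight = 64
    let in_flight = ref 0

    let dispatch handle_packet buf =
      if !in_flight >= max_in_flight then
        ()  (* no free slot: drop the packet *)
      else begin
        incr in_flight;
        Lwt.async (fun () ->
          Lwt.finalize
            (fun () -> handle_packet buf)
            (fun () -> decr in_flight; Lwt.return_unit))
      end

Since Lwt scheduling is cooperative, a plain counter
is enough to track the in-flight handlers here.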
In the past I tried to rewrite the rx_poll method,
but the problem is that the function passed as a
handler to ack new pages from the netback is not
Lwt-aware, so you cannot easily integrate any of
the Lwt asynchronicity into the processing
pipeline.