
Re: [MirageOS-devel] Problem with Lwt and TCP/IP



> On 23 Mar 2015, at 15:55, Magnus Skjegstad <magnus@xxxxx> wrote:
> 
> Dear all
> 
> I am working on a virtual network interface/software bridge
> implementation for Mirage [1]. The idea is to provide a Vnetif module
> that can be used as a replacement for Netif. Network stacks using Vnetif
> can then be connected to the same software backend and send frames to
> each other. An example unikernel with two threads connecting over TCP is
> available here [2]. 
> 
> Initially I wrote the backend [3] so that the write function would block
> until all calls to the listeners had returned. This does unfortunately
> not work with the Mirage TCP/IP stack as some of the calls to listen
> will block while waiting for more data. To fix this I now run the
> listener functions optionally in Lwt.async (as specified with the
> use_async_readers-parameter to Basic_backend.create). This could starve
> the listener threads as write never blocks, but TCP/IP should adjust its
> rate automatically.
> 
> The workaround seems to run fine under Xen and the iperf-like throughput
> between two threads with [4] is about 400 000 KBit/s in Xen/Virtualbox.
> In Unix mode the throughput is about 1 200 000 KBit/s. Unfortunately,
> the test also sometimes deadlocks in Unix mode.

I'd love to know where it's deadlocking. Perhaps try reproducing it with 
tracing enabled?

http://openmirage.org/wiki/profiling

[ I hope it reproduces at a lower throughput! ]

> 
> I thought the deadlock could be caused by the faster write threads
> starving the listeners, so I added an Lwt.pause in iperf_self.ml after
> each call to write [5]. The test then runs fine under Unix, with a
> reduced throughput of about 500 000 KBit/s. Unfortunately, the call to
> Lwt.pause is much slower under Xen and the throughput is around 46
> Kbit/s (!). 
> 
> I am not sure what would be the best way to fix this. If there is a
> race, I guess it is likely that it is in either mirage-tcpip or in my
> code. I have not been able to find one in the vnetif code, but maybe
> there is something I have overlooked that is triggered under high load.
> The code works fine when I run iperf without TCP/IP and with async
> disabled [6] (tested with throughput around 17 000 000 KBit/s).
> 
> A solution could also be to try to speed up Lwt.pause in Xen, so that we
> at least could run the same iperf test on both platforms. In the run
> loop in [7] the domain is blocked when there are no new events on the
> event channel, even when there are paused threads. By adding a check for
> Lwt.paused_count () = 0 before blocking the domain (code here [8]), the
> Xen w/Lwt.pause iperf throughput is 60 000 KBit/s (still slow, but
> useable). 

The Lwt.pause thing sounds like it's probably masking a bug elsewhere that we 
ought to track down.

I don't really understand how pause is supposed to be used. In fact I'd never 
noticed it existed until you pointed it out! The docs just say

- pause () is a sleeping thread which is wake up on the next call to 
Lwt.wakeup_paused.
- wakeup_paused () wakes up all threads which suspended themselves with 
Lwt.pause. This function is called by the scheduler, before entering the main 
loop

It sounds like there's no guarantee that the scheduler is going to be run any 
time soon. With your proposed modification, how similar would a paused thread 
be to a thread sleeping for 0 seconds?
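
To make the question concrete, here is a toy model of such a run loop in
plain OCaml (no Lwt, no Xen). It is only a sketch and not the
mirage-platform code: pause, paused_count and wakeup_paused are simplified
stand-ins for the Lwt functions of the same name, and worker, run and
~check_paused are hypothetical names made up for the illustration. The loop
counts how often it would block the domain, with and without the
paused_count () = 0 check.

```ocaml
(* Toy model of a cooperative run loop. All names are hypothetical
   stand-ins; this is not the Lwt or mirage-platform implementation. *)

(* Continuations of threads that paused and wait for the next wakeup. *)
let paused : (unit -> unit) Queue.t = Queue.create ()

let pause k = Queue.add k paused
let paused_count () = Queue.length paused

(* Wake every thread that was paused when this function was called.
   Threads that pause again while being woken wait for the next call. *)
let wakeup_paused () =
  let batch = Queue.create () in
  Queue.transfer paused batch;
  Queue.iter (fun k -> k ()) batch

(* A worker that pauses [n] times and then finishes. *)
let rec worker n () =
  if n > 0 then pause (worker (n - 1))

(* Run one worker to completion and return how many times the loop
   would have blocked the domain. With [~check_paused:true] the loop
   only blocks when no paused threads remain. *)
let run ~check_paused n =
  Queue.clear paused;
  worker n ();
  let blocks = ref 0 in
  while paused_count () > 0 do
    wakeup_paused ();
    if not check_paused || paused_count () = 0 then incr blocks
  done;
  !blocks
```

In this model run ~check_paused:false 5 evaluates to 5 (one block per
pause) and run ~check_paused:true 5 evaluates to 1 (a single block once
nothing is paused). In the real loop each block waits for an event or a
timeout, which would explain why every pause is so expensive without the
check.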

> Any thoughts or ideas on how to proceed appreciated :-)

Sorry I couldn't be more helpful!

Dave

> 
> 1. https://github.com/MagnusS/mirage-vnetif
> 2.
> https://github.com/MagnusS/mirage-vnetif/blob/master/examples/connect/unikernel.ml
> 3.
> https://github.com/MagnusS/mirage-vnetif/blob/master/lib/basic_backend.ml
> 4.
> https://github.com/MagnusS/mirage-vnetif/tree/master/examples/iperf_self
> 5.
> https://github.com/MagnusS/mirage-vnetif/blob/lwt-pause/examples/iperf_self/iperf_self.ml#L94
> 6.
> https://github.com/MagnusS/mirage-vnetif/blob/master/examples/iperf_vnetif/iperf_vnetif.ml
> 7. https://github.com/mirage/mirage-platform/blob/master/xen/lib/main.ml
> 8.
> https://github.com/MagnusS/mirage-platform/blob/xen-fast-pause/xen/lib/main.ml#L77
> 
> Magnus
> 
> _______________________________________________
> MirageOS-devel mailing list
> MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
> http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
