[MirageOS-devel] Problem with Lwt and TCP/IP
Dear all,

I am working on a virtual network interface/software bridge implementation for Mirage [1]. The idea is to provide a Vnetif module that can be used as a drop-in replacement for Netif. Network stacks using Vnetif can then be connected to the same software backend and send frames to each other. An example unikernel with two threads connecting over TCP is available here [2].

Initially I wrote the backend [3] so that the write function would block until all calls to the listeners had returned. Unfortunately, this does not work with the Mirage TCP/IP stack, as some of the calls to listen block while waiting for more data. To fix this, the backend can now optionally run the listener functions in Lwt.async, as selected by the use_async_readers parameter to Basic_backend.create (see the first sketch below). This could starve the listener threads, since write never blocks, but TCP/IP should adjust its rate automatically.

The workaround seems to run fine under Xen: the iperf-like throughput between two threads with [4] is about 400 000 Kbit/s in Xen/VirtualBox. In Unix mode the throughput is about 1 200 000 Kbit/s. Unfortunately, the test also sometimes deadlocks in Unix mode. I thought the deadlock could be caused by the faster write threads starving the listeners, so I added an Lwt.pause in iperf_self.ml after each call to write [5] (second sketch below). The test then runs fine under Unix, with a reduced throughput of about 500 000 Kbit/s. Unfortunately, the call to Lwt.pause is much slower under Xen, where the throughput drops to around 46 Kbit/s (!).

I am not sure what the best way to fix this would be. If there is a race, I guess it is likely to be in either mirage-tcpip or in my code. I have not been able to find one in the vnetif code, but maybe there is something I have overlooked that is only triggered under high load. The code works fine when I run iperf without TCP/IP and with async disabled [6] (tested with a throughput of around 17 000 000 Kbit/s).

Another option could be to speed up Lwt.pause in Xen, so that we could at least run the same iperf test on both platforms. In the run loop in [7], the domain is blocked when there are no new events on the event channel, even when there are paused threads. By adding a check for Lwt.paused_count () = 0 before blocking the domain (code here [8], third sketch below), the Xen w/Lwt.pause iperf throughput rises to 60 000 Kbit/s (still slow, but usable).

Any thoughts or ideas on how to proceed are appreciated :-)

1. https://github.com/MagnusS/mirage-vnetif
2. https://github.com/MagnusS/mirage-vnetif/blob/master/examples/connect/unikernel.ml
3. https://github.com/MagnusS/mirage-vnetif/blob/master/lib/basic_backend.ml
4. https://github.com/MagnusS/mirage-vnetif/tree/master/examples/iperf_self
5. https://github.com/MagnusS/mirage-vnetif/blob/lwt-pause/examples/iperf_self/iperf_self.ml#L94
6. https://github.com/MagnusS/mirage-vnetif/blob/master/examples/iperf_vnetif/iperf_vnetif.ml
7. https://github.com/mirage/mirage-platform/blob/master/xen/lib/main.ml
8. https://github.com/MagnusS/mirage-platform/blob/xen-fast-pause/xen/lib/main.ml#L77
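First sketch: a minimal illustration of the two dispatch strategies described above. This is not the actual code in basic_backend.ml [3]; the record fields and the listener type are simplified placeholders, so please read it as an outline of the idea only.

(* Simplified sketch, not the real Basic_backend API: the record and the
   listener type are placeholders for illustration only. *)
type listener = Cstruct.t -> unit Lwt.t

type t = {
  mutable listeners : listener list;
  use_async_readers : bool;   (* as passed to Basic_backend.create *)
}

(* Deliver a frame to every registered listener. *)
let write t frame =
  if t.use_async_readers then begin
    (* Fire-and-forget: write never blocks, even if a listener callback
       waits for more data before returning. *)
    List.iter (fun l -> Lwt.async (fun () -> l frame)) t.listeners;
    Lwt.return_unit
  end else
    (* Original behaviour: write only resolves after every listener
       callback has returned, which stalls with mirage-tcpip because
       some callbacks block while waiting for further frames. *)
    Lwt_list.iter_p (fun l -> l frame) t.listeners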
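Second sketch: the shape of the Lwt.pause workaround in the sender loop. Again this is only a sketch, not the code in iperf_self.ml [5]; send_loop, write_data and buf are stand-ins I made up for illustration.

open Lwt.Infix

(* Sketch of the workaround: yield after each write so the reader
   threads get a chance to run. write_data and buf are placeholders. *)
let rec send_loop write_data buf n =
  if n = 0 then Lwt.return_unit
  else
    write_data buf >>= fun () ->
    (* Without this pause the writer can monopolise the scheduler and the
       test occasionally deadlocks in Unix mode; with it, Xen throughput
       collapses because Lwt.pause is slow there. *)
    Lwt.pause () >>= fun () ->
    send_loop write_data buf (n - 1)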
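Third sketch: the idea behind the run-loop change in [8]. process_events and block_domain are placeholders, not the actual function names in mirage-platform's main.ml, so this only shows where the Lwt.paused_count check would sit.

(* Sketch of the proposed run-loop change: only block the domain when no
   threads are waiting on Lwt.pause. process_events and block_domain are
   placeholders, not the real mirage-platform functions. *)
let rec run_loop process_events block_domain =
  process_events ();
  Lwt.wakeup_paused ();                    (* resume paused threads *)
  if Lwt.paused_count () = 0 then
    (* Safe to sleep: nothing is waiting on Lwt.pause, so we only need to
       wake up on the next event-channel event. *)
    block_domain ();
  run_loop process_events block_domain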
Magnus