Xen project Mailing List

Re: [MirageOS-devel] Parallelizing writing to network devices

To: Masoud Koleini <masoud.koleini@xxxxxxxxxxxxxxxx>, David Scott <Dave.Scott@xxxxxxxxxxxxx>

From: Thomas Leonard <talex5@xxxxxxxxx>

Date: Thu, 18 Dec 2014 11:13:30 +0000

Cc: "mirageos-devel@xxxxxxxxxxxxxxxxxxxx" <mirageos-devel@xxxxxxxxxxxxxxxxxxxx>, Anil Madhavapeddy <anil@xxxxxxxxxx>

Delivery-date: Thu, 18 Dec 2014 11:13:36 +0000

List-id: Developer list for MirageOS <mirageos-devel.lists.xenproject.org>

On 18 December 2014 at 11:01, Masoud Koleini <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>
>> On 17 December 2014 at 18:05, Masoud Koleini <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> Thanks, Thomas, for the great tracing tool!
>>>
>>> The following is a very simple unikernel with two interfaces, which
>>> redirects frames captured on the first interface to the second one:
>>>
>>> https://github.com/koleini/parallelisation
>>>
>>> The problem is that at a high packet rate (more than 80,000 pps), the
>>> switch stops receiving. The goal is to spot the problem and improve the
>>> throughput of Mirage netif.
>>
>> I don't know if this is the problem, but in the code, I see you do:
>>
>>   listen if1 if2 >>
>>   (forward_thread if2)
>>
>> This ignores the result from listen, so if the listen thread later
>> fails then the error will be discarded. I'd try something like this:
>>
>>   Lwt.choose [
>>     listen if1 if2;
>>     forward_thread if2
>>   ]
>>
>> (and lose the "return" at the end of listen)
>
> listen calls the netif listen function, which only sets receive_callback
> on the interface (used by poll_thread). It looks like a failure in
> poll_thread is not monitored in the netif code.

Hmm. You're right. Does it show anything if you catch errors there?

Looks like it was changed here:

https://github.com/mirage/mirage-net-xen/commit/5d9df74fb0d8a3eecda8beac9e694f939392273c

Perhaps listen could just return the poll thread, rather than making a fake task. Dave?

>> I really think the >> operator should be banned...
>>
>>> The test environment consists of another VM running a traffic generator
>>> that sends frames of a specific pattern (UDP frames of 100 bytes) over
>>> the bridge connected to the first interface of the unikernel. The
>>> unikernel forwards frames by collecting a number of frames from the input
>>> queue and running the same number of threads that write them to the
>>> output interface.
>>>
>>> Two trace files are uploaded to the repo.
>>> The first file is the output of this configuration. This trace shows
>>> that each netif write blocks until the thread that writes to the ring on
>>> the front-end connection returns (function write_already_locked).
>>>
>>> For the second trace, the return of that thread is ignored (by commenting
>>> out "lwt () = th in" in write_already_locked). This considerably increases
>>> the switching speed, but after running for a while, a similar problem
>>> seems to occur after garbage collection.
>>>
>>> Thomas and Anil, do you have any ideas from these traces, and how could
>>> the traces be made more informative?
>>>
>>> Thanks.
>>>
>>>
>>> On 28/11/14 16:55, Thomas Leonard wrote:
>>>>
>>>> On 28 November 2014 at 16:24, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 28 Nov 2014, at 16:03, Masoud Koleini <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Thanks, Anil.
>>>>>>
>>>>>>> - graph the ring utilisation to see if it's always full (Thomas
>>>>>>> Leonard's profiling patches should help here)
>>>>>>
>>>>>> Would you please point me to the profiling patches?
>>>>>
>>>>> See:
>>>>>
>>>>> http://roscidus.com/blog/blog/2014/10/27/visualising-an-asynchronous-monad/
>>>>
>>>> The installation instructions here are for the previous version
>>>> (though they should still work). If you want to try the latest
>>>> version, the current Git mirage allows you to pass a ~tracing argument
>>>> to "register" in your config.ml, e.g.
>>>>
>>>>   let tracing = mprof_trace ~size:1000000 () in
>>>>   register "myunikernel" ~tracing [
>>>>     main $ ...
>>>>   ]
>>>>
>>>> This uses a newer version of the profiling API. You should generally
>>>> "opam pin" the #tracing2 branches rather than #tracing to use it.
>>>>
>>>> Note also that it doesn't currently record ring utilisation, so you'll
>>>> still need to do some work to get that. You could use the
>>>> MProf.Counter interface, in which case the GUI will display it as a
>>>> graph over the trace.
>>>>
>>>>>>> - try to reduce the parallelisation to see if some condition there
>>>>>>> alleviates the issue to track it down.
>>>>>>
>>>>>> Reducing the maximum number of threads running in parallel reduced CPU
>>>>>> utilization, and the VM kept functioning for much longer, but the same
>>>>>> problem occurred in the end.
>>>>>>
>>>>>> It might be more useful to look at the code. Please have a look at the
>>>>>> function "f_thread" in the file uploaded to the following repo:
>>>>>>
>>>>>> https://github.com/koleini/parallelisation
>>>>>
>>>>> That's a lot of code to try and distill down a test case. Try to cut it
>>>>> down significantly by building a minimal Ethernet traffic generator that
>>>>> outputs frames with a predictable pattern in the frame, and a receiver
>>>>> that will check that the pattern is received as expected.
>>>>>
>>>>> Then you can try out your parallel algorithm variations on the simple
>>>>> Ethernet sender/receiver and narrow down the problem without all the
>>>>> other concerns.
>>>>>
>>>>> Once the bug is tracked down, we can add the sender/receiver into
>>>>> mirage-skeleton and use it as a test case to ensure that this
>>>>> functionality never regresses in the future. Line-rate Ethernet
>>>>> transmission has worked in the past, but we never added a test case to
>>>>> ensure it stays working.
>>>>>
>>>>> Anil
>>>>> _______________________________________________
>>>>> MirageOS-devel mailing list
>>>>> MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
>>>>> http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
>>>
>>> This message and any attachment are intended solely for the addressee and
>>> may contain confidential information. If you have received this message
>>> in error, please send it back to me, and immediately delete it. Please do
>>> not use, copy or disclose the information contained in this message or in
>>> any attachment. Any views or opinions expressed by the author of this
>>> email do not necessarily reflect the views of the University of
>>> Nottingham.
>>>
>>> This message has been checked for viruses but the contents of an
>>> attachment may still contain software viruses which could damage your
>>> computer system, you are advised to perform your own checks. Email
>>> communications with the University of Nottingham may be monitored as
>>> permitted by UK legislation.

--
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6 8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA BD8E 0713 3F96 CA74 D8BA
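The Lwt.choose suggestion above (combining listen and the forwarding thread, instead of the >> sequencing that, as noted in the thread, ignores the result of listen) can be illustrated without any network device. This is a sketch assuming only the lwt library; fake_listen and fake_forward are hypothetical stand-ins, not the mirage-net-xen API, and the poll thread's failure is simulated with Lwt.wakeup_exn. It also assumes a listen that returns its poll thread, as proposed in the reply.

```ocaml
(* Sketch only: assumes the lwt library. [fake_listen] and [fake_forward]
   are hypothetical stand-ins for the unikernel's listen/forward_thread,
   not the mirage-net-xen API. *)

(* Hypothetical listen that hands back its poll thread instead of a
   placeholder task; the resolver lets us simulate the poll thread dying. *)
let fake_listen () : unit Lwt.t * unit Lwt.u = Lwt.wait ()

(* Hypothetical forwarding loop that never finishes on its own. *)
let fake_forward () : unit Lwt.t = fst (Lwt.wait ())

(* Report the current state of a promise without running a scheduler. *)
let describe (p : unit Lwt.t) : string =
  match Lwt.state p with
  | Lwt.Return () -> "returned"
  | Lwt.Fail e -> "failed: " ^ Printexc.to_string e
  | Lwt.Sleep -> "pending"

let poll, poll_resolver = fake_listen ()

(* Waiting only on the forwarding thread: the poll thread's fate is invisible. *)
let forward_only = fake_forward ()

(* Lwt.choose resolves as soon as either thread resolves; a rejection of
   the poll thread rejects [main] with the same exception. *)
let main = Lwt.choose [ poll; fake_forward () ]

let before = describe main

(* Simulate the poll thread failing some time after startup. *)
let () = Lwt.wakeup_exn poll_resolver (Failure "poll_thread died")

let after = describe main
let after_forward_only = describe forward_only

let () =
  print_endline ("choose, before failure: " ^ before);
  print_endline ("choose, after failure: " ^ after);
  print_endline ("forward only, after failure: " ^ after_forward_only)
```

With Lwt.choose, main is rejected as soon as the poll thread fails, so the error reaches whatever waits on main; forward_only stays pending, which is how a failure can go unnoticed when the poll thread is not part of what the main thread waits on.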

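Anil's suggestion of a minimal sender/receiver that checks a predictable pattern can be sketched in plain OCaml, independent of any network device. Everything here is an assumption for illustration: the frame layout (broadcast destination, a locally administered source MAC, the IEEE local experimental ethertype 0x88b5, a 32-bit sequence number, then a payload derived from it) and the names make_frame, check_frame and observe are hypothetical, not part of mirage-skeleton or mirage-net-xen. It needs OCaml 4.08 or later for the Bytes big-endian accessors.

```ocaml
(* Hypothetical frame layout for a pattern test (an assumption, not an
   existing Mirage format):
   dst MAC (6) | src MAC (6) | ethertype (2) | seq (4, big-endian) | payload *)
let header_len = 14
let seq_off = header_len
let payload_off = seq_off + 4
let ethertype = 0x88b5 (* IEEE 802 local experimental ethertype *)

(* Payload byte [i] of frame [seq] is fully determined by [seq] and [i]. *)
let pattern_byte seq i = (seq + i) land 0xff

let make_frame ~seq ~len =
  if len < payload_off then invalid_arg "make_frame: frame too short";
  let b = Bytes.make len '\000' in
  Bytes.blit_string "\xff\xff\xff\xff\xff\xff" 0 b 0 6; (* broadcast dst *)
  Bytes.blit_string "\x02\x00\x00\x00\x00\x01" 0 b 6 6; (* local src MAC *)
  Bytes.set_uint16_be b 12 ethertype;
  Bytes.set_int32_be b seq_off (Int32.of_int seq);
  for i = 0 to len - payload_off - 1 do
    Bytes.set_uint8 b (payload_off + i) (pattern_byte seq i)
  done;
  b

(* Returns [Ok seq] if the frame carries the expected pattern. *)
let check_frame b =
  let len = Bytes.length b in
  if len < payload_off then Error "frame too short"
  else if Bytes.get_uint16_be b 12 <> ethertype then Error "unexpected ethertype"
  else begin
    let seq = Int32.to_int (Bytes.get_int32_be b seq_off) land 0xffff_ffff in
    let rec scan i =
      if payload_off + i >= len then Ok seq
      else if Bytes.get_uint8 b (payload_off + i) <> pattern_byte seq i then
        Error (Printf.sprintf "payload mismatch at byte %d of frame %d" i seq)
      else scan (i + 1)
    in
    scan 0
  end

type stats = {
  mutable expected : int; (* next sequence number we expect *)
  mutable received : int; (* frames that passed the pattern check *)
  mutable lost : int;     (* frames skipped in the sequence *)
  mutable corrupt : int;  (* frames whose pattern did not match *)
}

let new_stats () = { expected = 0; received = 0; lost = 0; corrupt = 0 }

(* Receiver side: reordered frames are not reconciled in this sketch. *)
let observe st frame =
  match check_frame frame with
  | Error _ -> st.corrupt <- st.corrupt + 1
  | Ok seq ->
    st.received <- st.received + 1;
    (* A jump forward means the frames in between never arrived. *)
    if seq > st.expected then st.lost <- st.lost + (seq - st.expected);
    st.expected <- max st.expected (seq + 1)

let () =
  let st = new_stats () in
  (* Simulate a link that drops frames 3 and 7 out of 0..9. *)
  for seq = 0 to 9 do
    if seq <> 3 && seq <> 7 then observe st (make_frame ~seq ~len:100)
  done;
  (* prints received=8 lost=2 corrupt=0 *)
  Printf.printf "received=%d lost=%d corrupt=%d\n" st.received st.lost st.corrupt
```

Embedding a sequence number lets the receiver tell dropped frames apart from corrupted ones. In a unikernel, make_frame would presumably feed the transmit path and observe would be called from the receive callback, so the parallel write variations could be compared on drop and corruption counts alone.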