Re: [MirageOS-devel] Parallelizing writing to network devices
On 18 December 2014 at 15:17, Masoud Koleini
<masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>
> On 18/12/14 13:19, Thomas Leonard wrote:
>>
>> On 17 December 2014 at 18:05, Masoud Koleini
>> <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> Thanks Thomas for the great tracing tool!
>>>
>>> The following is a very simple unikernel with two interfaces, which
>>> redirects frames captured on the first interface to the second one:
>>>
>>> https://github.com/koleini/parallelisation
>>>
>>> The problem is that at a high packet rate (more than 80'000 pps), the
>>> switch stops receiving. The goal is to spot the problem and enhance the
>>> throughput of Mirage netif.
>>>
>>> The test environment consists of another vm running a traffic generator
>>> and sending frames of a specific pattern (UDP frames of size 100 bytes)
>>> over the bridge that connects to the first interface of the unikernel.
>>> The unikernel forwards frames by collecting a number of frames from the
>>> input queue and running the same number of threads that write them to
>>> the output interface.
>>>
>>> Two trace files are uploaded to the repo. The first file is the output
>>> of this configuration. This trace shows that each netif write locks
>>> until the thread that writes on the front-end connection to the ring
>>> has returned (function write_already_locked).
>>
>> Do these traces show it after it stopped? The second has a long sleep,
>> while the first looks like it was in the middle of a run.
>>
>> If it had stopped in both cases, it suggests that the whole unikernel
>> stopped (not just the listen thread), because there are no more timer
>> interrupts and no sleep region.
>>
>> Does "xl top" show the unikernel still using the CPU? Or is it
>> waiting, or crashed?
>>
>> If you have a thread writing a string to the console once per second,
>> does it continue after the unikernel stops accepting frames?
>
> Yes, both are. It looks like I get more info in the traces with the
> updated Mirage libraries, so I updated the traces in the repo.
>
> The unikernel is still working, as traces that periodically write info on
> the console are still working too.

I'm not sure, but it might be worth applying this fix and testing again:

https://github.com/mirage/mirage-net-xen/pull/16

(when Netif stopped to wait for space in the transmit ring, it would
sometimes fail to notice when space became available)

> With the original configuration (netif unchanged), it looks like the
> reason is that the unikernel runs out of memory after some time, although
> the error message is shown only in a few experiments. This is the main
> bottleneck for Mirage applications: waiting for a packet write to
> terminate is time consuming and doesn't allow high-rate packet switching
> for network applications.
>
> Modifying netif by ignoring the thread that is waiting for the result of
> writing to the ring is also problematic. So, any idea how to do bulk
> packet writes on a network interface?
>
>>
>>> For the second trace, the return of the thread is ignored (commenting
>>> out "lwt () = th in" in write_already_locked). This considerably
>>> increases switching speed, but after some running time, it looks like a
>>> similar problem happens after garbage collection.
>>>
>>> Thomas and Anil, any idea from the given traces, and how is it possible
>>> to make the traces more informative?
>>>
>>> Thanks.
>>>
>>>
>>> On 28/11/14 16:55, Thomas Leonard wrote:
>>>>
>>>> On 28 November 2014 at 16:24, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 28 Nov 2014, at 16:03, Masoud Koleini
>>>>>> <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Thanks Anil.
>>>>>>
>>>>>>> - graph the ring utilisation to see if it's always full (Thomas
>>>>>>> Leonard's profiling patches should help here)
>>>>>>
>>>>>> Would you please point me to the profiling patches?
>>>>>
>>>>> See:
>>>>>
>>>>> http://roscidus.com/blog/blog/2014/10/27/visualising-an-asynchronous-monad/
>>>>
>>>> The installation instructions here are for the previous version
>>>> (though they should still work). If you want to try the latest
>>>> version, the current Git mirage allows you to pass a ~tracing argument
>>>> to "register" in your config.ml, e.g.
>>>>
>>>> let tracing = mprof_trace ~size:1000000 () in
>>>> register "myunikernel" ~tracing [
>>>>   main $ ...
>>>> ]
>>>>
>>>> This uses a newer version of the profiling API. You should generally
>>>> "opam pin" the #tracing2 branches rather than #tracing to use it.
>>>>
>>>> Note also that it doesn't currently record ring utilisation, so you'll
>>>> still need to do some work to get that. You could use the
>>>> MProf.Counter interface, in which case the GUI will display it as a
>>>> graph over the trace.
>>>>
>>>>>>> - try to reduce the parallelisation to see if some condition there
>>>>>>> alleviates the issue to track it down.
>>>>>>
>>>>>> Reducing the maximum number of threads running in parallel reduced
>>>>>> CPU utilization, and the vm was functioning for a much longer time,
>>>>>> but the same problem occurred at the end.
>>>>>>
>>>>>> It might be more useful to look at the code. Please have a look at
>>>>>> the function "f_thread" in the file uploaded to the following repo:
>>>>>>
>>>>>> https://github.com/koleini/parallelisation
>>>>>
>>>>> That's a lot of code to try and distill down to a test case. Try to
>>>>> cut it down significantly by building a minimal Ethernet traffic
>>>>> generator that outputs frames with a predictable pattern in the
>>>>> frame, and a receiver that will check that the pattern is received
>>>>> as expected.
>>>>>
>>>>> Then you can try out your parallel algorithm variations on the simple
>>>>> Ethernet sender/receiver and narrow down the problem without all the
>>>>> other concerns.
>>>>>
>>>>> Once the bug is tracked down, we can add the sender/receiver into
>>>>> mirage-skeleton and use it as a test case to ensure that this
>>>>> functionality never regresses in the future. Line rate Ethernet
>>>>> transmission has worked in the past, but we never added a test case
>>>>> to ensure it stays working.
>>>>>
>>>>> Anil
>>>>> _______________________________________________
>>>>> MirageOS-devel mailing list
>>>>> MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
>>>>> http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
>>>
>>> This message and any attachment are intended solely for the addressee
>>> and may contain confidential information. If you have received this
>>> message in error, please send it back to me, and immediately delete it.
>>> Please do not use, copy or disclose the information contained in this
>>> message or in any attachment. Any views or opinions expressed by the
>>> author of this email do not necessarily reflect the views of the
>>> University of Nottingham.
>>>
>>> This message has been checked for viruses but the contents of an
>>> attachment may still contain software viruses which could damage your
>>> computer system, you are advised to perform your own checks. Email
>>> communications with the University of Nottingham may be monitored as
>>> permitted by UK legislation.

-- 
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA