
Re: [MirageOS-devel] Parallelizing writing to network devices

On 17 December 2014 at 18:05, Masoud Koleini
<masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
> Thanks Thomas for the great tracing tool!
> The following is a very simple unikernel with two interfaces, which
> redirects frames captured on the first interface to the second one:
> https://github.com/koleini/parallelisation
> The problem is that at a high packet rate (more than 80,000 pps), the switch
> stops receiving. The goal is to find the cause and improve the throughput
> of Mirage netif.

I don't know if this is the problem, but in the code, I see you do:

  listen if1 if2
  >> (forward_thread if2)

This ignores the result from listen, so if the listen thread later
fails then the error will be discarded. I'd try something like this:

  Lwt.choose [
    listen if1 if2;
    forward_thread if2
  ]

(and lose the "return" at the end of listen)

I really think the >> operator should be banned...
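A minimal sketch of why the two forms behave differently. It assumes the lwt library, and listen_loop, forward_loop and listen_detached are hypothetical stand-ins for the two long-running threads, not the code in the repo. When the listen loop fails, the detached version (listen drops its loop and returns at once) is still waiting on the forwarder and never sees the error, while the Lwt.choose version fails straight away:

```ocaml
(* Hypothetical stand-ins for the two long-running threads: each promise
   only settles if its thread stops (normally or by failing). *)
let listen_loop, listen_resolver = (Lwt.wait () : unit Lwt.t * unit Lwt.u)
let forward_loop, _forward_resolver = (Lwt.wait () : unit Lwt.t * unit Lwt.u)

(* Pattern warned against: listen starts its loop, drops the promise and
   returns immediately, so a later failure of the loop goes nowhere. *)
let listen_detached () =
  let _ignored : unit Lwt.t = listen_loop in
  Lwt.return_unit

let main_detached = Lwt.bind (listen_detached ()) (fun () -> forward_loop)

(* Suggested pattern: listen returns its loop's promise and both loops are
   combined, so whichever settles first (including by failing) settles this. *)
let main_chosen = Lwt.choose [ listen_loop; forward_loop ]

let describe p =
  match Lwt.state p with
  | Lwt.Return () -> "returned"
  | Lwt.Fail (Failure msg) -> "failed: " ^ msg
  | Lwt.Fail e -> "failed: " ^ Printexc.to_string e
  | Lwt.Sleep -> "still waiting"

let () =
  (* Simulate the listen thread crashing. *)
  Lwt.wakeup_exn listen_resolver (Failure "listen loop crashed");
  print_endline ("detached: " ^ describe main_detached);
  print_endline ("chosen: " ^ describe main_chosen)
```

Note that Lwt.choose leaves the other thread running once one settles; Lwt.pick is the variant that cancels the remaining ones.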

> The test environment consists of another VM running a traffic generator that
> sends frames with a specific pattern (100-byte UDP frames) over the
> bridge that connects to the first interface of the unikernel. The unikernel
> forwards frames by collecting a number of frames from the input queue and
> running the same number of threads that write them to the output interface.
> Two trace files are uploaded to the repo. The first file is the output of
> this configuration. This trace shows that each netif write is blocked until
> the thread that writes to the front-end ring connection returns
> (function write_already_locked).
> For the second trace, the return of the thread is ignored (commenting out
> "lwt () = th in" in write_already_locked). This considerably increases
> switching speed, but after running for some time, apparently following a
> garbage collection, a similar problem occurs.
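A simplified model of the effect in the first trace, assuming the lwt library. push_request, write_serialised and write_pipelined are hypothetical and are not the netif implementation. If the wait for ring completion happens while the write lock is held, only one request can be outstanding at a time; if the lock covers only the push, all writers can queue requests and each still waits for (and sees failures of) its own completion. Simply dropping the wait, as in the second trace, gives the concurrency but also discards that completion result.

```ocaml
(* Hypothetical model of a shared TX ring: pushing a request returns a
   promise that a (simulated) backend would resolve on completion. *)
let pending : unit Lwt.u Queue.t = Queue.create ()
let pushed = ref 0

let push_request () =
  let completion, resolver = Lwt.wait () in
  incr pushed;
  Queue.push resolver pending;
  completion

(* Variant A: the completion wait happens while the lock is held,
   so only one request can be outstanding at a time. *)
let write_serialised lock =
  Lwt_mutex.with_lock lock (fun () -> push_request ())

(* Variant B: only the push is done under the lock; the caller still
   waits for completion (and sees any failure), but outside the lock. *)
let write_pipelined lock =
  Lwt.bind
    (Lwt_mutex.with_lock lock (fun () -> Lwt.return (push_request ())))
    (fun completion -> completion)

(* Start [n] concurrent writers and report how many requests reach the
   ring before the backend has completed any of them. *)
let in_flight ~n write =
  Queue.clear pending;
  pushed := 0;
  let lock = Lwt_mutex.create () in
  let _writers = List.init n (fun _ -> write lock) in
  !pushed

let () =
  let a = in_flight ~n:4 write_serialised in
  let b = in_flight ~n:4 write_pipelined in
  Printf.printf "serialised: %d, pipelined: %d\n" a b
```

With four writers this reports 1 for the serialised variant and 4 for the pipelined one.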
> Thomas and Anil, do you have any ideas from the given traces, and how could
> the traces be made more informative?
> Thanks.
> On 28/11/14 16:55, Thomas Leonard wrote:
>> On 28 November 2014 at 16:24, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
>>>> On 28 Nov 2014, at 16:03, Masoud Koleini
>>>> <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>> Thanks Anil.
>>>>> - graph the ring utilisation to see if it's always full (Thomas
>>>>> Leonard's profiling patches should help here)
>>>> Could you please point me to the profiling patches?
>>> See:
>>> http://roscidus.com/blog/blog/2014/10/27/visualising-an-asynchronous-monad/
>> The installation instructions here are for the previous version
>> (though they should still work). If you want to try the latest
>> version, the current Git mirage allows you to pass a ~tracing argument
>> to "register" in your config.ml, e.g.
>> let tracing = mprof_trace ~size:1000000 () in
>> register "myunikernel" ~tracing [
>>    main $ ...
>> ]
>> This uses a newer version of the profiling API. You should generally
>> "opam pin" the #tracing2 branches rather than #tracing to use it.
>> Note also that it doesn't currently record ring utilisation, so you'll
>> still need to do some work to get that. You could use the
>> MProf.Counter interface, in which case the GUI will display it as a
>> graph over the trace.
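Since the tracing does not record ring utilisation, a small tracker like the one below could compute the value to report at each push and completion, for example through the MProf.Counter interface mentioned above (its exact signature is not shown here). The ring_usage type and its functions are hypothetical, stdlib-only OCaml:

```ocaml
(* Hypothetical ring-utilisation tracker; in a unikernel, [record] would
   feed a tracing counter rather than keep a list of samples. *)
type ring_usage = {
  size : int;                 (* total slots in the ring *)
  mutable in_use : int;       (* slots holding an outstanding request *)
  mutable samples : int list; (* slots in use after each event, newest first *)
}

let create size = { size; in_use = 0; samples = [] }

let record r = r.samples <- r.in_use :: r.samples

let on_push r =
  if r.in_use >= r.size then invalid_arg "on_push: ring full";
  r.in_use <- r.in_use + 1;
  record r

let on_complete r =
  if r.in_use = 0 then invalid_arg "on_complete: nothing outstanding";
  r.in_use <- r.in_use - 1;
  record r

(* Fraction of recorded events after which every slot was in use. *)
let full_fraction r =
  match r.samples with
  | [] -> 0.0
  | samples ->
    let full = List.length (List.filter (fun n -> n = r.size) samples) in
    float_of_int full /. float_of_int (List.length samples)
```

With a 4-slot ring, four pushes, one completion, one push and four completions give ten samples, two of them full, so full_fraction returns 0.2. A value close to 1.0 under load would suggest the ring is always full.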
>>>>> - try to reduce the parallelisation to see if some condition there
>>>>> alleviates the issue to track it down.
>>>> Reducing the maximum number of threads running in parallel reduced CPU
>>>> utilization, and the VM kept working for much longer, but the same
>>>> problem eventually occurred.
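A sketch of forwarding a batch with a fixed cap on outstanding writes, assuming the lwt library. write, forward_batch and drain are hypothetical, and Lwt_pool is used purely as a counting semaphore (the pooled value carries no resource). This makes the maximum number of parallel writer threads an explicit parameter that can be varied between experiments:

```ocaml
(* Hypothetical stand-in for a netif write: it tracks how many writes are
   outstanding and returns a promise the simulated backend resolves later. *)
let active = ref 0
let peak = ref 0
let backend : unit Lwt.u Queue.t = Queue.create ()

let write (_frame : string) : unit Lwt.t =
  incr active;
  if !active > !peak then peak := !active;
  let completion, resolver = Lwt.wait () in
  Queue.push resolver backend;
  Lwt.map (fun () -> decr active) completion

(* Forward a batch with at most [max_parallel] writes outstanding;
   the other writers wait on the pool until a slot is released. *)
let forward_batch ~max_parallel frames =
  let slots = Lwt_pool.create max_parallel (fun () -> Lwt.return_unit) in
  Lwt_list.iter_p (fun f -> Lwt_pool.use slots (fun () -> write f)) frames

(* Simulated backend: complete requests until none are outstanding. *)
let drain () =
  while not (Queue.is_empty backend) do
    Lwt.wakeup (Queue.pop backend) ()
  done
```

With ten frames and ~max_parallel:3, at most three writes are outstanding at any time; the remaining seven wait on the pool until earlier writes complete.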
>>>> It might be more useful to look at the code. Please have a look at the
>>>> function "f_thread" in the file uploaded to the following repo:
>>>> https://github.com/koleini/parallelisation
>>> That's a lot of code to distill into a test case.  Try to cut it
>>> down significantly by building a minimal Ethernet traffic generator that
>>> outputs frames containing a predictable pattern, and a receiver that
>>> checks that the pattern is received as expected.
>>> Then you can try out your parallel algorithm variations on the simple
>>> Ethernet sender/receiver and narrow down the problem without all the other
>>> concerns.
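A stdlib-only sketch of the pattern check Anil describes. The frame layout (a 4-byte sequence number after the Ethernet header, followed by bytes derived from it) and the use of the 0x88B5 local experimental EtherType are assumptions for illustration; make_frame, check_frame and receive are hypothetical helpers that a sender and a receiver unikernel could share:

```ocaml
(* Hypothetical frame layout (not a Mirage API):
   0-5 dst MAC, 6-11 src MAC, 12-13 EtherType, 14-17 sequence number
   (big-endian), 18.. payload bytes derived from the sequence number. *)
let frame_len = 100
let ethertype = 0x88B5 (* IEEE 802 local experimental EtherType *)
let payload_off = 18

let make_frame ~src ~dst seq =
  let b = Bytes.make frame_len '\000' in
  Bytes.blit_string dst 0 b 0 6;
  Bytes.blit_string src 0 b 6 6;
  Bytes.set_uint16_be b 12 ethertype;
  Bytes.set_int32_be b 14 (Int32.of_int seq);
  for i = payload_off to frame_len - 1 do
    Bytes.set_uint8 b i ((seq + i) land 0xff)
  done;
  b

(* Returns [Ok seq] if the frame matches the pattern, else the reason. *)
let check_frame b =
  if Bytes.length b <> frame_len then Error "length"
  else if Bytes.get_uint16_be b 12 <> ethertype then Error "ethertype"
  else
    let seq = Int32.to_int (Bytes.get_int32_be b 14) in
    let rec payload_ok i =
      i >= frame_len
      || (Bytes.get_uint8 b i = (seq + i) land 0xff && payload_ok (i + 1))
    in
    if payload_ok payload_off then Ok seq else Error "payload"

(* Receiver bookkeeping: gaps in the sequence count as lost frames;
   a late frame is counted as received but does not reduce [lost]. *)
type stats = {
  mutable next : int;
  mutable received : int;
  mutable lost : int;
  mutable corrupt : int;
}

let receive st b =
  match check_frame b with
  | Error _ -> st.corrupt <- st.corrupt + 1
  | Ok seq ->
    st.received <- st.received + 1;
    if seq > st.next then st.lost <- st.lost + (seq - st.next);
    if seq >= st.next then st.next <- seq + 1
```

For example, delivering frames 0 to 9 with 3 and 4 missing gives received = 8 and lost = 2, while flipping a payload bit makes check_frame return Error "payload".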
>>> Once the bug is tracked down, we can add the sender/receiver into
>>> mirage-skeleton and use it as a test case to ensure that this functionality
>>> never regresses in the future.  Line rate Ethernet transmission has worked
>>> in the past, but we never added a test case to ensure it stays working.
>>> Anil
>>> _______________________________________________
>>> MirageOS-devel mailing list
>>> MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
>>> http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
> This message and any attachment are intended solely for the addressee and
> may contain confidential information. If you have received this message in
> error, please send it back to me, and immediately delete it.   Please do not
> use, copy or disclose the information contained in this message or in any
> attachment.  Any views or opinions expressed by the author of this email do
> not necessarily reflect the views of the University of Nottingham.
> This message has been checked for viruses but the contents of an attachment
> may still contain software viruses which could damage your computer system,
> you are advised to perform your own checks. Email communications with the
> University of Nottingham may be monitored as permitted by UK legislation.

Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA
