
Re: [MirageOS-devel] Parallelizing writing to network devices



On 18 December 2014 at 11:01, Masoud Koleini
<masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>
>> On 17 December 2014 at 18:05, Masoud Koleini
>> <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> Thanks Thomas for the great tracing tool!
>>>
>>> The following is a very simple unikernel with two interfaces, which
>>> redirects frames captured on the first interface to the second one:
>>>
>>> https://github.com/koleini/parallelisation
>>>
>>> The problem is that at a high packet rate (more than 80'000 pps), the
>>> switch stops receiving. The goal is to spot the problem and improve the
>>> throughput of Mirage netif.
>>
>> I don't know if this is the problem, but in the code, I see you do:
>>
>>    listen if1 if2
>>    >> (forward_thread if2)
>>
>> This ignores the result from listen, so if the listen thread later
>> fails then the error will be discarded. I'd try something like this:
>>
>>    Lwt.choose [
>>      listen if1 if2;
>>      forward_thread if2
>>    ]
>>
>> (and lose the "return" at the end of listen)
>
>
> listen calls the netif listen function, which only sets receive_callback on
> the interface (used by poll_thread). It looks like a failure of poll_thread
> is not monitored in the netif code.

Hmm. You're right. Does it show anything if you catch errors there?

Looks like it was changed here:

https://github.com/mirage/mirage-net-xen/commit/5d9df74fb0d8a3eecda8beac9e694f939392273c

Perhaps listen could just return the poll thread, rather than making a
fake task. Dave?
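
As a rough sketch of the difference between the two patterns (this is
illustrative only: poll_thread, forward_thread and Ring_error are made-up
stand-ins, not the real mirage-net-xen or unikernel code, and it assumes
the lwt and lwt.unix libraries are available):

```ocaml
(* Sketch only: every name below is hypothetical and stands in for the
   real netif / unikernel code. Build with the lwt and lwt.unix packages. *)

exception Ring_error

(* Stand-in for a poll loop that fails immediately. *)
let poll_thread () : unit Lwt.t = Lwt.fail Ring_error

(* Stand-in for a forwarder that never finishes on its own. *)
let forward_thread () : unit Lwt.t = fst (Lwt.wait ())

(* Pattern 1: start the poll loop, drop it, and return a fresh thread that
   has already succeeded. The caller cannot observe the failure. *)
let listen_detached () : unit Lwt.t =
  let _ignored = poll_thread () in
  Lwt.return ()

(* Pattern 2: return the poll thread itself, so its failure propagates. *)
let listen_returning_thread () : unit Lwt.t = poll_thread ()

(* Run a thread to completion and describe how it ended. *)
let outcome (t : unit Lwt.t) : string =
  Lwt_main.run
    (Lwt.catch
       (fun () -> Lwt.bind t (fun () -> Lwt.return "finished"))
       (function
         | Ring_error -> Lwt.return "failed: Ring_error"
         | e -> Lwt.fail e))

let detached =
  outcome (Lwt.choose [ listen_detached (); forward_thread () ])

let monitored =
  outcome (Lwt.choose [ listen_returning_thread (); forward_thread () ])

let () = Printf.printf "detached: %s\nmonitored: %s\n" detached monitored
```

With the first pattern the failure is silently lost and Lwt.choose just
sees a thread that finished. With the second, Lwt.choose fails as soon as
the poll thread does, so the error reaches the caller.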

>> I really think the >> operator should be banned...
>>
>>> The test environment consists of another VM running a traffic generator
>>> and sending frames of a specific pattern (UDP frames of size 100 bytes)
>>> over the bridge that connects to the first interface of the unikernel. The
>>> unikernel forwards frames by collecting a number of frames from the input
>>> queue and running the same number of threads to write them to the output
>>> interface.
>>>
>>> Two trace files are uploaded to the repo. The first file is the output of
>>> this configuration. This trace shows that each netif write locks until
>>> the
>>> thread that writes on the front-end connection to the ring is returned
>>> (function write_already_locked.)
>>>
>>> For the second trace, the result of the thread is ignored (by commenting
>>> out "lwt () = th in" in write_already_locked). This considerably increases
>>> switching speed, but after running for some time, it looks like a similar
>>> problem happens after garbage collection.
>>>
>>> Thomas and Anil, any ideas from the given traces, and how could the
>>> traces be made more informative?
>>>
>>> Thanks.
>>>
>>>
>>> On 28/11/14 16:55, Thomas Leonard wrote:
>>>>
>>>> On 28 November 2014 at 16:24, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 28 Nov 2014, at 16:03, Masoud Koleini
>>>>>> <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Thanks Anil.
>>>>>>
>>>>>>> - graph the ring utilisation to see if it's always full (Thomas
>>>>>>> Leonard's profiling patches should help here)
>>>>>>
>>>>>> Would you please point me to the profiling patches?
>>>>>
>>>>> See:
>>>>>
>>>>> http://roscidus.com/blog/blog/2014/10/27/visualising-an-asynchronous-monad/
>>>>
>>>> The installation instructions here are for the previous version
>>>> (though they should still work). If you want to try the latest
>>>> version, the current Git mirage allows you to pass a ~tracing argument
>>>> to "register" in your config.ml, e.g.
>>>>
>>>> let tracing = mprof_trace ~size:1000000 () in
>>>> register "myunikernel" ~tracing [
>>>>     main $ ...
>>>> ]
>>>>
>>>> This uses a newer version of the profiling API. You should generally
>>>> "opam pin" the #tracing2 branches rather than #tracing to use it.
>>>>
>>>> Note also that it doesn't currently record ring utilisation, so you'll
>>>> still need to do some work to get that. You could use the
>>>> MProf.Counter interface, in which case the GUI will display it as a
>>>> graph over the trace.
>>>>
>>>>>>> - try to reduce the parallelisation to see if some condition there
>>>>>>> alleviates the issue to track it down.
>>>>>>
>>>>>> Reducing the maximum number of threads running in parallel reduced CPU
>>>>>> utilization, and the VM kept functioning for much longer, but the same
>>>>>> problem occurred in the end.
>>>>>>
>>>>>> It might be more useful looking at the code. Please have a look at the
>>>>>> function "f_thread" in the file uploaded on the following repo:
>>>>>>
>>>>>> https://github.com/koleini/parallelisation
>>>>>
>>>>> That's a lot of code to try and distill down to a test case.  Try to cut
>>>>> it down significantly by building a minimal Ethernet traffic generator
>>>>> that outputs frames with a predictable pattern in the frame, and a
>>>>> receiver that will check that the pattern is received as expected.
>>>>>
>>>>> Then you can try out your parallel algorithm variations on the simple
>>>>> Ethernet sender/receiver and narrow down the problem without all the
>>>>> other concerns.
>>>>>
>>>>> Once the bug is tracked down, we can add the sender/receiver into
>>>>> mirage-skeleton and use it as a test case to ensure that this
>>>>> functionality never regresses in the future.  Line rate Ethernet
>>>>> transmission has worked in the past, but we never added a test case to
>>>>> ensure it stays working.
>>>>>
>>>>> Anil
>>>>> _______________________________________________
>>>>> MirageOS-devel mailing list
>>>>> MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
>>>>> http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>> This message and any attachment are intended solely for the addressee and
>>> may contain confidential information. If you have received this message
>>> in
>>> error, please send it back to me, and immediately delete it.   Please do
>>> not
>>> use, copy or disclose the information contained in this message or in any
>>> attachment.  Any views or opinions expressed by the author of this email
>>> do
>>> not necessarily reflect the views of the University of Nottingham.
>>>
>>> This message has been checked for viruses but the contents of an
>>> attachment
>>> may still contain software viruses which could damage your computer
>>> system,
>>> you are advised to perform your own checks. Email communications with the
>>> University of Nottingham may be monitored as permitted by UK legislation.
>>>
>>
>>
>



-- 
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
