
Re: [MirageOS-devel] Parallelizing writing to network devices




On 18/12/14 11:31, David Scott wrote:


On Thu, Dec 18, 2014 at 11:13 AM, Thomas Leonard <talex5@xxxxxxxxx> wrote:
On 18 December 2014 at 11:01, Masoud Koleini
<masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>
>> On 17 December 2014 at 18:05, Masoud
>> Koleini <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> Thanks Thomas for the great tracing tool!
>>>
>>> The following is a very simple unikernel with two interfaces, which
>>> redirects frames captured on the first interface to the second one:
>>>
>>> https://github.com/koleini/parallelisation
>>>
>>> The problem is that at a high packet rate (more than 80,000 pps), the
>>> switch stops receiving. The goal is to find the problem and improve the
>>> throughput of Mirage netif.
>>
>> I don't know if this is the problem, but in the code, I see you do:
>>
>>   listen if1 if2
>>   >> (forward_thread if2)
>>
>> This ignores the result from listen, so if the listen thread later
>> fails then the error will be discarded. I'd try something like this:
>>
>>   Lwt.choose [
>>     listen if1 if2;
>>     forward_thread if2
>>   ]
>>
>> (and lose the "return" at the end of listen)
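The difference between the two compositions can be sketched with core Lwt alone, without an event loop. In this sketch, `poll` stands in for netif's poll thread and `forward` for `forward_thread if2`; both are hypothetical promises created with `Lwt.wait`, and `listen_detached`/`listen_attached` are illustrative names, not the mirage-net-xen API. A `listen` that starts the poll loop and returns at once leaves the caller nothing to observe, whereas a `listen` that returns the poll thread itself (the change proposed later in this thread) lets `Lwt.choose` propagate its failure.

```ocaml
(* Sketch only: needs the lwt library (core Lwt, no event loop). The names
   listen_detached, listen_attached, poll and forward are hypothetical
   stand-ins, not the mirage-net-xen API. *)

(* A listen that starts the poll loop in the background and returns at
   once, like a listen ending in "return ()". *)
let listen_detached (poll : unit Lwt.t) : unit Lwt.t =
  ignore poll;
  Lwt.return_unit

(* A listen that hands the poll thread itself back to the caller. *)
let listen_attached (poll : unit Lwt.t) : unit Lwt.t = poll

(* listen if1 if2 >> forward_thread if2 *)
let sequenced listen_t forward_t = Lwt.bind listen_t (fun () -> forward_t)

(* Lwt.choose [listen if1 if2; forward_thread if2] *)
let chosen listen_t forward_t = Lwt.choose [ listen_t; forward_t ]

let describe (t : unit Lwt.t) =
  match Lwt.state t with
  | Lwt.Return () -> "returned"
  | Lwt.Fail (Failure msg) -> "failed: " ^ msg
  | Lwt.Fail e -> "failed: " ^ Printexc.to_string e
  | Lwt.Sleep -> "still running"

(* Build the main thread, then make the poll thread die and report what
   the main thread looks like afterwards. *)
let scenario listen combine =
  let poll, poll_resolver = Lwt.wait () in
  let forward, _forward_resolver = Lwt.wait () in
  let main = combine (listen poll) forward in
  Lwt.wakeup_exn poll_resolver (Failure "poll thread died");
  describe main

let () =
  List.iter print_endline
    [ scenario listen_detached sequenced;
      scenario listen_detached chosen;
      scenario listen_attached chosen ]
```

This prints "still running" (the failure is silently lost), then "returned" (with the trailing `return` kept, `Lwt.choose` finishes immediately, which is why the `return` has to go), then "failed: poll thread died".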
>
>
> listen calls the netif listen function, which only sets receive_callback on
> the interface (used by poll_thread). It looks like a failing poll_thread is
> not monitored in the netif code.

Hmm. You're right. Does it show anything if you catch errors there?

Looks like it was changed here:

https://github.com/mirage/mirage-net-xen/commit/5d9df74fb0d8a3eecda8beac9e694f939392273c

Perhaps listen could just return the poll thread, rather than making a
fake task. Dave?

Ah yes, good idea. If that callback fails then we're going to fail silently.

Dave

I have modified listen to return the poll thread. The poll threads do not fail when the switch stops working.


>> I really think the >> operator should be banned...
>>
>>> The test environment consists of another VM running a traffic generator
>>> that sends frames of a specific pattern (UDP frames of 100 bytes) over
>>> the bridge connected to the first interface of the unikernel. The
>>> unikernel forwards frames by collecting a number of frames from the input
>>> queue and starting the same number of threads to write them to the output
>>> interface.
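The forwarding scheme described above (take a batch of frames from the input queue, then start one write per frame) can be sketched as follows. `write` stands for the output netif's write function and the input queue is a plain `Queue.t`; both are assumptions, and this is not the code in the koleini/parallelisation repository. The `max_parallel` parameter is a hypothetical knob matching the experiment of limiting how many threads run in parallel.

```ocaml
(* Sketch only: needs the lwt library (core Lwt and Lwt_list). [write] is a
   hypothetical stand-in for the output netif's write function. *)

(* Take up to [n] frames from the input queue [q]. *)
let take_batch q n =
  let rec go acc k =
    if k = 0 || Queue.is_empty q then List.rev acc
    else go (Queue.pop q :: acc) (k - 1)
  in
  go [] n

(* Split [l] into consecutive chunks of at most [size] elements. *)
let chunks size l =
  if size < 1 then invalid_arg "chunks: size must be positive";
  let close cur acc = match cur with [] -> acc | _ -> List.rev cur :: acc in
  let rec go acc cur k = function
    | [] -> List.rev (close cur acc)
    | x :: rest ->
        if k = size then go (close cur acc) [ x ] 1 rest
        else go acc (x :: cur) (k + 1) rest
  in
  go [] [] 0 l

(* Write one batch with at most [max_parallel] writes in flight: the
   writes in a chunk run concurrently, and chunks run one after another. *)
let forward_batch ~max_parallel write batch =
  Lwt_list.iter_s (Lwt_list.iter_p write) (chunks max_parallel batch)
```

Chunking is the simplest way to cap parallelism, but each chunk waits for its slowest write before the next one starts; a sliding window (for example a counting semaphore) would keep more writes in flight at the cost of a little more code.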
>>>
>>> Two trace files are uploaded to the repo. The first file is the output of
>>> this configuration. This trace shows that each netif write blocks until
>>> the thread that writes to the ring on the front-end connection returns
>>> (function write_already_locked).
>>>
>>> For the second trace, the return of the thread is ignored (commenting out
>>> "lwt () = th in" in write_already_locked). This considerably increases
>>> switching speed, but after running for some time a similar problem
>>> appears, apparently after a garbage collection.
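The two traces differ in whether a write that is serialised by a lock waits for the ring write to complete before releasing it. The sketch below models that with `Lwt_mutex` for the lock and `Lwt.wait` for the backend's acknowledgement; the `ring` type, `push_to_ring` and `backend_ack` are hypothetical and are not mirage-net-xen's `write_already_locked` or its Ring module.

```ocaml
(* Sketch only: needs the lwt library (core Lwt and Lwt_mutex). The ring
   below is a hypothetical model, not mirage-net-xen's Ring module. *)

(* Each frame pushed onto the ring gets a resolver that the simulated
   backend fires once it has consumed the request. *)
type ring = { pending : unit Lwt.u Queue.t }

let push_to_ring ring (_frame : string) : unit Lwt.t =
  let acked, resolver = Lwt.wait () in
  Queue.push resolver ring.pending;
  acked

(* Variant 1: hold the lock until the backend acknowledges the frame,
   analogous to keeping "lwt () = th in". *)
let write_wait lock ring frame =
  Lwt_mutex.with_lock lock (fun () -> push_to_ring ring frame)

(* Variant 2: release the lock as soon as the frame is on the ring and
   drop the acknowledgement, analogous to commenting that line out. *)
let write_no_wait lock ring frame =
  Lwt_mutex.with_lock lock (fun () ->
      let _acked = push_to_ring ring frame in
      Lwt.return_unit)

(* Simulated backend: acknowledge the oldest outstanding request. *)
let backend_ack ring = Lwt.wakeup (Queue.pop ring.pending) ()
```

With `write_wait`, a second write cannot reach the ring until `backend_ack` fires for the first, so at most one frame is in flight. With `write_no_wait` both frames reach the ring immediately, but any failure of the ring write is lost and nothing slows the sender down if the backend falls behind; whether that explains the later stall is only a hypothesis the traces would need to confirm.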
>>>
>>> Thomas and Anil, do you have any ideas from the given traces, and how
>>> could the traces be made more informative?
>>>
>>> Thanks.
>>>
>>>
>>> On 28/11/14 16:55, Thomas Leonard wrote:
>>>>
>>>> On 28 November 2014 at 16:24, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 28 Nov 2014, at 16:03, Masoud Koleini
>>>>>> <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Thanks Anil.
>>>>>>
>>>>>>> - graph the ring utilisation to see if it's always full (Thomas
>>>>>>> Leonard's profiling patches should help here)
>>>>>>
>>>>>> Could you please point me to the profiling patches?
>>>>>
>>>>> See:
>>>>>
>>>>> http://roscidus.com/blog/blog/2014/10/27/visualising-an-asynchronous-monad/
>>>>
>>>> The installation instructions here are for the previous version
>>>> (though they should still work). If you want to try the latest
>>>> version, the current Git mirage allows you to pass a ~tracing argument
>>>> to "register" in your config.ml, e.g.
>>>>
>>>> let tracing = mprof_trace ~size:1000000 () in
>>>> register "myunikernel" ~tracing [
>>>>    main $ ...
>>>> ]
>>>>
>>>> This uses a newer version of the profiling API. You should generally
>>>> "opam pin" the #tracing2 branches rather than #tracing to use it.
>>>>
>>>> Note also that it doesn't currently record ring utilisation, so you'll
>>>> still need to do some work to get that. You could use the
>>>> MProf.Counter interface, in which case the GUI will display it as a
>>>> graph over the trace.
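As a starting point for graphing ring utilisation, the occupancy of a Xen shared ring can be derived from its free-running indices: Xen's `ring.h` defines the free request slots as the ring size minus (`req_prod_pvt` - `rsp_cons`), with the indices treated as unsigned 32-bit counters. The helper below is a standalone sketch under that assumption; reading these indices out of mirage-net-xen's ring code and feeding the value to MProf.Counter are not shown, and would need checking against those libraries.

```ocaml
(* Sketch only, assuming 64-bit OCaml so that 0xFFFF_FFFF fits in an int.
   Xen ring indices are free-running unsigned 32-bit counters, so
   differences are taken modulo 2^32. *)
let idx_diff a b = (a - b) land 0xFFFF_FFFF

(* Slots in use: requests produced but whose responses have not yet been
   consumed (ring size minus RING_FREE_REQUESTS in Xen's ring.h). *)
let slots_in_use ~req_prod_pvt ~rsp_cons = idx_diff req_prod_pvt rsp_cons

(* Fraction of a ring of [size] slots that is in use, between 0 and 1. *)
let utilisation ~size ~req_prod_pvt ~rsp_cons =
  float_of_int (slots_in_use ~req_prod_pvt ~rsp_cons) /. float_of_int size

let () =
  (* 192 of 256 slots outstanding: prints 0.75 *)
  Printf.printf "%.2f\n" (utilisation ~size:256 ~req_prod_pvt:192 ~rsp_cons:0)
```

Sampling this value on every write and plotting it over time would show whether the ring is permanently full when the switch stops forwarding.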
>>>>
>>>>>>> - try to reduce the parallelisation to see if some condition there
>>>>>>> alleviates the issue to track it down.
>>>>>>
>>>>>> Reducing the maximum number of threads running in parallel reduced CPU
>>>>>> utilization, and the VM kept working for much longer, but the same
>>>>>> problem eventually occurred.
>>>>>>
>>>>>> It might be more useful looking at the code. Please have a look at the
>>>>>> function "f_thread" in the file uploaded on the following repo:
>>>>>>
>>>>>> https://github.com/koleini/parallelisation
>>>>>
>>>>> That's a lot of code to try and distill down a test case. Try to cut
>>>>> it
>>>>> down significantly by building a minimal Ethernet traffic generator
>>>>> that
>>>>> outputs frames with a predictable pattern in the frame, and a receiver
>>>>> that
>>>>> will check that the pattern is received as expected.
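The minimal sender/receiver pair suggested above can be sketched with the standard `Bytes` module alone. Each frame carries a 32-bit sequence number followed by a payload pattern derived from it, so the receiver can tell lost, reordered and corrupted frames apart. The MAC addresses, the EtherType 0x88B5 (an IEEE local experimental EtherType) and the frame length are illustrative choices, not something specified in the thread, and these are raw Ethernet frames rather than UDP.

```ocaml
(* Sketch only: raw Ethernet frames as Bytes (OCaml >= 4.08 for the
   Bytes.*_be accessors). Converting to the buffer type a netif write
   expects is left out. *)

let header_len = 14 (* destination MAC, source MAC, EtherType *)

(* Frame of [len] bytes carrying sequence number [seq] (0 <= seq < 2^32)
   followed by a payload pattern derived from [seq]. *)
let make_frame ~len ~seq =
  if len < header_len + 4 then invalid_arg "make_frame: frame too short";
  let b = Bytes.make len '\000' in
  Bytes.fill b 0 6 '\xff';                              (* broadcast destination *)
  Bytes.blit_string "\x02\x00\x00\x00\x00\x01" 0 b 6 6; (* locally administered source *)
  Bytes.set_uint16_be b 12 0x88b5;                      (* local experimental EtherType *)
  Bytes.set_int32_be b header_len (Int32.of_int seq);
  for i = header_len + 4 to len - 1 do
    Bytes.set b i (Char.chr ((seq + i) land 0xff))
  done;
  b

(* Return the sequence number if [b] is exactly the frame the sender
   would have produced for it. *)
let check_frame ~len b =
  if len < header_len + 4 || Bytes.length b <> len then Error "wrong length"
  else
    let seq = Int32.to_int (Bytes.get_int32_be b header_len) land 0xFFFF_FFFF in
    if Bytes.equal b (make_frame ~len ~seq) then Ok seq else Error "pattern mismatch"

(* Receiver statistics. A frame that arrives after a later one is counted
   first as lost and then again as out of order. *)
type rx = {
  mutable expected : int;
  mutable lost : int;
  mutable out_of_order : int;
  mutable corrupt : int;
}

let receive rx ~len frame =
  match check_frame ~len frame with
  | Error _ -> rx.corrupt <- rx.corrupt + 1
  | Ok seq when seq >= rx.expected ->
      rx.lost <- rx.lost + (seq - rx.expected);
      rx.expected <- seq + 1
  | Ok _ -> rx.out_of_order <- rx.out_of_order + 1

let () =
  let rx = { expected = 0; lost = 0; out_of_order = 0; corrupt = 0 } in
  List.iter (fun seq -> receive rx ~len:100 (make_frame ~len:100 ~seq)) [ 0; 1; 3 ];
  Printf.printf "lost=%d corrupt=%d\n" rx.lost rx.corrupt
```

Because the receiver regenerates the expected frame from its sequence number, any single corrupted byte is detected without storing what was sent.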
>>>>>
>>>>> Then you can try out your parallel algorithm variations on the simple
>>>>> Ethernet sender/receiver and narrow down the problem without all the
>>>>> other concerns.
>>>>>
>>>>> Once the bug is tracked down, we can add the sender/receiver to
>>>>> mirage-skeleton and use it as a test case to ensure that this
>>>>> functionality never regresses in the future. Line-rate Ethernet
>>>>> transmission has worked in the past, but we never added a test case to
>>>>> ensure it stays working.
>>>>>
>>>>> Anil
>>>>> _______________________________________________
>>>>> MirageOS-devel mailing list
>>>>> MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
>>>>> http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>> This message and any attachment are intended solely for the addressee and
>>> may contain confidential information. If you have received this message
>>> in
>>> error, please send it back to me, and immediately delete it. Please do
>>> not
>>> use, copy or disclose the information contained in this message or in any
>>> attachment. Any views or opinions expressed by the author of this email
>>> do
>>> not necessarily reflect the views of the University of Nottingham.
>>>
>>> This message has been checked for viruses but the contents of an
>>> attachment
>>> may still contain software viruses which could damage your computer
>>> system,
>>> you are advised to perform your own checks. Email communications with the
>>> University of Nottingham may be monitored as permitted by UK legislation.
>>>
>>
>>
>



--
Dr Thomas Leonard    http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA



--
Dave Scott




 

