
Re: [MirageOS-devel] Parallelizing writing to network devices



On 18 December 2014 at 15:17, Masoud Koleini
<masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>
> On 18/12/14 13:19, Thomas Leonard wrote:
>>
>> On 17 December 2014 at 18:05, Masoud Koleini
>> <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> Thanks Thomas for the great tracing tool!
>>>
>>> The following is a very simple unikernel with two interfaces, which
>>> redirects frames captured on the first interface to the second one:
>>>
>>> https://github.com/koleini/parallelisation
>>>
>>> The problem is that at a high packet rate (more than 80,000 pps), the
>>> switch stops receiving. The goal is to spot the problem and improve the
>>> throughput of Mirage netif.
>>>
>>> The test environment consists of another VM running a traffic generator
>>> that sends frames with a specific pattern (100-byte UDP frames) over the
>>> bridge connected to the first interface of the unikernel. The unikernel
>>> forwards frames by collecting a number of frames from the input queue and
>>> running the same number of threads to write them to the output interface.
>>>
>>> Two trace files are uploaded to the repo. The first file is the output of
>>> this configuration. This trace shows that each netif write blocks until
>>> the thread that writes to the ring on the front-end connection returns
>>> (function write_already_locked).
>>
>> Do these traces show it after it stopped? The second has a long sleep,
>> while the first looks like it was in the middle of a run.
>>
>> If it had stopped in both cases, it suggests that the whole unikernel
>> stopped (not just the listen thread), because there are no more timer
>> interrupts and no sleep region.
>>
>> Does "xl top" show the unikernel still using the CPU? Or is it
>> waiting, or crashed?
>>
>> If you have a thread writing a string to the console once per second,
>> does it continue after the unikernel stops accepting frames?
>
> Yes, both are. It looks like I get more info in the traces with the updated
> Mirage libraries, so I updated the traces in the repo.
>
> The unikernel is still working, as the traces that periodically write info
> to the console are still working too.

I'm not sure, but it might be worth applying this fix and testing again:

  https://github.com/mirage/mirage-net-xen/pull/16

(when Netif stopped to wait for space in the transmit ring, it would
sometimes fail to notice when space became available)
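
The bug class described above is a lost wake-up. As a minimal sketch of the usual remedy (not the code from the pull request; `ring`, `write` and `release` are hypothetical names, and the ring is modelled as a plain counter rather than a Xen shared ring), a parked writer re-checks for space every time it is woken instead of assuming a slot is there:

```ocaml
(* Hypothetical model of a transmit ring; not the mirage-net-xen code. *)
type ring = {
  mutable free_slots : int;          (* slots the backend has released *)
  waiters : (unit -> unit) Queue.t;  (* writers parked until space appears *)
}

let create slots = { free_slots = slots; waiters = Queue.create () }

(* Run [send] once a slot is free. The space check and the parking happen
   in one step, so a release cannot slip in between them. *)
let rec write ring send =
  if ring.free_slots > 0 then begin
    ring.free_slots <- ring.free_slots - 1;
    send ()
  end else
    (* Park a closure that re-checks the ring rather than assuming space. *)
    Queue.add (fun () -> write ring send) ring.waiters

(* Called when the backend acknowledges [n] transmitted frames. *)
let release ring n =
  ring.free_slots <- ring.free_slots + n;
  (* Wake only the writers parked before this release; any that find the
     ring full again re-park at the back and wait for the next release. *)
  let parked = Queue.length ring.waiters in
  for _i = 1 to parked do
    (Queue.pop ring.waiters) ()
  done
```

With a one-slot ring, three calls to `write` send one frame immediately and park two; each later `release ring 1` sends exactly one more.
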

> With the original configuration (netif unchanged), it looks like the reason
> is that the unikernel runs out of memory after some time, although the error
> message is shown in only a few experiments. This is the main bottleneck for
> Mirage applications: waiting for a packet write to complete is
> time-consuming and doesn't allow high-rate packet switching for network
> applications.
>
> Modifying netif to ignore the thread that is waiting for the result of
> writing to the ring is also problematic. So, any ideas on how to do bulk
> packet writes on a network interface?
>
>
>>
>>> For the second trace, the return of the thread is ignored (by commenting
>>> out "lwt () = th in" in write_already_locked). This considerably increases
>>> switching speed, but after running for some time, it looks like a similar
>>> problem happens after garbage collection.
>>>
>>> Thomas and Anil, any ideas from the given traces, and how can the traces
>>> be made more informative?
>>>
>>> Thanks.
>>>
>>>
>>> On 28/11/14 16:55, Thomas Leonard wrote:
>>>>
>>>> On 28 November 2014 at 16:24, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 28 Nov 2014, at 16:03, Masoud Koleini
>>>>>> <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Thanks Anil.
>>>>>>
>>>>>>> - graph the ring utilisation to see if it's always full (Thomas
>>>>>>> Leonard's profiling patches should help here)
>>>>>>
>>>>>> Would you please point me to the profiling patches?
>>>>>
>>>>> See:
>>>>>
>>>>> http://roscidus.com/blog/blog/2014/10/27/visualising-an-asynchronous-monad/
>>>>
>>>> The installation instructions here are for the previous version
>>>> (though they should still work). If you want to try the latest
>>>> version, the current Git mirage allows you to pass a ~tracing argument
>>>> to "register" in your config.ml, e.g.
>>>>
>>>> let tracing = mprof_trace ~size:1000000 () in
>>>> register "myunikernel" ~tracing [
>>>>     main $ ...
>>>> ]
>>>>
>>>> This uses a newer version of the profiling API. You should generally
>>>> "opam pin" the #tracing2 branches rather than #tracing to use it.
>>>>
>>>> Note also that it doesn't currently record ring utilisation, so you'll
>>>> still need to do some work to get that. You could use the
>>>> MProf.Counter interface, in which case the GUI will display it as a
>>>> graph over the trace.
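
One way to get ring utilisation into the trace is to keep a counter of in-flight requests and update it where the ring is filled and drained. The sketch below uses a local `Counter` module as a stand-in so that it runs on its own; its functions (`create`, `increase`, `value`), the `ring_size` of 256 and the two hook names are assumptions for illustration, and the actual MProf.Counter signature should be checked in the mirage-profile sources before swapping it in.

```ocaml
(* Local stand-in for a tracing counter so this compiles on its own.
   The names here are assumptions, not the MProf.Counter signature. *)
module Counter = struct
  type t = { name : string; mutable value : int }
  let create ~name = { name; value = 0 }
  let name c = c.name
  let increase c n = c.value <- c.value + n
  let value c = c.value
end

let ring_size = 256  (* assumed ring capacity, for illustration only *)

let in_flight = Counter.create ~name:"tx-ring-in-flight"

(* Call when a request is pushed onto the transmit ring. *)
let on_request_queued () = Counter.increase in_flight 1

(* Call when the backend acknowledges [n] requests. *)
let on_responses_acked n = Counter.increase in_flight (-n)

(* Fraction of the ring currently occupied, between 0.0 and 1.0. *)
let utilisation () =
  float_of_int (Counter.value in_flight) /. float_of_int ring_size
```

A counter that stays pinned near `ring_size` would suggest the ring is the bottleneck; one that stays low would point elsewhere.
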
>>>>
>>>>>>> - try to reduce the parallelisation to see if some condition there
>>>>>>> alleviates the issue to track it down.
>>>>>>
>>>>>> Reducing the maximum number of threads running in parallel reduced CPU
>>>>>> utilization, and the VM kept functioning for much longer, but the same
>>>>>> problem occurred in the end.
>>>>>>
>>>>>> It might be more useful looking at the code. Please have a look at the
>>>>>> function "f_thread" in the file uploaded on the following repo:
>>>>>>
>>>>>> https://github.com/koleini/parallelisation
>>>>>
>>>>> That's a lot of code to try and distill down into a test case.  Try to
>>>>> cut it down significantly by building a minimal Ethernet traffic
>>>>> generator that outputs frames with a predictable pattern in the frame,
>>>>> and a receiver that will check that the pattern is received as expected.
>>>>>
>>>>> Then you can try out your parallel algorithm variations on the simple
>>>>> Ethernet sender/receiver and narrow down the problem without all the
>>>>> other concerns.
>>>>>
>>>>> Once the bug is tracked down, we can add the sender/receiver into
>>>>> mirage-skeleton and use it as a test case to ensure that this
>>>>> functionality never regresses in the future.  Line rate Ethernet
>>>>> transmission has worked in the past, but we never added a test case to
>>>>> ensure it stays working.
>>>>>
>>>>> Anil
>>>>> _______________________________________________
>>>>> MirageOS-devel mailing list
>>>>> MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
>>>>> http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>> This message and any attachment are intended solely for the addressee and
>>> may contain confidential information. If you have received this message
>>> in
>>> error, please send it back to me, and immediately delete it.   Please do
>>> not
>>> use, copy or disclose the information contained in this message or in any
>>> attachment.  Any views or opinions expressed by the author of this email
>>> do
>>> not necessarily reflect the views of the University of Nottingham.
>>>
>>> This message has been checked for viruses but the contents of an
>>> attachment
>>> may still contain software viruses which could damage your computer
>>> system,
>>> you are advised to perform your own checks. Email communications with the
>>> University of Nottingham may be monitored as permitted by UK legislation.
>>>
>>
>>
>



-- 
Dr Thomas Leonard        http://0install.net/
GPG: 9242 9807 C985 3C07 44A6  8B9A AE07 8280 59A5 3CC1
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel


 

