
Re: [MirageOS-devel] Parallelizing writing to network devices

To: Thomas Leonard <talex5@xxxxxxxxx>
From: Masoud Koleini <masoud.koleini@xxxxxxxxxxxxxxxx>
Date: Thu, 18 Dec 2014 15:17:42 +0000
Cc: "mirageos-devel@xxxxxxxxxxxxxxxxxxxx" <mirageos-devel@xxxxxxxxxxxxxxxxxxxx>, Anil Madhavapeddy <anil@xxxxxxxxxx>
Delivery-date: Thu, 18 Dec 2014 15:17:56 +0000
List-id: Developer list for MirageOS <mirageos-devel.lists.xenproject.org>


On 18/12/14 13:19, Thomas Leonard wrote:

On 17 December 2014 at 18:05, Masoud Koleini
<masoud.koleini@xxxxxxxxxxxxxxxx> wrote:

Thanks Thomas for the great tracing tool!

The following is a very simple unikernel with two interfaces, which
redirects frames captured on the first interface to the second one:

https://github.com/koleini/parallelisation

The problem is that at a high packet rate (more than 80,000 pps), the switch
stops receiving. The goal is to find the cause and improve the throughput
of Mirage netif.

The test environment consists of another VM running a traffic generator and
sending frames of a specific pattern (UDP frames of size 100 bytes) over the
bridge that connects to the first interface of the unikernel. The unikernel
forwards frames by collecting a number of frames from the input queue and
running the same number of threads to write them to the output interface.

Two trace files are uploaded to the repo. The first file is the output of
this configuration. This trace shows that each netif write blocks until the
thread that writes to the ring on the front-end connection returns
(function write_already_locked).

Do these traces show it after it stopped? The second has a long sleep,
while the first looks like it was in the middle of a run.

If it had stopped in both cases, it suggests that the whole unikernel
stopped (not just the listen thread), because there are no more timer
interrupts and no sleep region.

Does "xl top" show the unikernel still using the CPU? Or is it
waiting, or has it crashed?

If you have a thread writing a string to the console once per second,
does it continue after the unikernel stops accepting frames?

Yes, both are. It looks as though I get more info in the traces with the updated Mirage libraries, so I have updated the traces in the repo.

The unikernel is still running: the traces that periodically write info to the console keep appearing.

With the original configuration (netif unchanged), it looks as though the unikernel runs out of memory after some time, although the error message is shown in only a few experiments. The main bottleneck for Mirage applications is that waiting for each packet write to complete is time-consuming, which prevents high-rate packet switching in network applications.

Modifying netif to ignore the thread that waits for the result of writing to the ring is also problematic. So, any ideas on how to do bulk packet writes on a network interface?
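
One possible direction, sketched here as a toy model rather than as Mirage code: keep a bounded backlog of frames in front of the ring and stop pulling frames from the input interface once it is full, so that writes are neither awaited one at a time nor left to pile up without limit. The names ring, submit, complete_one and max_backlog are hypothetical; the real netif ring and its Lwt threads are not modelled, and only the OCaml standard library is used.

```ocaml
(* Toy model only: [ring], [submit] and [complete_one] are hypothetical
   stand-ins for a transmit ring, not the Mirage netif API. *)
type ring = {
  slots : int;                 (* capacity of the transmit ring *)
  mutable in_flight : int;     (* requests submitted, not yet acknowledged *)
  backlog : string Queue.t;    (* frames waiting for a free slot *)
}

let create slots = { slots; in_flight = 0; backlog = Queue.create () }

(* Move frames from the backlog into the ring while slots are free. *)
let drain r =
  while r.in_flight < r.slots && not (Queue.is_empty r.backlog) do
    ignore (Queue.pop r.backlog);
    r.in_flight <- r.in_flight + 1
  done

(* Queue a frame for transmission. Returns [false] when [max_backlog]
   frames are already pending; the caller should then stop reading from
   the input interface until some writes have completed. *)
let submit ~max_backlog r frame =
  if Queue.length r.backlog >= max_backlog then false
  else begin
    Queue.push frame r.backlog;
    drain r;
    true
  end

(* The backend acknowledged one request: free its slot and refill. *)
let complete_one r =
  if r.in_flight > 0 then r.in_flight <- r.in_flight - 1;
  drain r

let () =
  let r = create 4 in
  let accepted = ref 0 in
  for i = 1 to 20 do
    if submit ~max_backlog:8 r (Printf.sprintf "frame-%d" i) then incr accepted
  done;
  Printf.printf "accepted=%d in_flight=%d backlog=%d\n"
    !accepted r.in_flight (Queue.length r.backlog)
```

In this model the memory held by pending frames is capped at slots + max_backlog, whatever the input rate; frames beyond that are refused at the input side instead of accumulating as waiting threads.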

For the second trace, the return of the thread is ignored (by commenting out
"lwt () = th in" in write_already_locked). This considerably increases
switching speed, but after running for some time, it looks as though a
similar problem occurs after garbage collection.

Thomas and Anil, do you have any ideas from the given traces, and how could
the traces be made more informative?

Thanks.


On 28/11/14 16:55, Thomas Leonard wrote:

On 28 November 2014 at 16:24, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:

On 28 Nov 2014, at 16:03, Masoud Koleini
<masoud.koleini@xxxxxxxxxxxxxxxx> wrote:

Thanks Anil.

- graph the ring utilisation to see if it's always full (Thomas
Leonard's profiling patches should help here)

Would you please point me to the profiling patches?

See:
http://roscidus.com/blog/blog/2014/10/27/visualising-an-asynchronous-monad/

The installation instructions here are for the previous version
(though they should still work). If you want to try the latest
version, the current Git mirage allows you to pass a ~tracing argument
to "register" in your config.ml, e.g.

let tracing = mprof_trace ~size:1000000 () in
register "myunikernel" ~tracing [
    main $ ...
]

This uses a newer version of the profiling API. You should generally
"opam pin" the #tracing2 branches rather than #tracing to use it.

Note also that it doesn't currently record ring utilisation, so you'll
still need to do some work to get that. You could use the
MProf.Counter interface, in which case the GUI will display it as a
graph over the trace.

- try to reduce the parallelisation to see if some condition there
alleviates the issue to track it down.

Reducing the maximum number of threads running in parallel reduced CPU
utilization, and the VM kept functioning for much longer, but the same
problem occurred in the end.

It might be more useful looking at the code. Please have a look at the
function "f_thread" in the file uploaded on the following repo:

https://github.com/koleini/parallelisation

That's a lot of code to try and distill down a test case.  Try to cut it
down significantly by building a minimal Ethernet traffic generator that
outputs frames with a predictable pattern in the frame, and a receiver that
will check that the pattern is received as expected.

Then you can try out your parallel algorithm variations on the simple
Ethernet sender/receiver and narrow down the problem without all the other
concerns.

Once the bug is tracked down, we can add the sender/receiver into
mirage-skeleton and use it as a test case to ensure that this functionality
never regresses in the future.  Line rate Ethernet transmission has worked
in the past, but we never added a test case to ensure it stays working.
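
A minimal sketch of what such a sender/receiver pair could check, using only the OCaml standard library. The frame layout (a 4-byte big-endian sequence number followed by bytes derived from it) and the names make_frame, check_frame and receive are assumptions made for illustration, not an existing Mirage test; wiring them to a real network interface is left out.

```ocaml
(* Hypothetical frame layout: bytes 0-3 hold a big-endian sequence number,
   and every later byte i holds (seq + i) land 0xff. *)
let make_frame ~size seq =
  let b = Bytes.create size in
  Bytes.set_int32_be b 0 (Int32.of_int seq);
  for i = 4 to size - 1 do
    Bytes.set_uint8 b i ((seq + i) land 0xff)
  done;
  b

type verdict = Ok_frame of int | Corrupt | Too_short

(* Recompute the pattern from the sequence number and compare. *)
let check_frame b =
  let size = Bytes.length b in
  if size < 4 then Too_short
  else begin
    let seq = Int32.to_int (Bytes.get_int32_be b 0) in
    let rec ok i =
      i >= size || (Bytes.get_uint8 b i = (seq + i) land 0xff && ok (i + 1))
    in
    if ok 4 then Ok_frame seq else Corrupt
  end

type stats = {
  mutable next : int;      (* next sequence number expected *)
  mutable received : int;  (* frames that passed the pattern check *)
  mutable lost : int;      (* sequence numbers skipped so far *)
  mutable corrupt : int;   (* frames that failed the pattern check *)
}

(* Receiver side: count valid frames, gaps in the sequence, and corruption. *)
let receive st b =
  match check_frame b with
  | Ok_frame seq ->
    if seq > st.next then st.lost <- st.lost + (seq - st.next);
    st.next <- max st.next (seq + 1);
    st.received <- st.received + 1
  | Corrupt | Too_short -> st.corrupt <- st.corrupt + 1

let () =
  let st = { next = 0; received = 0; lost = 0; corrupt = 0 } in
  (* Simulate a sender whose frame number 2 was dropped on the way. *)
  List.iter (fun seq -> receive st (make_frame ~size:100 seq)) [0; 1; 3];
  Printf.printf "received=%d lost=%d corrupt=%d\n"
    st.received st.lost st.corrupt
```

Because the payload is a function of the sequence number alone, the receiver needs no shared state with the sender, and dropped, duplicated or damaged frames show up directly in the counters.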

Anil
_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel





This message and any attachment are intended solely for the addressee and
may contain confidential information. If you have received this message in
error, please send it back to me, and immediately delete it.   Please do not
use, copy or disclose the information contained in this message or in any
attachment.  Any views or opinions expressed by the author of this email do
not necessarily reflect the views of the University of Nottingham.

This message has been checked for viruses but the contents of an attachment
may still contain software viruses which could damage your computer system,
you are advised to perform your own checks. Email communications with the
University of Nottingham may be monitored as permitted by UK legislation.






