Gc.compact weirdness help



Hi all,

I have a problem with GC memory management in OCaml and was
wondering if someone could give me some help.

I am currently working on integrating the ns3 network simulator
(http://www.nsnam.org/) as a network backend for Mirage. The idea is
to turn Mirage into a network simulator for software-defined
network designs. So far I have managed to get packet and timing
integration working nicely, but I have a memory management
problem.

My problem is that when the GC starts to compact memory, my program
comes to a halt and makes no further progress.

So far I have a simple program which creates two hosts, one working as
a TCP server and the other working as a TCP client. At time t=1 sec
the client sets up a TCP connection and starts sending packets as fast
as it can.

From the network perspective, the way I handle packet passing is as follows:

For each interface I register a packet handler which pushes packets to
the OS.Netif module as follows:

bool
PktDemux(Ptr<NetDevice> dev, Ptr<const Packet> pkt, uint16_t proto,
    const Address &src, const Address &dst, NetDevice::PacketType type) {
  CAMLlocal1( ml_data );
  printf("packet demux...\n");
  fprintf(stdout, "%f: receiving %u packet done...\n",
      (long)Simulator::Now().GetMicroSeconds() / 1e6, pkt->GetSize());
  fflush(stdout);

  int pkt_len = pkt->GetSize();
  ml_data = caml_alloc_string(pkt_len);
  pkt->CopyData((uint8_t *)String_val(ml_data), pkt_len);

  // find host name
  string node_name = getHostName(dev);

  //printf("node %s.%d packet\n", node_name.c_str(), dev->GetIfIndex());
  // call packet handling code in caml
  caml_callback3(*caml_named_value("demux_pkt"),
      caml_copy_string((const char *)node_name.c_str()),
      Val_int(dev->GetIfIndex()), ml_data );
  printf("packet demux end...\n");
  return true;
}

For packet transmission I have the following code:

CAMLprim value
caml_pkt_write(value v_node_name, value v_id, value v_ba,
    value v_off, value v_len) {
  CAMLparam5(v_node_name, v_id, v_ba, v_off, v_len);
  printf("sending packet begin...\n");

  uint32_t ifIx = (uint32_t)Int_val(v_id);
  string node_name = string(String_val(v_node_name));

  //get a pointer to the packet byte data
  uint8_t *buf = (uint8_t *) Caml_ba_data_val(v_ba);
  int len = Int_val(v_len), off = Int_val(v_off);
  Ptr< Packet> pkt = Create<Packet>(buf + off, len );

  // read the EtherType (protocol) of the packet.
  uint16_t proto = ntohs(*(uint16_t *)(buf + off + 12));

  // find the right device for the node and send packet
  Ptr<Node> node = nodes[node_name];

  Mac48Address mac_dst;
  mac_dst.CopyFrom(buf+off);
  for (uint32_t i = 0; i < node->GetNDevices (); i++)
    if(node->GetDevice(i)->GetIfIndex() == ifIx) {
      if(!node->GetDevice(i)->Send(pkt, mac_dst, proto))
        fprintf(stdout, "%f: packet dropped...\n",
            (long)Simulator::Now().GetMicroSeconds() / 1e6);
      fprintf(stdout, "%f: (left pkt %u) sending %u packet done...\n",
          (long)Simulator::Now().GetMicroSeconds() / 1e6,
          node->GetDevice(i)->GetObject<CsmaNetDevice>()->GetQueue()->GetNPackets(),
          pkt->GetSize());
      fflush(stdout);
    }
  printf("sending packet end...\n");
  CAMLreturn( Val_unit );
}
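
The header handling in caml_pkt_write (destination MAC at bytes 0..5,
EtherType at bytes 12..13 of the frame) can be sketched in isolation,
without ns3 or the OCaml runtime. This is only an illustrative sketch:
EthHeader and parse_eth are hypothetical names, and the explicit bounds
check and byte-wise EtherType read are assumptions added for safety, not
something the code above does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper type, not part of ns3 or Mirage.
struct EthHeader {
  uint8_t dst[6];      // destination MAC, frame bytes 0..5
  uint8_t src[6];      // source MAC, frame bytes 6..11
  uint16_t ethertype;  // frame bytes 12..13, in host byte order
};

// Parse the 14-byte Ethernet header of the frame stored at
// buf[off .. off+len). Returns false if the slice falls outside the
// buffer or is too short to hold a full header.
bool parse_eth(const uint8_t *buf, size_t buf_len, size_t off, size_t len,
               EthHeader *out) {
  const size_t kEthHeaderLen = 14;
  if (off > buf_len || len > buf_len - off || len < kEthHeaderLen)
    return false;
  std::memcpy(out->dst, buf + off, 6);
  std::memcpy(out->src, buf + off + 6, 6);
  // Assemble the big-endian EtherType byte by byte; this avoids an
  // unaligned 16-bit load and does not depend on ntohs().
  out->ethertype =
      static_cast<uint16_t>((buf[off + 12] << 8) | buf[off + 13]);
  return true;
}
```

With a check like this, a short or mis-sized slice passed in from the
OCaml side is rejected before any bytes are read.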

On the OCaml side, the netif listener uses an Lwt_stream to read
packets and blocks when no packets are queued for processing.
The pkt_demux function calls an OCaml function that simply converts
the string into a Cstruct and pushes it onto the Lwt_stream. I
suspect this may be where I am overusing memory, but the
overhead should amount to only a few packets, and the
program shouldn't be running out of memory so fast.

For packet transmission I also have a function that checks the size
of the network device's queue. If the queue is full, the netif
thread blocks on an Lwt condition and sets up a timer event which
checks every millisecond whether the queue has space for packets. If
the device has transmitted a packet, the timer handler calls an OCaml
function that broadcasts a unit on the condition in order to unblock
the sending thread.
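
The blocking scheme just described can be modelled with a small
single-threaded simulation. This is a toy sketch under stated
assumptions, not the Mirage or ns3 code: simulate, Result, capacity and
drain_every_ms are hypothetical names, the "condition" is a plain
boolean, and the device is assumed to transmit one packet every
drain_every_ms simulated milliseconds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Outcome of one simulated run (hypothetical type).
struct Result {
  uint64_t elapsed_ms;   // simulated time until all packets were sent
  size_t delivered;      // packets transmitted by the device
  size_t times_blocked;  // how often the sender had to wait
};

Result simulate(size_t capacity, size_t n_pkts, uint64_t drain_every_ms) {
  Result r{0, 0, 0};
  if (capacity == 0 || drain_every_ms == 0)
    return r;  // degenerate parameters: nothing to simulate
  std::deque<size_t> queue;  // device transmit queue
  size_t next = 0;           // next packet the sender wants to enqueue
  bool blocked = false;      // sender is waiting on the "condition"
  while (r.delivered < n_pkts) {
    // Sender: enqueue as fast as possible until the queue is full.
    while (!blocked && next < n_pkts) {
      if (queue.size() >= capacity) {
        blocked = true;
        ++r.times_blocked;
      } else {
        queue.push_back(next++);
      }
    }
    // One timer period (1 ms of simulated time) passes.
    ++r.elapsed_ms;
    // Device: transmit one packet every drain_every_ms.
    if (r.elapsed_ms % drain_every_ms == 0 && !queue.empty()) {
      queue.pop_front();
      ++r.delivered;
    }
    // Timer handler: if there is space again, wake the sender
    // (the analogue of broadcasting on the condition).
    if (blocked && queue.size() < capacity)
      blocked = false;
  }
  return r;
}
```

In this model the sender is woken at most once per timer tick, so a
small queue makes it block once for almost every packet.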

During a run I get the following problem.

cr409@nile ~/scratch/mirage/regress> gdb ./_build/ns3-direct/basic/sleep_ns3.bin
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from
/local/scratch/cr409/mirage/regress/_build/ns3-direct/basic/sleep_ns3.bin...done.
(gdb) run
Starting program:
/local/scratch/cr409/mirage/regress/_build/ns3-direct/basic/sleep_ns3.bin
[Thread debugging using libthread_db enabled]
Main: startup
OS.Topology started...
Adding node node1
Adding node node2
Adding link between nodes node1 - node2
Netif: plug node1.0
Netif: plug node2.0
Initialising nodes....
Manager: create
Manager: plug 0
Manager: plug done, to listener
Listening Server
Manager: VIF 0 to 10.0.0.1 nm 255.255.255.0 gw [10.0.0.1]
ARP: sending gratuitous from 10.0.0.1
sending packet begin...
0.000000: (left pkt 0) sending 64 packet done...
sending packet end...
Listening Server
Manager: init done
Manager: create
Manager: plug 0
Manager: plug done, to listener
0.000000: trying to connect client
Manager: VIF 0 to 10.0.0.2 nm 255.255.255.0 gw [10.0.0.1]
ARP: sending gratuitous from 10.0.0.2
sending packet begin...
0.000000: (left pkt 0) sending 64 packet done...
sending packet end...
Manager: init done
packet demux...
0.001000: receiving 46 packet done...
ARP: updating 10.0.0.1 -> 00:00:00:00:00:01
packet demux end...
packet demux...
0.002342: receiving 46 packet done...
ARP: updating 10.0.0.2 -> 00:00:00:00:00:02
packet demux end...
event handler begin...
1.000000: trying to connect client
sending packet begin...
1.000000: (left pkt 0) sending 80 packet done...
sending packet end...
event handler end...
packet demux...
1.001000: receiving 62 packet done...
............................................
1.139615: receiving 54 packet done...
sending packet begin...
1.139615: (left pkt 36) sending 1532 packet done...
sending packet end...
sending packet begin...
1.139615: (left pkt 37) sending 1532 packet done...
sending packet end...
sending packet begin...
1.139615: (left pkt 38) sending 1532 packet done...
sending packet end...
1.139615: Writing new buffer....
1.139615: Writing new buffer....
1.139615: Writing new buffer....
packet demux end...
packet demux...
1.140655: receiving 1514 packet done...
1.140655: read 1460
^Z
Program received signal SIGTSTP, Stopped (user).
0x0000000000493e58 in caml_oldify_local_roots ()
Missing separate debuginfos, use: debuginfo-install
atk-1.28.0-2.el6.x86_64 cairo-1.8.8-3.1.el6.x86_64
expat-2.0.1-11.el6_2.x86_64 fontconfig-2.8.0-3.el6.x86_64
freetype-2.3.11-6.el6_2.9.x86_64 glib2-2.22.5-6.el6.x86_64
glibc-2.12-1.47.el6_2.12.x86_64 gsl-1.13-1.el6.x86_64
gtk2-2.18.9-6.el6.centos.x86_64 libX11-1.3-2.el6.x86_64
libXau-1.0.5-1.el6.x86_64 libXcomposite-0.4.1-2.el6.x86_64
libXcursor-1.1.10-2.el6.x86_64 libXdamage-1.1.2-1.el6.x86_64
libXext-1.1-3.el6.x86_64 libXfixes-4.0.4-1.el6.x86_64
libXi-1.3-3.el6.x86_64 libXinerama-1.1-1.el6.x86_64
libXrandr-1.3.0-4.el6.x86_64 libXrender-0.9.5-1.el6.x86_64
libgcc-4.4.6-3.el6.x86_64 libpng-1.2.49-1.el6_2.x86_64
libselinux-2.0.94-5.2.el6.x86_64 libstdc++-4.4.6-3.el6.x86_64
libxcb-1.5-1.el6.x86_64 libxml2-2.7.6-4.el6_2.4.x86_64
ncurses-libs-5.7-3.20090208.el6.x86_64
pango-1.28.1-3.el6_0.5.1.centos.x86_64 pixman-0.18.4-1.el6_0.1.x86_64
sqlite-3.6.20-1.el6.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x0000000000493e58 in caml_oldify_local_roots ()
#1  0x0000000000496525 in caml_empty_minor_heap ()
#2  0x000000000049665a in caml_minor_collection ()
#3  0x0000000000494a32 in caml_garbage_collection ()
#4  0x00000000004a40d6 in caml_call_gc ()
#5  0x00000000000000ff in ?? ()
#6  0x2525252525252525 in ?? ()
#7  0x0000000000000000 in ?? ()

Anil pointed out that the default ulimit of the program might be
my problem. So far I have:

cr409@nile ~/scratch/mirage/regress> ulimit -s
1024
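
The same limit can also be read from inside the process with the POSIX
getrlimit call, which is one way to confirm what the binary actually
runs with under gdb. A minimal sketch follows; stack_limits and
print_stack_limit are hypothetical helper names. Note that getrlimit
reports bytes, whereas ulimit -s in common shells reports units of 1024
bytes, so the 1024 above would most likely correspond to 1 MiB.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/resource.h>

// Read the soft and hard stack limits of the current process.
// Returns false if getrlimit() fails.
bool stack_limits(rlim_t *soft, rlim_t *hard) {
  struct rlimit rl;
  if (getrlimit(RLIMIT_STACK, &rl) != 0)
    return false;
  *soft = rl.rlim_cur;
  *hard = rl.rlim_max;
  return true;
}

// Print the soft stack limit in KiB, or "unlimited".
void print_stack_limit() {
  rlim_t soft, hard;
  if (!stack_limits(&soft, &hard)) {
    std::perror("getrlimit");
    return;
  }
  if (soft == RLIM_INFINITY)
    std::printf("stack limit: unlimited\n");
  else
    std::printf("stack limit: %llu KiB\n",
                static_cast<unsigned long long>(soft / 1024));
}
```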


Another interesting point is that if I call Gc.compact every time
I get a packet in the demux function, the program halts after
only a couple of packets.

cr409@nile ~/scratch/mirage/regress> gdb ./_build/ns3-direct/basic/sleep_ns3.bin
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from
/local/scratch/cr409/mirage/regress/_build/ns3-direct/basic/sleep_ns3.bin...done.
(gdb) run
Starting program:
/local/scratch/cr409/mirage/regress/_build/ns3-direct/basic/sleep_ns3.bin
[Thread debugging using libthread_db enabled]
Main: startup
OS.Topology started...
Adding node node1
Adding node node2
Adding link between nodes node1 - node2
Netif: plug node1.0
Netif: plug node2.0
Initialising nodes....
Manager: create
Manager: plug 0
Manager: plug done, to listener
Listening Server
Manager: VIF 0 to 10.0.0.1 nm 255.255.255.0 gw [10.0.0.1]
ARP: sending gratuitous from 10.0.0.1
sending packet begin...
0.000000: (left pkt 0) sending 64 packet done...
sending packet end...
Listening Server
Manager: init done
Manager: create
Manager: plug 0
Manager: plug done, to listener
0.000000: trying to connect client
Manager: VIF 0 to 10.0.0.2 nm 255.255.255.0 gw [10.0.0.1]
ARP: sending gratuitous from 10.0.0.2
sending packet begin...
0.000000: (left pkt 0) sending 64 packet done...
sending packet end...
Manager: init done
packet demux...
0.001000: receiving 46 packet done...
ARP: updating 10.0.0.1 -> 00:00:00:00:00:01
packet demux end...
packet demux...
0.002342: receiving 46 packet done...
^Z
Program received signal SIGTSTP, Stopped (user).
0x0000000000493e5c in caml_oldify_local_roots ()
Missing separate debuginfos, use: debuginfo-install
atk-1.28.0-2.el6.x86_64 cairo-1.8.8-3.1.el6.x86_64
expat-2.0.1-11.el6_2.x86_64 fontconfig-2.8.0-3.el6.x86_64
freetype-2.3.11-6.el6_2.9.x86_64 glib2-2.22.5-6.el6.x86_64
glibc-2.12-1.47.el6_2.12.x86_64 gsl-1.13-1.el6.x86_64
gtk2-2.18.9-6.el6.centos.x86_64 libX11-1.3-2.el6.x86_64
libXau-1.0.5-1.el6.x86_64 libXcomposite-0.4.1-2.el6.x86_64
libXcursor-1.1.10-2.el6.x86_64 libXdamage-1.1.2-1.el6.x86_64
libXext-1.1-3.el6.x86_64 libXfixes-4.0.4-1.el6.x86_64
libXi-1.3-3.el6.x86_64 libXinerama-1.1-1.el6.x86_64
libXrandr-1.3.0-4.el6.x86_64 libXrender-0.9.5-1.el6.x86_64
libgcc-4.4.6-3.el6.x86_64 libpng-1.2.49-1.el6_2.x86_64
libselinux-2.0.94-5.2.el6.x86_64 libstdc++-4.4.6-3.el6.x86_64
libxcb-1.5-1.el6.x86_64 libxml2-2.7.6-4.el6_2.4.x86_64
ncurses-libs-5.7-3.20090208.el6.x86_64
pango-1.28.1-3.el6_0.5.1.centos.x86_64 pixman-0.18.4-1.el6_0.1.x86_64
sqlite-3.6.20-1.el6.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x0000000000493e5c in caml_oldify_local_roots ()
#1  0x0000000000496585 in caml_empty_minor_heap ()
#2  0x000000000049f750 in caml_gc_compaction ()
#3  0x00000000004a42b8 in caml_c_call ()
#4  0x00007ffff7fcafd0 in ?? ()
#5  0x000000000045bf81 in camlLwt__backtrace_catch_1667 ()
#6  0x00007fffffffd2c0 in ?? ()
#7  0x000000000045bf63 in camlLwt__backtrace_catch_1667 ()
#8  0x00007fffffffd300 in ?? ()
#9  0x00007ffff2ef3904 in mcount () from /lib64/libc.so.6
#10 0x00000000006e2080 in camlOS__64 ()
#11 0x00007ffff7fcafb0 in ?? ()
#12 0x0000000000000000 in ?? ()

Any ideas how I can handle this weirdness?

As a debugging test, I am planning today to keep either the
server or the client in C++, so that I can check whether the problem
lies in the way I currently process packets, and to find
out how I can optimise the packet reading or writing process.

-- 
Charalampos Rotsos
PhD student
The University of Cambridge
Computer Laboratory
William Gates Building
JJ Thomson Avenue
Cambridge
CB3 0FD

Phone: +44-(0) 1223 767032
Email: cr409@xxxxxxxxxxxx
