Gc.compact weirdness help
Hi all,

I've got a problem with GC memory management in OCaml and I was wondering if someone could give me some help. I am currently working on integrating the ns-3 network simulator (http://www.nsnam.org/) as a network backend for Mirage. The idea is to turn Mirage into a network simulator for software-defined network designs. So far I have managed to get packet and timing integration working, and it all works nicely, but I have a memory management problem: when the GC starts to compact memory, my program halts and nothing moves.

So far I have a simple program which creates two hosts, one working as a TCP server and the other as a TCP client. At time t=1 sec the client sets up a TCP connection and starts sending packets as fast as it can.

From the network perspective, I handle packet passing as follows. For each interface I register a packet handler which pushes packets to the OS.Netif module:

    bool
    PktDemux(Ptr<NetDevice> dev, Ptr<const Packet> pkt, uint16_t proto,
        const Address &src, const Address &dst, NetDevice::PacketType type)
    {
      CAMLlocal1( ml_data );
      printf("packet demux...\n");
      fprintf(stdout, "%f: receiving %u packet done...\n",
          (long)Simulator::Now().GetMicroSeconds() / 1e6, pkt->GetSize());
      fflush(stdout);
      int pkt_len = pkt->GetSize();
      ml_data = caml_alloc_string(pkt_len);
      pkt->CopyData((uint8_t *)String_val(ml_data), pkt_len);

      // find host name
      string node_name = getHostName(dev);
      //printf("node %s.%d packet\n", node_name.c_str(), dev->GetIfIndex());

      // call packet handling code in caml
      caml_callback3(*caml_named_value("demux_pkt"),
          caml_copy_string((const char *)node_name.c_str()),
          Val_int(dev->GetIfIndex()), ml_data );
      printf("packet demux end...\n");
      return true;
    }

For packet transmission I have the following code:

    CAMLprim value
    caml_pkt_write(value v_node_name, value v_id, value v_ba, value v_off,
        value v_len)
    {
      CAMLparam5(v_node_name, v_id, v_ba, v_off, v_len);
      printf("sending packet begin...\n");
      uint32_t ifIx = (uint32_t)Int_val(v_id);
      string node_name = string(String_val(v_node_name));

      // get a pointer to the packet byte data
      uint8_t *buf = (uint8_t *) Caml_ba_data_val(v_ba);
      int len = Int_val(v_len), off = Int_val(v_off);
      Ptr<Packet> pkt = Create<Packet>(buf + off, len);

      // ether proto of the packet
      uint16_t proto = ntohs(*(uint16_t *)(buf + off + 12));

      // find the right device for the node and send packet
      Ptr<Node> node = nodes[node_name];
      Mac48Address mac_dst;
      mac_dst.CopyFrom(buf + off);
      for (uint32_t i = 0; i < node->GetNDevices (); i++)
        if (node->GetDevice(i)->GetIfIndex() == ifIx) {
          if (!node->GetDevice(i)->Send(pkt, mac_dst, proto))
            fprintf(stdout, "%f: packet dropped...\n",
                (long)Simulator::Now().GetMicroSeconds() / 1e6);
          fprintf(stdout, "%f: (left pkt %u) sending %u packet done...\n",
              (long)Simulator::Now().GetMicroSeconds() / 1e6,
              node->GetDevice(i)->GetObject<CsmaNetDevice>()->GetQueue()->GetNPackets(),
              pkt->GetSize());
          fflush(stdout);
        }
      printf("sending packet end...\n");
      CAMLreturn( Val_unit );
    }

On the OCaml side, the netif listener uses an Lwt_stream to read packets and blocks when no packets are queued for processing. PktDemux calls an OCaml function (registered as "demux_pkt") that simply converts the string into a Cstruct and pushes it into the Lwt_stream. I suspect this may be where I am overusing memory, but the excess should only account for a few packets, and the program shouldn't be running out of memory so fast.

For packet transmission I also have a function that checks the size of the network device's queue. If the queue is full, the netif thread blocks on an Lwt condition and sets up a timer event which checks every millisecond whether the queue has space for packets. Once the device has transmitted a packet, the timer handler calls an OCaml function that broadcasts unit on the condition in order to unblock the sending thread.

During a run I get the following problem:
cr409@nile ~/scratch/mirage/regress> gdb ./_build/ns3-direct/basic/sleep_ns3.bin
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /local/scratch/cr409/mirage/regress/_build/ns3-direct/basic/sleep_ns3.bin...done.
(gdb) run
Starting program: /local/scratch/cr409/mirage/regress/_build/ns3-direct/basic/sleep_ns3.bin
[Thread debugging using libthread_db enabled]
Main: startup
OS.Topology started...
Adding node node1
Adding node node2
Adding link between nodes node1 - node2
Netif: plug node1.0
Netif: plug node2.0
Initialising nodes....
Manager: create
Manager: plug 0
Manager: plug done, to listener
Listening Server
Manager: VIF 0 to 10.0.0.1 nm 255.255.255.0 gw [10.0.0.1]
ARP: sending gratuitous from 10.0.0.1
sending packet begin...
0.000000: (left pkt 0) sending 64 packet done...
sending packet end...
Listening Server
Manager: init done
Manager: create
Manager: plug 0
Manager: plug done, to listener
0.000000: trying to connect client
Manager: VIF 0 to 10.0.0.2 nm 255.255.255.0 gw [10.0.0.1]
ARP: sending gratuitous from 10.0.0.2
sending packet begin...
0.000000: (left pkt 0) sending 64 packet done...
sending packet end...
Manager: init done
packet demux...
0.001000: receiving 46 packet done...
ARP: updating 10.0.0.1 -> 00:00:00:00:00:01
packet demux end...
packet demux...
0.002342: receiving 46 packet done...
ARP: updating 10.0.0.2 -> 00:00:00:00:00:02
packet demux end...
event handler begin...
1.000000: trying to connect client
sending packet begin...
1.000000: (left pkt 0) sending 80 packet done...
sending packet end...
event handler end...
packet demux...
1.001000: receiving 62 packet done...
............................................
1.139615: receiving 54 packet done...
sending packet begin...
1.139615: (left pkt 36) sending 1532 packet done...
sending packet end...
sending packet begin...
1.139615: (left pkt 37) sending 1532 packet done...
sending packet end...
sending packet begin...
1.139615: (left pkt 38) sending 1532 packet done...
sending packet end...
1.139615: Writing new buffer....
1.139615: Writing new buffer....
1.139615: Writing new buffer....
packet demux end...
packet demux...
1.140655: receiving 1514 packet done...
1.140655: read 1460
^Z
Program received signal SIGTSTP, Stopped (user).
0x0000000000493e58 in caml_oldify_local_roots ()
Missing separate debuginfos, use: debuginfo-install atk-1.28.0-2.el6.x86_64 cairo-1.8.8-3.1.el6.x86_64 expat-2.0.1-11.el6_2.x86_64 fontconfig-2.8.0-3.el6.x86_64 freetype-2.3.11-6.el6_2.9.x86_64 glib2-2.22.5-6.el6.x86_64 glibc-2.12-1.47.el6_2.12.x86_64 gsl-1.13-1.el6.x86_64 gtk2-2.18.9-6.el6.centos.x86_64 libX11-1.3-2.el6.x86_64 libXau-1.0.5-1.el6.x86_64 libXcomposite-0.4.1-2.el6.x86_64 libXcursor-1.1.10-2.el6.x86_64 libXdamage-1.1.2-1.el6.x86_64 libXext-1.1-3.el6.x86_64 libXfixes-4.0.4-1.el6.x86_64 libXi-1.3-3.el6.x86_64 libXinerama-1.1-1.el6.x86_64 libXrandr-1.3.0-4.el6.x86_64 libXrender-0.9.5-1.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 libpng-1.2.49-1.el6_2.x86_64 libselinux-2.0.94-5.2.el6.x86_64 libstdc++-4.4.6-3.el6.x86_64 libxcb-1.5-1.el6.x86_64 libxml2-2.7.6-4.el6_2.4.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 pango-1.28.1-3.el6_0.5.1.centos.x86_64 pixman-0.18.4-1.el6_0.1.x86_64 sqlite-3.6.20-1.el6.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0 0x0000000000493e58 in caml_oldify_local_roots ()
#1 0x0000000000496525 in caml_empty_minor_heap ()
#2 0x000000000049665a in caml_minor_collection ()
#3 0x0000000000494a32 in caml_garbage_collection ()
#4 0x00000000004a40d6 in caml_call_gc ()
#5 0x00000000000000ff in ?? ()
#6 0x2525252525252525 in ?? ()
#7 0x0000000000000000 in ?? ()

I got a note from Anil that the default ulimit of the program might be my problem. So far I have:

cr409@nile ~/scratch/mirage/regress> ulimit -s 1024

Another interesting point is that if I call Gc.compact every time I get a packet in the demux method, the program halts after only a couple of packets:

cr409@nile ~/scratch/mirage/regress> gdb ./_build/ns3-direct/basic/sleep_ns3.bin
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /local/scratch/cr409/mirage/regress/_build/ns3-direct/basic/sleep_ns3.bin...done.
(gdb) run
Starting program: /local/scratch/cr409/mirage/regress/_build/ns3-direct/basic/sleep_ns3.bin
[Thread debugging using libthread_db enabled]
Main: startup
OS.Topology started...
Adding node node1
Adding node node2
Adding link between nodes node1 - node2
Netif: plug node1.0
Netif: plug node2.0
Initialising nodes....
Manager: create
Manager: plug 0
Manager: plug done, to listener
Listening Server
Manager: VIF 0 to 10.0.0.1 nm 255.255.255.0 gw [10.0.0.1]
ARP: sending gratuitous from 10.0.0.1
sending packet begin...
0.000000: (left pkt 0) sending 64 packet done...
sending packet end...
Listening Server
Manager: init done
Manager: create
Manager: plug 0
Manager: plug done, to listener
0.000000: trying to connect client
Manager: VIF 0 to 10.0.0.2 nm 255.255.255.0 gw [10.0.0.1]
ARP: sending gratuitous from 10.0.0.2
sending packet begin...
0.000000: (left pkt 0) sending 64 packet done...
sending packet end...
Manager: init done
packet demux...
0.001000: receiving 46 packet done...
ARP: updating 10.0.0.1 -> 00:00:00:00:00:01
packet demux end...
packet demux...
0.002342: receiving 46 packet done...
^Z
Program received signal SIGTSTP, Stopped (user).
0x0000000000493e5c in caml_oldify_local_roots ()
Missing separate debuginfos, use: debuginfo-install atk-1.28.0-2.el6.x86_64 cairo-1.8.8-3.1.el6.x86_64 expat-2.0.1-11.el6_2.x86_64 fontconfig-2.8.0-3.el6.x86_64 freetype-2.3.11-6.el6_2.9.x86_64 glib2-2.22.5-6.el6.x86_64 glibc-2.12-1.47.el6_2.12.x86_64 gsl-1.13-1.el6.x86_64 gtk2-2.18.9-6.el6.centos.x86_64 libX11-1.3-2.el6.x86_64 libXau-1.0.5-1.el6.x86_64 libXcomposite-0.4.1-2.el6.x86_64 libXcursor-1.1.10-2.el6.x86_64 libXdamage-1.1.2-1.el6.x86_64 libXext-1.1-3.el6.x86_64 libXfixes-4.0.4-1.el6.x86_64 libXi-1.3-3.el6.x86_64 libXinerama-1.1-1.el6.x86_64 libXrandr-1.3.0-4.el6.x86_64 libXrender-0.9.5-1.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 libpng-1.2.49-1.el6_2.x86_64 libselinux-2.0.94-5.2.el6.x86_64 libstdc++-4.4.6-3.el6.x86_64 libxcb-1.5-1.el6.x86_64 libxml2-2.7.6-4.el6_2.4.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 pango-1.28.1-3.el6_0.5.1.centos.x86_64 pixman-0.18.4-1.el6_0.1.x86_64 sqlite-3.6.20-1.el6.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0 0x0000000000493e5c in caml_oldify_local_roots ()
#1 0x0000000000496585 in caml_empty_minor_heap ()
#2 0x000000000049f750 in caml_gc_compaction ()
#3 0x00000000004a42b8 in caml_c_call ()
#4 0x00007ffff7fcafd0 in ?? ()
#5 0x000000000045bf81 in camlLwt__backtrace_catch_1667 ()
#6 0x00007fffffffd2c0 in ?? ()
#7 0x000000000045bf63 in camlLwt__backtrace_catch_1667 ()
#8 0x00007fffffffd300 in ?? ()
#9 0x00007ffff2ef3904 in mcount () from /lib64/libc.so.6
#10 0x00000000006e2080 in camlOS__64 ()
#11 0x00007ffff7fcafb0 in ?? ()
#12 0x0000000000000000 in ?? ()

Any ideas how I can handle this weirdness? As a debugging test today, I am planning to keep either the server or the client in C++, so that I can check whether the problem lies in the way I currently process packets, and try to find out how I can optimise the packet reading or writing process.
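One likely cause, offered as a hypothesis rather than a confirmed diagnosis: PktDemux is not an OCaml primitive, yet it uses CAMLlocal1 without a preceding CAMLparam0() and leaves through a plain "return true;" instead of CAMLreturnT. CAMLlocal1 links a roots block that lives in PktDemux's own stack frame into the runtime's list of local roots, and only the CAMLreturn macros unlink it. After PktDemux returns, the head of that list still points into a dead stack frame, so the next minor collection walks reused stack memory in caml_oldify_local_roots. That would be consistent with both backtraces above ending in caml_oldify_local_roots, and with the garbage frame 0x2525252525252525. The C++ below is a simplified model of that list, not the OCaml runtime: RootsBlock, local_roots, ScopedRoot, count_roots and demux_like are hypothetical names. The guard's constructor stands in for CAMLparam0() plus CAMLlocal1, and its destructor for CAMLreturn.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of the OCaml (< 5.0) local-root chain.
// None of these names are real runtime symbols.
typedef long value;  // stand-in for OCaml's `value`

struct RootsBlock {
  RootsBlock *next;  // older registration (the caller's frame)
  value *slot;       // C local the GC must scan and possibly update
};

// Head of the chain, playing the role of caml_local_roots.
RootsBlock *local_roots = nullptr;

// Constructor ~ CAMLparam0() + CAMLlocal1(x): remember the old head and
// push a block that lives in the current C stack frame.
// Destructor ~ CAMLreturn: restore the old head before the frame dies.
// Skipping the restore leaves local_roots pointing at a dead frame.
class ScopedRoot {
 public:
  explicit ScopedRoot(value *slot) : saved_(local_roots) {
    block_.next = local_roots;
    block_.slot = slot;
    local_roots = &block_;
  }
  ~ScopedRoot() {
    assert(local_roots == &block_);  // roots are released in LIFO order
    local_roots = saved_;
  }
  ScopedRoot(const ScopedRoot &) = delete;
  ScopedRoot &operator=(const ScopedRoot &) = delete;

 private:
  RootsBlock *saved_;
  RootsBlock block_;
};

// Walks the chain the way a minor GC's root scan would.
std::size_t count_roots() {
  std::size_t n = 0;
  for (RootsBlock *b = local_roots; b != nullptr; b = b->next) ++n;
  return n;
}

// Stand-in for a PktDemux-style callback: registers one local root per
// active call and recurses `depth` times to mimic nested callbacks.
// Returns the number of roots visible from the deepest call.
std::size_t demux_like(int depth) {
  value ml_data = 0;
  ScopedRoot root(&ml_data);
  ml_data = depth;  // stand-in for caml_alloc_string(...)
  if (depth > 0) return demux_like(depth - 1);
  return count_roots();
}
```

Applied to the real stub, the corresponding change would be to start PktDemux with CAMLparam0(); before CAMLlocal1(ml_data); and to end it with CAMLreturnT(bool, true);. C++ leaves the evaluation order of function arguments unspecified. For that reason it may also be safer to store the result of caml_copy_string in a second local (CAMLlocal2) before calling caml_callback3, so that no allocation happens between reading *caml_named_value("demux_pkt") and the call itself.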
--
Charalampos Rotsos
PhD student
The University of Cambridge
Computer Laboratory
William Gates Building
JJ Thomson Avenue
Cambridge
CB3 0FD

Phone: +44-(0) 1223 767032
Email: cr409@xxxxxxxxxxxx