
Re: [MirageOS-devel] Memory exhaustion in Mirage



This is an excellent point -- applying backpressure going back to dom0 would help balance out load once multiple netfronts come into play.

However, the issue with the ring handler being Lwt-aware is that it's not clear what to do if the function blocks.  All subsequent acks would also be blocked since the ring has to be processed in order (although the responses on the ring themselves may be out of order).

I think we should take a look at the Ring API itself once Dave's refactoring has landed and is released.  A more concurrent API could push the backpressure to the layer immediately after the Ring ack.

-anil

On 20 Dec 2014, at 09:06, Haris Rotsos <cr409@xxxxxxxxxxxx> wrote:

I am following up to share some experience from experimenting with the original Mirage switch. I also have a few design suggestions, though they may not be valid.

I think the problem mentioned in the original mail stems from the way packets are handled by a Xen unikernel. If you check https://github.com/mirage/mirage-net-xen/blob/master/lib/netif.ml on line 300, for each new packet arriving on a VIF, the code starts a thread and ignores the result. As a result, if you send lots of packets to a unikernel and the main processing pipeline cannot keep up with the rate, each new packet is still allocated and handed to a handling thread, but these threads never return and create a huge backlog, which at some point exhausts memory.

From a design point of view, I think the best approach to this problem is to create a fixed-size pool of threads. If the pool has no free thread, the packet is dropped. Alternatively, the driver can be redesigned to apply backpressure to the netback and force packets to be dropped in Dom0 instead of in the unikernel, which would also reduce the CPU load a little. In the past I tried to rewrite the rx_poll method, but the problem is that the function passed as a handler to ack new pages from the netback is not Lwt-aware, so you cannot easily integrate Lwt's asynchronicity into the processing pipeline.
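
To make the fixed-size pool idea concrete, here is a minimal sketch in plain OCaml (standard library only, no Lwt). It is not code from mirage-net-xen: the names limiter, try_acquire, release and simulate are hypothetical, and the burst is simulated synchronously rather than driven by a real ring. It only shows the accounting: a packet gets a handler if a slot is free and is dropped otherwise.

```ocaml
(* Hypothetical sketch: cap the number of packets being handled at once.
   None of these names come from mirage-net-xen. *)

type limiter = {
  max_in_flight : int;      (* fixed pool size *)
  mutable in_flight : int;  (* handlers started but not yet finished *)
  mutable dropped : int;    (* packets refused because the pool was full *)
}

let create max_in_flight =
  if max_in_flight <= 0 then invalid_arg "create: pool size must be positive";
  { max_in_flight; in_flight = 0; dropped = 0 }

(* Try to reserve a slot for one packet. Returns [true] if the caller may
   start a handler, [false] if the packet should be dropped instead. *)
let try_acquire t =
  if t.in_flight < t.max_in_flight then begin
    t.in_flight <- t.in_flight + 1;
    true
  end else begin
    t.dropped <- t.dropped + 1;
    false
  end

(* Give a slot back; to be called when a handler finishes or fails. *)
let release t =
  if t.in_flight > 0 then t.in_flight <- t.in_flight - 1

(* Simulate a burst: [burst] packets arrive before any handler completes.
   Returns the number of packets accepted and the number dropped. *)
let simulate ~pool_size ~burst =
  let t = create pool_size in
  let accepted = ref 0 in
  for _i = 1 to burst do
    if try_acquire t then incr accepted
  done;
  (!accepted, t.dropped)

let () =
  let accepted, dropped = simulate ~pool_size:4 ~burst:10 in
  Printf.printf "accepted=%d dropped=%d\n" accepted dropped
```

With a pool of 4 and a burst of 10, this prints accepted=4 dropped=6, so the backlog is bounded by the pool size instead of growing with the arrival rate. In a real driver the release call would presumably have to run on every completion path of the handling thread, including failures; how that fits with the ring handler is exactly the open question above.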

On 20 December 2014 at 00:11, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
The actual definition of these functions is probably in the mirage/mirage-platform repository at the moment.  Look in the xen/ subdirectory and grep for the C file that contains it.

We can migrate this library to use xen_cflags and avoid the need for this repetition quite soon as well.  The build hacks are gradually being undone!

-anil

> On 19 Dec 2014, at 09:25, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote:
>
> CCing the list.
>
>> On 19 Dec 2014, at 08:49, Masoud Koleini <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>
>> I have added the function to stub_alloc_pages.c and declared it as an external function in io_page.ml. io_page installs without problems, but when compiling the program I get the error:
>>
>> undefined reference to `caml_dealloc_page'
>>
>> How do you think I can fix it?
>>
>> Thanks.
>>
>> On 18/12/14 13:46, Anil Madhavapeddy wrote:
>>> Not at the moment, although we could easily add an explicit deallocator into
>>> Io_page.  You need to be very careful that the page is not used after its lifetime,
>>> so more pool-based abstractions are preferred (where the pages are recycled into
>>> an OCaml-managed data structure and reused rather than GCed).
>>>
>>> You should be able to pump up the amount of RAM the VM gets temporarily to 4GB
>>> or so -- if the page allocator still remains low, then it's unlikely to be a GC
>>> issue and more likely an actual leak somewhere, caused by holding onto references
>>> that keep the page live.
>>>
>>> -anil
>>>
>>>> On 18 Dec 2014, at 05:41, Masoud Koleini <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> Is it possible to deallocate an Io_page after the packet is sent, rather than waiting for the GC?
>>>>
>>>> On 16/12/14 23:45, Anil Madhavapeddy wrote:
>>>>> On 16 Dec 2014, at 15:23, Masoud Koleini <masoud.koleini@xxxxxxxxxxxxxxxx> wrote:
>>>>>> I am wondering if the following issue is already addressed:
>>>>>>
>>>>>> https://github.com/mirage/mirage-tcpip/issues/33
>>>>>>
>>>>>> https://lists.cam.ac.uk/pipermail/cl-mirage/2013-August/msg00104.html
>>>>>>
>>>>> It probably still happens at 64MB of RAM -- it requires some code to adjust
>>>>> the GC parameters in the OCaml runtime to trigger a collection more often.
>>>>> It should be harmless however, since (as the bug report observes), a failure
>>>>> to allocate an Io_page results in a GC compaction that frees up memory so
>>>>> that the allocation eventually succeeds.
>>>>>
>>>>> Any particular reason for asking -- is the bug affecting your switch somehow?
>>>>>
>>>>> -anil
>>>>
>>>>
>>>>
>>>>
>>>> This message and any attachment are intended solely for the addressee and may contain confidential information. If you have received this message in error, please send it back to me, and immediately delete it.   Please do not use, copy or disclose the information contained in this message or in any attachment.  Any views or opinions expressed by the author of this email do not necessarily reflect the views of the University of Nottingham.
>>>>
>>>> This message has been checked for viruses but the contents of an attachment may still contain software viruses which could damage your computer system, you are advised to perform your own checks. Email communications with the University of Nottingham may be monitored as permitted by UK legislation.
>>>>
>
>
> _______________________________________________
> MirageOS-devel mailing list
> MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
> http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
>




--
Charalampos Rotsos
PhD student
The University of Cambridge
Computer Laboratory
William Gates Building
JJ Thomson Avenue
Cambridge
CB3 0FD

Phone: +44-(0) 1223 767032
Email: cr409@xxxxxxxxxxxx
