
Re: [MirageOS-devel] Questions from potential new MirageOS Ocaml user



Hi Luther,

To speak directly to your concerns/questions:

1. Multicore has been lacking, but is actively being worked on by a team including folk here in Cambridge, who have a version you can try out based on 4.06.1 -- see https://github.com/ocamllabs/multicore-opam, and also https://github.com/ocamllabs/ocaml-multicore/wiki for more info.
2. I've no direct experience with them myself, but there was a recent message to the OCaml weekly news list <http://lists.idyll.org/listinfo/caml-news-weekly/> that drew the following answers:
```
** UnixJunkie then replied:

"OCaml cannot support multicore".
Sorry, but unless you are under windows, look for the following libraries in opam:
parmap, parany, ocamlnet, etc.
I run parallel OCaml programs everyday.
Parmap is a good start.
There are even more such libraries that I did not mention.
```
and
```
Some people (like this person <https://github.com/examachine>) are using the OCaml
MPI bindings in production.
If you really want to create scalable distributed applications, I would advise
using the zmq OCaml bindings available in opam (zmq; it would force you to
program in an agent-based style, a la Erlang).
I have done it once in the past, and it was pretty fun to write:
<https://github.com/UnixJunkie/daft>
```
...so there is some support out there.
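For a flavour of the data-parallel style Parmap offers, here is a toy, self-contained sketch. Note the `parmap` below is a *sequential stand-in* with a similar shape to the real API (the real `Parmap.parmap` takes an `~ncores` argument, forks that many worker processes, and takes its input wrapped as e.g. `Parmap.L lst`), so treat it as illustrative only:

```ocaml
(* Toy stand-in for Parmap's interface: same shape, but sequential,
   so this snippet runs with the OCaml stdlib alone. *)
let parmap ?(ncores = 4) f xs =
  ignore ncores;  (* the real library forks this many worker processes *)
  List.map f xs

(* Example: map a CPU-heavy function over a list of inputs. *)
let () =
  let results = parmap ~ncores:4 (fun x -> x * x) [1; 2; 3; 4] in
  List.iter (Printf.printf "%d ") results;
  print_newline ()  (* prints: 1 4 9 16 *)
```

With the real library, switching from `List.map f xs` to `Parmap.parmap ~ncores:4 f (Parmap.L xs)` is often the only change needed, which is what makes it a good first step.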

3+4. There is a new numerical processing library for OCaml called Owl that you may be interested in. Among the relevant/interesting features, it:
- supports OpenCL (though not CUDA currently), and links against OpenBLAS, LAPACKE, etc.;
- has a pure OCaml core which can be compiled to JavaScript (I have a student working on that at the moment), allowing models to be executed in the browser (the pure core also makes it suitable for targeting Mirage);
- has a distribution layer (Actor) that uses OCaml functors to allow machine learning code to be converted to distributed machine learning code with only a couple of lines of modification;
- has a model sharing component that makes it straightforward to publish, download and compose models built by others.

Docs at https://owlbarn.github.io/ -- still early days and under active development, so contributions, issues, feature requests all welcome!
Relating to scientific computation specifically, the following comment 
on the OCaml weekly news list also seems relevant:
```
> Maybe you can give me a good piece of advice related to both OCaml and data science.
Jupyter (<http://jupyter.org/>) is very useful and it can execute OCaml code:
<https://akabe.github.io/ocaml-jupyter/>. A Docker image containing many packages for data science is available:
<https://github.com/akabe/docker-ocaml-jupyter-datascience>, and some examples are at
<https://github.com/akabe/docker-ocaml-jupyter-datascience/tree/master/notebooks>.
Please try them, if you are interested.
```

5. There are certainly good books for OCaml out there; we have nothing really for MirageOS, I'm afraid. http://docs.mirage.io/ is the closest to a reference set of docs; the tooling for generating it may well support (e.g.) PDF generation, either already or soon, but someone more closely involved in that would need to comment. There were publishers who expressed interest in a MirageOS book a couple of years ago, but the time didn't seem right and no one seems to have had time to produce one.
Hope that's of some use!

On 14/04/2018 04:20, Luther Flippen wrote:
Dear MirageOS developers,

I DID NOT KNOW WHAT FORUM TO POST THIS ON SO I AM SENDING IT TO YOU FIRST. SUGGESTIONS AND ANSWERS ARE WELCOME.
I have questions regarding my tentative choice of language/ecosystem, but felt the need to frame this in a rather detailed context. I apologize ahead of time for the verbosity; it is long enough to be a blog post, in fact.
MY BACKGROUND:

As a retired engineering research worker and former professor, I want to continue research on my own but I need a (free) programming language ecosystem to work in. I used to use Matlab, Mathematica, Fortran, C at times, etc. when implementing the algorithm-development aspects of my work, and have done engineering and scientific computing since the punch card days.  I have already decided on a strongly typed functional language from the ML family after a LOT of research and reading, so please keep this in mind. I AM very much interested in deployment, not just in using my software myself, and this includes the cloud for example. I have not as yet learned/tried any of the following (or any other functional) programming language. At my age I want to take the time to get it right the first time before investing heavily in a new language/ecosystem.
SOME FURTHER CONTEXT:

Currently we are seeing the evolution from traditional-OS-centric to cloud/hypervisor/container-OS/unikernel-centric computing well under way and pervasive. The client/web side of things is now dominated by ARM devices/phones as well. This paradigm shift over the last two decades has been dramatic. As the next step, I believe the current cloud will gradually evolve from centralized large clusters of servers to massively geographically-distributed micro-cloud/fog systems and edge computing, complementing the growing IoT trend.

A few years back (2013), some papers on the Beowulf and Iridis-Pi high-performance clusters, on the Bolzano Raspberry Pi Cloud, and on the Glasgow Raspberry Pi Cloud demonstrated/prototyped micro-cloud/fog and HP-cluster systems based on the Raspberry Pi of that time. Those boards had no hardware support for virtualization and too little memory (256/512 MB) for Xen, as well as comparatively slow Ethernet (100 Mbps) and USB (2.0) for inter-Pi communication.

Now, however, as an example at the time of this post, one can get (from Pine64 for $45) the rPi-form-factor ROCK64 board with a Rockchip RK3328 quad-core ARM Cortex-A53 64-bit (up to 1.5 GHz) processor, 4 GB 1600 MHz LPDDR3 memory, true Gigabit Ethernet, USB 3.0, an eMMC module socket and microSD card slot for persistent storage/booting, and an ARM Mali-450MP2 dual-core GPU (4K60P capable). The Cortex-A53 is an ARMv8-A core with hardware Virtualization Extensions, and together with the 4 GB of RAM this makes current hardware capable of fully running the Xen hypervisor commonly in use on the cloud, for example. The Gigabit Ethernet makes inter-board and external communication fast. The only downside to this particular board/chip is that its Mali-450 GPU does not support OpenCL. Otherwise, this board can potentially function as the building block of a real micro-cloud/fog system, or a high-performance distributed-cluster system.
It should be noted, however, that there are other ARM Mali GPUs that do support OpenCL, so GPGPU programming on such GPUs, as well as on the ARM CPU cores, is possible. The hardware has caught up.
Ignoring the above in my decision would be folly.


*****

THE OTHER (SECONDARY) CANDIDATE LANGUAGES:

along with my MAIN objections to them (there may be more, not listed); the given ordering is unimportant:
(NOTE: The JVM was more of a negative to me than the positive it may be 
for many enterprise-type programmers, so please no suggestions regarding 
languages from that family. This includes Scala. Also, on a different 
note, my understanding is that Erlang (and its derivatives) is meant 
mainly for IO-bound concurrency, as opposed to CPU-bound parallelism, so 
I did not look into it very far - correct me if I am wrong. I would like 
BOTH capabilities. I have also looked at Julia and others, and do not care to debate how I narrowed my list to those in this post, re-hashing those issues.)
1) Haskell: lazy by default, but I want strict, with opt-in laziness, 
not laziness by default. I understand laziness allows the separation of 
data producers from data consumers, with its subsequent modularity, and 
the convenience of not having to keep track of computation order, etc. 
but I believe this is best offered on an opt-in basis, as in Clojure for 
example. Haskell is clearly the best lazy by default language in my 
view, if one wants that. Please do not try to change my mind on lazy by 
default; it would waste effort and time. Also, when several of Haskell's libraries are built for the same or a similar purpose, it is not always clear which one is best practice to use.
2) Standard ML: to my understanding, less actively used outside academic circles than the others; and how active/large is its ecosystem? It does have MLton, though.
3) F*: maturity and related issues bother me, but the OCaml and F# ecosystems are apparently available to the extent that it targets those.
The refinement/dependent type system is its most attractive feature. 
Will it survive/grow beyond its current niche use though?
4) Idris: gaining in maturity, but still a small ecosystem (and no 
visible use in production?). It can leverage its target language's 
ecosystems perhaps, and it may be the best dependent typed language for 
practical regular (non-proof) programming. However, how long will it be 
maintained (as it is a research project): will the main author/developer 
move on one day for other research? This is even pointed out by its 
developer as a reason to not use it for production. I have partially 
read its associated book and found it very nice.
5) F#: strongly tied to Microsoft, so it has compromises/constraints from being in the CLI family of languages and the .NET framework. In addition, Microsoft now apparently treats it and .NET as second- (third-?) class citizens. This latter point was even made by a strong
proponent of F# (see Jon Harrop's comments at
https://www.quora.com/Why-did-the-ML-style-languages-SML-OCaml-F-fail-to-gain-any-traction)

6) ATS: There seem to be very few people using it, so the user base available for help/advice would be small, and hence there would be fewer examples to go by too. This is compounded by a long learning curve, during which, to my understanding, exactly such help is needed. I have similar concerns about its ecosystem being small. This is all a shame, since it may otherwise be a promising dependently-typed language (with linear types too) for scientific use.
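Regarding the opt-in laziness I want (versus Haskell's lazy-by-default), OCaml already offers exactly that through its standard Lazy module. A toy sketch:

```ocaml
(* OCaml is strict by default; laziness is opt-in via the Lazy module. *)
let log = ref []

(* Building the suspension computes nothing yet. *)
let expensive = lazy (log := "computed" :: !log; 42 * 42)

let () =
  assert (!log = []);                 (* not evaluated yet *)
  let v = Lazy.force expensive in     (* first force runs the body *)
  assert (v = 1764 && !log = ["computed"]);
  let v' = Lazy.force expensive in    (* second force reuses the memoized result *)
  assert (v' = 1764 && !log = ["computed"]);
  print_endline "forced exactly once"
```

So the producer/consumer separation laziness buys can be had where wanted, without paying for it everywhere.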

*****

CURRENT MAIN CONTENDER:

OCAML

before the questions/concerns,
SOME PLUSES FIRST:

1) MirageOS (this is very big in my view, pushing OCaml over the edge as the leader): This looks to be a development platform for the future: for cloud, ARM, IoT, and micro-cloud/fog/HP-cluster computing, in addition to Unix/Linux/Windows and containers on PCs/servers/VMs, etc. One (MirageOS) development environment => wide-open, type-safe (and compiler whole-appliance/unikernel-optimized) deployment options.
2) Fast single-thread/processor/core performance, at least for an ML 
language
3) Though it is not multicore, there are still many distributed parallel computing options: JoCaml, CIEL, Opis, Functory, BSML (and related), Async_parallel from Jane Street, MPI, Parmap, Sklml, OCamlnet, and of course forking in Unix... and perhaps more I missed.
4) F* targets OCaml, for refinement/dependent-type capability, so OCaml can leverage this to gain those capabilities.
5) Coq (dependently typed) extracts to OCaml (for verification/proofs, etc.), so this capability of Coq can be leveraged by OCaml where needed/desired as well.
6) The SPOC library for GPGPU programming at a high, type-safe level using OCaml (compatible with both CUDA and OpenCL)
7) Ocsigen libraries

8) Jane Street libraries

9) Its excellent module system of course

....and more
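On point 9, a toy illustration of why I count the module system as a plus: a functor lets generic numerical code be written once and instantiated per number type, fully type-checked (the module names here are my own, illustrative only):

```ocaml
(* A minimal algebraic interface over which generic code is written. *)
module type Ring = sig
  type t
  val zero : t
  val add : t -> t -> t
  val mul : t -> t -> t
end

module IntRing : Ring with type t = int = struct
  type t = int
  let zero = 0
  let add = ( + )
  let mul = ( * )
end

module FloatRing : Ring with type t = float = struct
  type t = float
  let zero = 0.0
  let add = ( +. )
  let mul = ( *. )
end

(* A functor: generic dot product, written once, specialised per ring.
   (fold_left2 raises if the lists differ in length.) *)
module Dot (R : Ring) = struct
  let dot xs ys =
    List.fold_left2 (fun acc x y -> R.add acc (R.mul x y)) R.zero xs ys
end

module IntDot = Dot (IntRing)
module FloatDot = Dot (FloatRing)

let () =
  Printf.printf "%d %.2f\n"
    (IntDot.dot [1; 2; 3] [4; 5; 6])
    (FloatDot.dot [1.0; 2.0] [0.5; 0.25])
```

Owl's Actor layer, mentioned elsewhere on this list, reportedly uses the same mechanism to turn sequential code into distributed code with small changes.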


*****

FINALLY,
CONCERNS/QUESTIONS:

1) THIS IS A WELL-KNOWN/WELL-DISCUSSED CRITICISM. OCaml has a GIL (no shared-memory multicore) and no type classes. Apparently there has been a promise of multicore development in progress that has been a long time coming to fruition, and that has bred skepticism in at least some. Does multicore (versus parallel-distributed) become less relevant as the number of cores per node grows? My understanding is, and I have heard it argued, that multicore behavior approaches distributed behavior as the core count increases. Is this true? Modular implicits are supposed to be on the way too (and are supposed to be better than type classes?), but how far off are they from mature, production-quality use?
2) This, I think, is a big missing piece for MirageOS itself (especially in light of OCaml's GIL and the "no-forking" nature of unikernels): If I use MirageOS as my development platform for OCaml, which is what I would prefer, what is the parallel distributed computing implementation that MirageOS will use? (Note that this is different from Jitsu producing a swarm of application copies for IO demand-response, from what I understand of it.) I have read that they will base it on the join calculus, but there seems to be some question as to what lies beyond that specifically. I have seen JoCaml, CIEL, and Opis mentioned as possibilities. I assume this will involve communicating unikernels, spread over multiple cores when on a single machine, and possibly scaling up to multiple machines beyond that, for a given application running in parallel. As a scientific programmer I might often want parallel computing capability for any given application I deploy. Obviously I will be limited mostly to coarse-grained parallelism, this environment being distributed.
3) This is another potentially big missing piece from MirageOS: What are 
the GPGPU programming capabilities available in the MirageOS ecosystem? 
Is the SPOC library available and usable through MirageOS on hypervisors 
on which MirageOS can run? If so can it be utilized in conjunction with 
the distributed computing solution discussed in the previous question 
above at the same time? In other words, can a unikernel in a pool of 
parallel-distributed communicating unikernels access GPGPU programming 
resources (via SPOC, clMAGMA, or some alternative) on the node on which 
it resides?
4) To what extent can the above two capabilities, GPGPU and parallel-distributed programming, if present, mitigate the lack of multicore capability, especially regarding MirageOS? In this context, what are my options for fast unboxed linear algebra computations, especially running on MirageOS? Would LAPACK routines be viable (these are usually Fortran/C/C++)? Obviously shared, in-place memory manipulation of unboxed arrays is very efficient in this context, as in multicore, but can new GPGPU capability compensate? For example, the BLAS and clMAGMA libraries for OpenCL, and the cuBLAS and NVBLAS libraries for CUDA, come to mind, or something similarly able to do linear algebra on the GPU. More generally, has the Xen/MirageOS community looked into support for the scientific computing community? (I do not mean that they would necessarily need/want to compete on speed with the more niche sub-community of professional HPC.) I am not just talking about Big Data input and then visualizing/exploring/manipulating the data, by the way. Some might want to run large simulations (physical, biological, etc.) of some sort, for example.
5) Not a deal-breaker, but MirageOS needs a real user guide/manual, preferably available as a PDF, or better yet a book, not just the blog posts it mostly has now. I would like more documentation beyond concept-introduction papers too, emphasizing the knowledge a developer needs in actually using MirageOS day to day. Correct me if I am wrong by pointing out such references. Ultimately, a book that also addressed the above four issues would be fantastic.
If the above five have good answers forthcoming, at least in the works or in the near future, then the future could look very attractive and bright for OCaml/MirageOS, at least to me, and probably to many.
Sincerely,
Luther Flippen




_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/mirageos-devel


--
Richard Mortier
richard.mortier@xxxxxxxxxxxx
