
Re: [MirageOS-devel] Questions from potential new MirageOS Ocaml user



Hi Luther,

To speak directly to your concerns/questions:

1. Multicore has been lacking but is actively being worked on by a team including folk here in Cambridge, who have a version you can try out based on 4.06.1 -- see https://github.com/ocamllabs/multicore-opam, and also https://github.com/ocamllabs/ocaml-multicore/wiki for more info.

2. I've no direct experience with them myself, but a recent message to the OCaml weekly news list <http://lists.idyll.org/listinfo/caml-news-weekly/> drew this answer:

```
** UnixJunkie then replied:

"OCaml cannot support multicore".
Sorry, but unless you are under windows, look for the following libraries in opam:
parmap, parany, ocamlnet, etc.
I run parallel OCaml programs every day.
Parmap is a good start.
There are even more such libraries that I did not mention.
```
and
```
Some people (like this person <https://github.com/examachine>) are using the OCaml
MPI bindings in production.
If you really want to create scalable distributed applications, I would advise
using the zmq OCaml bindings available in opam (zmq; it would force you to
program in an agent-based style, à la Erlang).
I have done it once in the past, and it was pretty fun to write:
<https://github.com/UnixJunkie/daft>
```
...so there is some support out there.
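To make the Parmap suggestion concrete, here is a minimal sketch (assuming `opam install parmap`; `Parmap.parmap` and `Parmap.L` match the library's documented API, but do check the current docs):

```ocaml
(* Fork worker processes and map a CPU-bound function over a list
   in parallel; results come back in input order. *)
let () =
  let inputs = [1; 2; 3; 4; 5; 6; 7; 8] in
  let squares =
    (* ~ncores sets how many worker processes Parmap forks *)
    Parmap.parmap ~ncores:4 (fun x -> x * x) (Parmap.L inputs)
  in
  List.iter (Printf.printf "%d ") squares
```

Under the hood Parmap forks Unix processes and marshals results back, so it sidesteps the runtime lock entirely, at the cost of per-task marshalling overhead.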

3+4. There is a new numerical processing library for OCaml called Owl that you may be interested in. Among the relevant/interesting features:

- supports OpenCL (though not CUDA currently), and links against OpenBLAS, LAPACKE, etc.;
- has a pure OCaml core which can be compiled to JavaScript (I have a student working on that at the moment), allowing models to be executed in the browser (the pure core also makes it suitable for targeting MirageOS);
- has a distribution layer (Actor) that uses OCaml functors to allow machine-learning code to be converted to distributed machine-learning code with only a couple of lines of modification;
- has a model-sharing component that makes it straightforward to publish, download and compose models built by others.
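As a quick taste of what Owl code looks like, a small sketch (assuming `opam install owl`; the `Mat` functions below are from Owl's dense-matrix module, but verify names against the current docs):

```ocaml
(* Generate a random design matrix, multiply by a weight vector,
   and report a summary statistic -- dense matrices are unboxed
   Bigarrays, with BLAS doing the heavy lifting underneath. *)
let () =
  let open Owl in
  let x = Mat.uniform 100 3 in      (* 100x3 matrix, entries ~ U(0,1) *)
  let w = Mat.uniform 3 1 in        (* 3x1 weight vector *)
  let y = Mat.(x *@ w) in           (* matrix product: 100x1 *)
  Printf.printf "mean of y = %f\n" (Mat.mean' y)
```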

Docs at https://owlbarn.github.io/ -- still early days and under active development, so contributions, issues, feature requests all welcome!

Relating to scientific computation specifically, the following comment on the OCaml weekly news list also seems relevant:

```
> Maybe you can give me a good piece of advice related to both OCaml and data science.

Jupyter (<http://jupyter.org/>) is very useful and it can execute OCaml code: <https://akabe.github.io/ocaml-jupyter/>. A Docker image containing many packages
for data science is available:
<https://github.com/akabe/docker-ocaml-jupyter-datascience>, and some examples are
at
<https://github.com/akabe/docker-ocaml-jupyter-datascience/tree/master/notebooks>.
Please try them, if you are interested.
```

5. There are certainly good books for OCaml out there; we have nothing really for MirageOS, I'm afraid. http://docs.mirage.io/ is the closest to a reference set of docs; the tooling for generating it may well support (e.g.) PDF generation, either already or soon, but someone more closely involved in that would need to comment. Some publishers expressed interest in a MirageOS book a couple of years ago, but the time didn't seem right and no-one has had time to produce one.

Hope that's of some use!

On 14/04/2018 04:20, Luther Flippen wrote:

Dear MirageOS developers,

I DID NOT KNOW WHAT FORUM TO POST THIS ON SO I AM SENDING IT TO YOU FIRST. SUGGESTIONS AND ANSWERS ARE WELCOME.

I have questions regarding my tentative choice of language/ecosystem, but felt the need to frame this in a rather detailed context. I apologize ahead of time for the verbosity; it is in fact long enough to be a blog post.

MY BACKGROUND:

As a retired engineering research worker and former professor, I want to continue research on my own but I need a (free) programming language ecosystem to work in. I used to use Matlab, Mathematica, Fortran, C at times, etc. when implementing the algorithm-development aspects of my work, and have done engineering and scientific computing since the punch card days.  I have already decided on a strongly typed functional language from the ML family after a LOT of research and reading, so please keep this in mind. I AM very much interested in deployment, not just in using my software myself, and this includes the cloud for example. I have not as yet learned/tried any of the following (or any other functional) programming language. At my age I want to take the time to get it right the first time before investing heavily in a new language/ecosystem.

SOME FURTHER CONTEXT:

Currently we are seeing the evolution from traditional-OS-centric to cloud/hypervisor/container-OS/unikernel-centric computing well under way and pervasive. The client/web side of things is now dominated by ARM devices/phones as well. This paradigm shift over the last two decades has been dramatic. As the next step, I believe the current cloud will gradually evolve from centralized large clusters of servers to massively geographically-distributed micro-cloud/fog systems and edge computing, complementing the growing IoT trend.

A few years back (2013), papers on the Beowulf and Iridis-Pi high-performance clusters, on the Bolzano Raspberry Pi Cloud, and on the Glasgow Raspberry Pi Cloud demonstrated/prototyped micro-cloud/fog and HP-cluster systems based on the Raspberry Pi of that time. Those boards had no hardware support for virtualization and too little memory (256/512 MB) for Xen, as well as comparatively slow (100 Mbps / USB 2.0) Ethernet/USB for inter-Pi communication.

Now, however, as an example at the time of this post, one can get (from Pine64, for $45) the rPi-form-factor ROCK64 board with a Rockchip RK3328 quad-core ARM Cortex-A53 64-bit (up to 1.5 GHz) processor, 4 GB 1600 MHz LPDDR3 memory, true Gigabit Ethernet, USB 3.0, an eMMC module socket and microSD card slot for persistent storage/booting, and an ARM Mali-450MP2 dual-core GPU (4K60P capable). The Cortex-A53 is an ARMv8-A core with hardware virtualization extensions, which, together with the 4 GB of RAM, makes this hardware capable of fully running the Xen hypervisor commonly used on the cloud. The Gigabit Ethernet makes inter-board and external communication fast. The only downside to this particular board/chip is that its Mali-450 GPU does not support OpenCL; otherwise, the board can potentially function as the building block of a real micro-cloud/fog system, or a high-performance distributed-cluster system.
It should be noted, however, that there are other ARM Mali GPUs that do support OpenCL, and hence using such GPUs for GPGPU programming, as well as the ARM CPU cores, is supported. The hardware has caught up.

Ignoring the above in my decision would be folly.


*****

THE OTHER (SECONDARY) CANDIDATE LANGUAGES:

along with my MAIN objections to them (there may be more, not listed), given ordering unimportant:

(NOTE: The JVM was more of a negative to me than the positive it may be for many enterprise-type programmers, so please no suggestions regarding languages from that family. This includes Scala. Also, on a different note, my understanding is that Erlang (and its derivatives) is meant mainly for IO-bound concurrency, as opposed to CPU-bound parallelism, so I did not look into it very far -- correct me if I am wrong. I would like BOTH capabilities. I have also looked at Julia and others, and do not care to re-hash how I narrowed my list to the languages of this post.)

1) Haskell: lazy by default, but I want strict, with opt-in laziness, not laziness by default. I understand laziness allows the separation of data producers from data consumers, with the resulting modularity, and the convenience of not having to keep track of computation order, etc., but I believe this is best offered on an opt-in basis, as in Clojure for example. Haskell is clearly the best lazy-by-default language in my view, if one wants that; please do not try to change my mind on lazy-by-default, wasting effort and time. Also, it is not always clear which of Haskell's multiple libraries built for the same or similar purpose is best practice to use.

2) Standard ML: less actively used outside academic circles than the others, to my understanding -- and how active/large is its ecosystem? It does have MLton though.

3) F*: maturity and related issues bother me, but the OCaml and F# ecosystems are apparently available to the extent that it targets those languages. The refinement/dependent type system is its most attractive feature. Will it survive/grow beyond its current niche use, though?

4) Idris: gaining in maturity, but still a small ecosystem (and no visible use in production?). It can perhaps leverage its target languages' ecosystems, and it may be the best dependently typed language for practical regular (non-proof) programming. However, how long will it be maintained (as it is a research project): will the main author/developer move on one day to other research? This is even pointed out by its developer as a reason not to use it in production. I have partially read its associated book and found it very nice.

5) F#: is strongly tied to Microsoft, so it has compromises/constraints from being in the CLI family of languages and the .NET framework. In addition, Microsoft now apparently treats it and .NET as second- (third-?) class citizens. This latter point was even made by a strong proponent of F# (see Jon Harrop's comments at
https://www.quora.com/Why-did-the-ML-style-languages-SML-OCaml-F-fail-to-gain-any-traction)

6) ATS: there seem to be very few people using it, so the user base for asking for help/advice would be small, and hence there would also be fewer examples to go by. This is compounded by a long, steep learning curve, to my understanding, during which such help is needed. I have similar concerns about its ecosystem being small. This is all a shame, since it may otherwise be a promising dependently typed language (with linear types too) for scientific use.


*****

CURRENT MAIN CONTENDER:

OCAML

before the questions/concerns,
SOME PLUSES FIRST:

1) MirageOS (this is very big in my view, pushing OCaml over the edge as the leader): this looks to be a development platform for the future: for cloud, ARM, IoT, and micro-cloud/fog/HP-cluster computing, in addition to Unix/Linux/Windows and containers on PCs/servers/VMs, etc. One (MirageOS) development environment => wide-open, type-safe (and compiler whole-appliance/unikernel-optimized) deployment options.

2) Fast single-thread/processor/core performance, at least for an ML language

3) Though it is not multicore, there are still many distributed/parallel computing options: JoCaml, CIEL, Opis, Functory, BSML (and related), Async_parallel from Jane Street, MPI, Parmap, SklML, OCamlnet, and of course forking in Unix... and perhaps more I missed.

4) F* targets OCaml, for refinement/dependent-type capability, so OCaml can leverage this to gain those capabilities.

5) Coq (dependently typed) extracts to OCaml (for verification/proofs, etc.), so this capability of Coq can be leveraged by OCaml where needed/desired as well.

6) The SPOC library for GPGPU programming at a high, type-safe level using OCaml (compatible with both CUDA and OpenCL)

7) Ocsigen libraries

8) Janestreet libraries

9) Its excellent module system of course

....and more
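On option 3 in the list above, plain Unix forking already gives process-level parallelism with no extra library -- a minimal sketch using only the stdlib plus the `unix` library (`parallel_map` is a name invented here, and `input_value`/`output_value` use OCaml's untyped marshalling, hence the type annotation):

```ocaml
(* Fork one worker process per input; each child writes its result
   to a pipe, and the parent collects results in input order. *)
let parallel_map (f : 'a -> int) (xs : 'a list) : int list =
  let workers =
    List.map
      (fun x ->
        let r, w = Unix.pipe () in
        match Unix.fork () with
        | 0 ->
          (* child: compute, marshal the result down the pipe, exit *)
          Unix.close r;
          let oc = Unix.out_channel_of_descr w in
          output_value oc (f x);
          close_out oc;
          exit 0
        | pid ->
          (* parent: remember the child's pid and the read end *)
          Unix.close w;
          (pid, Unix.in_channel_of_descr r))
      xs
  in
  List.map
    (fun (pid, ic) ->
      let v : int = input_value ic in   (* unmarshal; untyped, so annotate *)
      close_in ic;
      ignore (Unix.waitpid [] pid);
      v)
    workers

let () =
  parallel_map (fun x -> x * x) [1; 2; 3; 4]
  |> List.iter (Printf.printf "%d ")
```

This is the same process-per-task model the libraries above wrap more conveniently, and it is exactly what is unavailable inside a unikernel, which is what makes question 2 below matter.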


*****

FINALLY,
CONCERNS/QUESTIONS:

1) THIS IS A WELL-KNOWN/WELL-DISCUSSED CRITICISM. OCaml has a global runtime lock (no shared-memory multicore) and no type classes. Apparently multicore development has been promised and in progress for a long time without coming to fruition, and that has bred skepticism in at least some. Does multicore (versus parallel-distributed) become less relevant as the number of cores per node grows? My understanding is, and I have heard it argued, that multicore behavior approaches distributed behavior with increasing core count. Is this true? Modular implicits are supposed to be on the way too (and are supposed to be better than type classes?), but how far off are they from mature, production-quality use?

2) This, I think, is a big missing piece for MirageOS itself (especially in light of OCaml's runtime lock and the "no-forking" nature of unikernels): if I use MirageOS as my development platform for OCaml, which is what I would prefer, what is the parallel distributed computing implementation that MirageOS will use? (Note that this is different from Jitsu producing a swarm of application copies for IO demand-response, from what I understand of it.) I read that it will be based on the join calculus, but there seems to be some question as to what lies beyond that specifically; I have seen JoCaml, CIEL, and Opis mentioned as possibilities. I assume this will involve communicating unikernels, spread over multiple cores when on a single machine, and possibly scaling up to multiple machines beyond that, for a given application running in parallel. As a scientific programmer I might often want parallel computing capability for any given application I deploy. Obviously I will mostly be limited to coarse-grained parallelism, this environment being distributed.

3) This is another potentially big missing piece from MirageOS: What are the GPGPU programming capabilities available in the MirageOS ecosystem? Is the SPOC library available and usable through MirageOS on hypervisors on which MirageOS can run? If so can it be utilized in conjunction with the distributed computing solution discussed in the previous question above at the same time? In other words, can a unikernel in a pool of parallel-distributed communicating unikernels access GPGPU programming resources (via SPOC, clMAGMA, or some alternative) on the node on which it resides?

4) To what extent can the above two capabilities, if present -- GPGPU and parallel-distributed programming -- mitigate the lack of multicore capability, especially regarding MirageOS? In this context, what are my options for fast unboxed linear algebra computations, especially running on MirageOS? Would LAPACK routines be viable (which are usually Fortran/C/C++)? Obviously shared in-place memory manipulation of unboxed arrays is very efficient in this context, as in multicore, but can new GPGPU capability compensate? For example, the OpenCL BLAS and clMAGMA libraries for OpenCL, and the cuBLAS and NVBLAS libraries for CUDA, come to mind, or something similarly able to do linear algebra on the GPU. More generally, has the Xen/MirageOS community looked into support for the scientific computing community? (I do not mean that they would necessarily need/want to compete with the more niche sub-community of professional HPC for speed.) I am not just talking about Big Data input and then visualizing/exploring/manipulating the data, by the way. Some might want to run large simulations (physical, biological, etc.) of some sort, for example.

5) Not a deal-breaker, but MirageOS needs a real user guide/manual, preferably available in PDF, or better yet a book -- not just the blog posts they mostly have now. I would like more documentation beyond concept-introduction papers too, emphasizing the developer-needed knowledge for actually using MirageOS day to day. Correct me if I am wrong by pointing out such references. Ultimately, a book that also addressed the above 4 issues would be fantastic.

If the above 5 concerns have good answers, at least in the works or coming in the near future, then that future could look very attractive and bright for OCaml/MirageOS, at least to me, and probably to many.

Sincerely,
Luther Flippen




_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/mirageos-devel



--
Richard Mortier
richard.mortier@xxxxxxxxxxxx

