
Re: [MirageOS-devel] Questions from potential new MirageOS Ocaml user



Hi Luther,

To speak directly to your concerns/questions:

1. Multicore has been lacking, but is actively being worked on by a team including folk here in Cambridge, who have a version you can try out based on 4.06.1 -- see https://github.com/ocamllabs/multicore-opam, and also https://github.com/ocamllabs/ocaml-multicore/wiki for more info.
2. I've no direct experience with them myself, but there was a recent message to the OCaml weekly news list <http://lists.idyll.org/listinfo/caml-news-weekly/> that drew the following answers:
```
** UnixJunkie then replied:

"OCaml cannot support multicore".
Sorry, but unless you are under windows, look for the following libraries in opam:
parmap, parany, ocamlnet, etc.
I run parallel OCaml programs everyday.
Parmap is a good start.
There are even more such libraries that I did not mention.
```
and
```
Some people (like this person <https://github.com/examachine>) are using the OCaml
MPI bindings in production.
If you really want to create scalable distributed applications, I would advise
using the zmq OCaml bindings available in opam (zmq; it would force you to
program in an agent-based style, a la Erlang).
I have done it once in the past, and it was pretty fun to write:
<https://github.com/UnixJunkie/daft>
```
...so there is some support out there.
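For a flavour of the data-parallel style Parmap offers, here is a toy, self-contained sketch. Note the `parmap` below is a *sequential stand-in* with a similar shape to the real API (the real `Parmap.parmap` takes an `~ncores` argument, forks that many worker processes, and takes its input wrapped as e.g. `Parmap.L lst`), so treat it as illustrative only:

```ocaml
(* Toy stand-in for Parmap's interface: same shape, but sequential,
   so this snippet runs with the OCaml stdlib alone. *)
let parmap ?(ncores = 4) f xs =
  ignore ncores;  (* the real library forks this many worker processes *)
  List.map f xs

(* Example: map a CPU-heavy function over a list of inputs. *)
let () =
  let results = parmap ~ncores:4 (fun x -> x * x) [1; 2; 3; 4] in
  List.iter (Printf.printf "%d ") results;
  print_newline ()  (* prints: 1 4 9 16 *)
```

With the real library, switching from `List.map f xs` to `Parmap.parmap ~ncores:4 f (Parmap.L xs)` is often the only change needed, which is what makes it a good first step.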

3+4. There is a new numerical processing library for OCaml called Owl that you may be interested in. Among the relevant/interesting features, it:
- supports OpenCL (though not CUDA currently), and links against OpenBLAS, LAPACKE, etc.;
- has a pure OCaml core which can be compiled to JavaScript (I have a student working on that at the moment), allowing models to be executed in the browser (the pure core also makes it suitable for targeting Mirage);
- has a distribution layer (Actor) that uses OCaml functors to allow machine learning code to be converted to distributed machine learning code with only a couple of lines of modification;
- has a model sharing component that makes it straightforward to publish, download and compose models built by others.

Docs at https://owlbarn.github.io/ -- still early days and under active development, so contributions, issues, feature requests all welcome!
Relating to scientific computation specifically, the following comment 
on the OCaml weekly news list also seems relevant:
```
> Maybe you can give me a good piece of advice related to both OCaml and data science.
Jupyter (<http://jupyter.org/>) is very useful and it can execute OCaml code:
<https://akabe.github.io/ocaml-jupyter/>. A Docker image containing many packages for data science is available:
<https://github.com/akabe/docker-ocaml-jupyter-datascience>, and some examples are at
<https://github.com/akabe/docker-ocaml-jupyter-datascience/tree/master/notebooks>.
Please try them, if you are interested.
```

5. There are certainly good books for OCaml out there; we have nothing really for MirageOS, I'm afraid. http://docs.mirage.io/ is the closest to a reference set of docs; the tooling for generating it may well support (e.g.) PDF generation, either already or soon, but someone more closely involved in that would need to comment. There were publishers who expressed interest in a MirageOS book a couple of years ago, but the time didn't seem right and no one seems to have had time to produce one.
Hope that's of some use!

On 14/04/2018 04:20, Luther Flippen wrote:
Dear MirageOS developers,

I DID NOT KNOW WHAT FORUM TO POST THIS ON SO I AM SENDING IT TO YOU FIRST. SUGGESTIONS AND ANSWERS ARE WELCOME.
I have questions regarding my tentative choice of language/ecosystem, but felt the need to frame this in a rather detailed context. I apologize ahead of time for the verbosity; it is long enough to be a blog post, in fact.
MY BACKGROUND:

As a retired engineering research worker and former professor, I want to continue research on my own but I need a (free) programming language ecosystem to work in. I used to use Matlab, Mathematica, Fortran, C at times, etc. when implementing the algorithm-development aspects of my work, and have done engineering and scientific computing since the punch card days.  I have already decided on a strongly typed functional language from the ML family after a LOT of research and reading, so please keep this in mind. I AM very much interested in deployment, not just in using my software myself, and this includes the cloud for example. I have not as yet learned/tried any of the following (or any other functional) programming language. At my age I want to take the time to get it right the first time before investing heavily in a new language/ecosystem.
SOME FURTHER CONTEXT:

Currently we are seeing the evolution from traditional-OS-centric to cloud/hypervisor/container-OS/unikernel-centric computing well under way and pervasive. The client/web side of things is now dominated by ARM devices/phones as well. This paradigm shift over the last two decades has been dramatic. As the next step, I believe the current cloud will gradually evolve from centralized large clusters of servers to massively geographically-distributed micro-cloud/fog systems and edge computing, complementing the growing IoT trend.

A few years back (2013), some papers on the Beowulf and Iridis-Pi high-performance clusters, on the Bolzano Raspberry Pi Cloud, and on the Glasgow Raspberry Pi Cloud demonstrated/prototyped micro-cloud/fog and HP-cluster systems based on the Raspberry Pi of that time. Those boards had no hardware support for virtualization and too little memory (256/512 MB) for Xen, as well as comparatively slow Ethernet (100 Mbps) and USB (2.0) for inter-Pi communication.

Now, however, as an example at the time of this post, one can get (from Pine64 for $45) the rPi-form-factor ROCK64 board with a Rockchip RK3328 quad-core ARM Cortex-A53 64-bit (up to 1.5 GHz) processor, 4 GB 1600 MHz LPDDR3 memory, true Gigabit Ethernet, USB 3.0, an eMMC module socket and microSD card slot for persistent storage/booting, and an ARM Mali-450MP2 dual-core GPU (4K60P capable). The Cortex-A53 is an ARMv8-A core with hardware Virtualization Extensions, and together with the 4 GB of RAM this makes current hardware capable of fully running the Xen hypervisor commonly in use on the cloud, for example. The Gigabit Ethernet makes inter-board and external communication fast. The only downside to this particular board/chip is that its Mali-450 GPU does not support OpenCL. Otherwise, this board can potentially function as the building block of a real micro-cloud/fog system, or a high-performance distributed-cluster system.
It should be noted, however, that there are other ARM Mali GPUs that do support OpenCL, so GPGPU programming on such GPUs, as well as on the ARM CPU cores, is possible. The hardware has caught up.
Ignoring the above in my decision would be folly.


*****

THE OTHER (SECONDARY) CANDIDATE LANGUAGES:

along with my MAIN objections to them (there may be more, not listed); the given ordering is unimportant:
(NOTE: The JVM was more of a negative to me than the positive it may be 
for many enterprise-type programmers, so please no suggestions regarding 
languages from that family. This includes Scala. Also, on a different 
note, my understanding is that Erlang (and its derivatives) is meant 
mainly for IO-bound concurrency, as opposed to CPU-bound parallelism, so 
I did not look into it very far - correct me if I am wrong. I would like 
BOTH capabilities. I have also looked at Julia and others, and do not care to debate how I narrowed my list to those in this post, re-hashing those issues.)
1) Haskell: lazy by default, but I want strict, with opt-in laziness, 
not laziness by default. I understand laziness allows the separation of 
data producers from data consumers, with its subsequent modularity, and 
the convenience of not having to keep track of computation order, etc. 
but I believe this is best offered on an opt-in basis, as in Clojure for 
example. Haskell is clearly the best lazy by default language in my 
view, if one wants that. Please do not try to change my mind on lazy by 
default; it would waste effort and time. Also, when several of Haskell's libraries are built for the same or a similar purpose, it is not always clear which one is best practice to use.
2) Standard ML: to my understanding, less actively used outside academic circles than the others; and how active/large is its ecosystem? It does have MLton, though.
3) F*: maturity and related issues bother me, but the OCaml and F# ecosystems are apparently available to the extent that it targets those.
The refinement/dependent type system is its most attractive feature. 
Will it survive/grow beyond its current niche use though?
4) Idris: gaining in maturity, but still a small ecosystem (and no 
visible use in production?). It can leverage its target language's 
ecosystems perhaps, and it may be the best dependent typed language for 
practical regular (non-proof) programming. However, how long will it be 
maintained (as it is a research project): will the main author/developer 
move on one day for other research? This is even pointed out by its 
developer as a reason to not use it for production. I have partially 
read its associated book and found it very nice.
5) F#: strongly tied to Microsoft, so it has compromises/constraints from being in the CLI family of languages and the .NET framework. In addition, Microsoft now apparently treats it and .NET as second- (third-?) class citizens. This latter point was even made by a strong
proponent of F# (see Jon Harrop's comments at
https://www.quora.com/Why-did-the-ML-style-languages-SML-OCaml-F-fail-to-gain-any-traction)

6) ATS: There seem to be very few people using it, so the user base available for help/advice would be small, and hence there would be fewer examples to go by too. This is compounded by a long learning curve, during which, to my understanding, exactly such help is needed. I have similar concerns about its ecosystem being small. This is all a shame, since it may otherwise be a promising dependently-typed language (with linear types too) for scientific use.
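Regarding the opt-in laziness I want (versus Haskell's lazy-by-default), OCaml already offers exactly that through its standard Lazy module. A toy sketch:

```ocaml
(* OCaml is strict by default; laziness is opt-in via the Lazy module. *)
let log = ref []

(* Building the suspension computes nothing yet. *)
let expensive = lazy (log := "computed" :: !log; 42 * 42)

let () =
  assert (!log = []);                 (* not evaluated yet *)
  let v = Lazy.force expensive in     (* first force runs the body *)
  assert (v = 1764 && !log = ["computed"]);
  let v' = Lazy.force expensive in    (* second force reuses the memoized result *)
  assert (v' = 1764 && !log = ["computed"]);
  print_endline "forced exactly once"
```

So the producer/consumer separation laziness buys can be had where wanted, without paying for it everywhere.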

*****

CURRENT MAIN CONTENDER:

OCAML

before the questions/concerns,
SOME PLUSES FIRST:

1) MirageOS (this is very big in my view, pushing OCaml over the edge as the leader): This looks to be a development platform for the future: for cloud, ARM, IoT, and micro-cloud/fog/HP-cluster computing, in addition to Unix/Linux/Windows and containers on PCs/servers/VMs, etc. One (MirageOS) development environment => wide-open, type-safe (and compiler whole-appliance/unikernel-optimized) deployment options.
2) Fast single-thread/processor/core performance, at least for an ML 
language
3) Though it is not multicore, there are still many distributed parallel computing options: JoCaml, CIEL, Opis, Functory, BSML (and related), Async_parallel from Jane Street, MPI, Parmap, Sklml, OCamlnet, and of course forking in Unix... and perhaps more I missed.
4) F* targets OCaml, for refinement/dependent-type capability, so OCaml can leverage this to gain those capabilities.
5) Coq (dependently typed) extracts to OCaml (for verification/proofs, etc.), so this capability of Coq can be leveraged by OCaml where needed/desired as well.
6) The SPOC library for GPGPU programming at a high, type-safe level using OCaml (compatible with both CUDA and OpenCL)
7) Ocsigen libraries

8) Jane Street libraries

9) Its excellent module system of course

....and more
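On point 9, a toy illustration of why I count the module system as a plus: a functor lets generic numerical code be written once and instantiated per number type, fully type-checked (the module names here are my own, illustrative only):

```ocaml
(* A minimal algebraic interface over which generic code is written. *)
module type Ring = sig
  type t
  val zero : t
  val add : t -> t -> t
  val mul : t -> t -> t
end

module IntRing : Ring with type t = int = struct
  type t = int
  let zero = 0
  let add = ( + )
  let mul = ( * )
end

module FloatRing : Ring with type t = float = struct
  type t = float
  let zero = 0.0
  let add = ( +. )
  let mul = ( *. )
end

(* A functor: generic dot product, written once, specialised per ring.
   (fold_left2 raises if the lists differ in length.) *)
module Dot (R : Ring) = struct
  let dot xs ys =
    List.fold_left2 (fun acc x y -> R.add acc (R.mul x y)) R.zero xs ys
end

module IntDot = Dot (IntRing)
module FloatDot = Dot (FloatRing)

let () =
  Printf.printf "%d %.2f\n"
    (IntDot.dot [1; 2; 3] [4; 5; 6])
    (FloatDot.dot [1.0; 2.0] [0.5; 0.25])
```

Owl's Actor layer, mentioned elsewhere on this list, reportedly uses the same mechanism to turn sequential code into distributed code with small changes.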


*****

FINALLY,
CONCERNS/QUESTIONS:

1) THIS IS A WELL-KNOWN/WELL-DISCUSSED CRITICISM. OCaml has a GIL (no shared-memory multicore) and no type classes. Apparently there has been a promise of multicore development in progress that has been a long time coming to fruition, and that has bred skepticism in at least some. Does multicore (versus parallel-distributed) become less relevant as the number of cores per node grows? My understanding is, and I have heard it argued, that multicore behavior approaches distributed behavior as the core count increases. Is this true? Modular implicits are supposed to be on the way too (and are supposed to be better than type classes?), but how far off are they from mature, production-quality use?
2) This, I think, is a big missing piece for MirageOS itself (especially in light of OCaml's GIL and the "no-forking" nature of unikernels): If I use MirageOS as my development platform for OCaml, which is what I would prefer, what is the parallel distributed computing implementation that MirageOS will use? (Note that this is different from Jitsu producing a swarm of application copies for IO demand-response, from what I understand of it.) I have read that they will base it on the join calculus, but there seems to be some question as to what lies beyond that specifically. I have seen JoCaml, CIEL, and Opis mentioned as possibilities. I assume this will involve communicating unikernels, spread over multiple cores when on a single machine, and possibly scaling up to multiple machines beyond that, for a given application running in parallel. As a scientific programmer I might often want parallel computing capability for any given application I deploy. Obviously I will be limited mostly to coarse-grained parallelism, this environment being distributed.
3) This is another potentially big missing piece from MirageOS: What are 
the GPGPU programming capabilities available in the MirageOS ecosystem? 
Is the SPOC library available and usable through MirageOS on hypervisors 
on which MirageOS can run? If so can it be utilized in conjunction with 
the distributed computing solution discussed in the previous question 
above at the same time? In other words, can a unikernel in a pool of 
parallel-distributed communicating unikernels access GPGPU programming 
resources (via SPOC, clMAGMA, or some alternative) on the node on which 
it resides?
4) To what extent can the above two capabilities, GPGPU and parallel-distributed programming, if present, mitigate the lack of multicore capability, especially regarding MirageOS? In this context, what are my options for fast unboxed linear algebra computations, especially running on MirageOS? Would LAPACK routines be viable (these are usually Fortran/C/C++)? Obviously shared, in-place memory manipulation of unboxed arrays is very efficient in this context, as in multicore, but can new GPGPU capability compensate? For example, the BLAS and clMAGMA libraries for OpenCL, and the cuBLAS and NVBLAS libraries for CUDA, come to mind, or something similarly able to do linear algebra on the GPU. More generally, has the Xen/MirageOS community looked into support for the scientific computing community? (I do not mean that they would necessarily need/want to compete on speed with the more niche sub-community of professional HPC.) I am not just talking about Big Data input and then visualizing/exploring/manipulating the data, by the way. Some might want to run large simulations (physical, biological, etc.) of some sort, for example.
5) Not a deal-breaker, but MirageOS needs a real user guide/manual, preferably available as a PDF, or better yet a book, not just the blog posts it mostly has now. I would like more documentation beyond concept-introduction papers too, emphasizing the knowledge a developer needs in actually using MirageOS day to day. Correct me if I am wrong by pointing out such references. Ultimately, a book that also addressed the above four issues would be fantastic.
If the above five have good answers forthcoming, at least in the works or in the near future, then the future could look very attractive and bright for OCaml/MirageOS, at least to me, and probably to many.
Sincerely,
Luther Flippen




_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/mirageos-devel


--
Richard Mortier
richard.mortier@xxxxxxxxxxxx
