Datseris / whyjulia-manifesto

Why Julia - A Manifesto.

The purpose of this repository is to be an accessible and well-formulated summary of the reasons academia should be using the Julia programming language for scientific computing / computational sciences / data analysis / research software engineering. Feel free to share it with interested parties who ask "Why Julia?".

(clearly, this post is expressing the views of the author(s); nevertheless, these views are based on factual information provided explicitly in the later sections)

  1. What's a programming language?
  2. What's most important in a programming language?
  3. Julia is the best language (for science)
    1. Speed of writing
    2. Speed of execution
    3. Available libraries
    4. Extensibility/Composability
    5. Accessibility/Shareability/Reproducibility
  4. Detailed advantages of Julia
  5. But why do you try so hard to convince people?

What's a programming language?

A programming language is a tool for the scientist to get their job done, while allowing their work to be further re-used and extended upon by other scientists.

What's most important in a programming language?

Keeping it simple and with academia in mind, the most fundamentally important aspects of a programming language are:

  1. Speed of writing. How quickly the scientist can get their ideas from a piece of paper/their brain into a runnable prototype on a computer.
  2. Speed of execution. How quickly the written code is run by the computer.
  3. Available libraries. How many good packages/extensions provide relevant functionality out-of-the-box, and whether they are also well documented.
  4. Extensibility/Composability. How easy it is to re-use, extend, or compose with an existing library, even one in a different programming language.
  5. Accessibility/Shareability/Reproducibility. How easy it is to get started with the language (installation and learning the first steps), and how easy it is to share the work with other scientists so that they can effortlessly reproduce it.

Now we may ask ourselves: "What is the ideal programming language?"

It is the one that is the best in all of the aforementioned fundamental aspects. This language does not exist, as pros and cons are an integral part of the real world. Therefore, the best real language is the one that accumulates the highest overall quality across these fundamental aspects.

Julia is the best language (for science)

When compared over these fundamental aspects, Julia emerges as the overall winner because it performs exceptionally well in all of them. More details and references are given in the "Detailed advantages of Julia" section. The summary is:

Speed of writing

Julia is a dynamic language that allows interactive and flexible code creation and exploration. Its syntax is modern, high level, and as close as possible to math. It also includes conveniences such as automatically detecting where statements end and allowing Unicode symbols in code. As a result, Julia code is often a nearly 1-to-1 mapping of the scientific paper it was implemented from. Julia also does not require explicit type annotations anywhere (although they are possible), further increasing the efficiency of writing code.
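As a small illustration of how close the code can stay to the math, here is a minimal sketch that writes out the right-hand side of the Lorenz system. The choice of the Lorenz system and the name `lorenz_rule` are assumptions made purely for this example; they are not taken from any particular package.

```julia
# Right-hand side of the Lorenz system:
#   ẋ = σ(y - x),   ẏ = x(ρ - z) - y,   ż = xy - βz
# Greek letters and dotted symbols are ordinary variable names in Julia.
# No type annotations and no end-of-line markers are needed.
function lorenz_rule(u, σ, ρ, β)
    x, y, z = u
    ẋ = σ * (y - x)
    ẏ = x * (ρ - z) - y
    ż = x * y - β * z
    return (ẋ, ẏ, ż)
end

# Evaluate the rule at one state with the commonly used parameter values.
du = lorenz_rule((1.0, 1.0, 1.0), 10.0, 28.0, 8 / 3)
```

The body of the function reads almost exactly like the equations in the comment above it.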

Multiple dispatch is a programming paradigm that is most similar to scientific thought, because it detaches processes from data types. This accelerates the process of getting the scientific ideas into runnable code.
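A minimal sketch of what "detaching processes from data types" can look like in practice. All type and function names below (`Body`, `Planet`, `Probe`, `describe`) are hypothetical and exist only for this illustration.

```julia
# Hypothetical data types; none of these names come from a package.
abstract type Body end

struct Planet <: Body
    mass::Float64
end

struct Probe <: Body
    mass::Float64
end

# The "process" is a function that belongs to no type.
# Generic fallback: works for any pair of bodies.
describe(a::Body, b::Body) = "two bodies attract each other"

# More specific methods, selected from the types of *both* arguments.
describe(a::Planet, b::Probe) = "the probe orbits the planet"
describe(a::Probe, b::Planet) = describe(b, a)

describe(Planet(5.97e24), Probe(720.0))     # uses the (Planet, Probe) method
describe(Planet(5.97e24), Planet(6.4e23))   # falls back to the generic method
```

New types or new processes can be added later, by anyone, without touching the definitions above.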

Speed of execution

Julia compiles to machine code that is routinely as fast as C/FORTRAN, thanks to its type inference system. That is, standard user Julia code gets compiled to efficient machine code, with no need for language extensions (e.g., Numba, Cython, ...). Julia has native support for parallel and distributed computing, and strong GPU support via packages; all of these are easy to use. The Julia package ecosystem offers even more packages for accelerating code, such as Transducers.jl, ThreadsX.jl, FLoops.jl, MultiThreadedCaches.jl, ParallelAccelerator.jl, Dagger.jl, and more.
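To make this concrete, here is a minimal sketch using only the standard library. The function names are made up for this example, and no timings are claimed; how fast it runs depends on your machine and on how many threads Julia was started with (for instance `julia --threads=4`).

```julia
# A plain loop: no vectorization tricks or language extensions.
# Julia infers the types of `s` and `x` and compiles native code
# the first time the function is called with a given argument type.
function sum_of_squares(xs)
    s = zero(eltype(xs))
    for x in xs
        s += x^2
    end
    return s
end

# The same style of loop spread over threads with the built-in macro.
# `out` must have the same length as `xs`. Each iteration writes to its
# own slot of `out`, so there is no data race.
function squares_threaded!(out, xs)
    Threads.@threads for i in eachindex(xs)
        out[i] = xs[i]^2
    end
    return out
end

xs = collect(1.0:1000.0)
total = sum_of_squares(xs)
squares = squares_threaded!(similar(xs), xs)
```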

Available libraries

Julia has software organizations for seemingly every scientific area. From high energy physics to economics, it is likely that there is already a developer community around the field you work in, with native Julia software that can match up to alternatives in other languages. Moreover, these organizations already contain much best-in-class software, that is, the most featureful, most performant, and most accessible tool for a particular task. Examples are DifferentialEquations.jl, DynamicalSystems.jl, Agents.jl, Turing.jl, NeuralPDE.jl, Distributions.jl, Makie.jl, JuMP.jl, ComplexityMeasures.jl, and many more. Besides, even if some tool that you need is not available in Julia, that is not a problem due to Julia's composability with other languages (see below).

We should point out how surprising this is. Julia is a more recent programming language, has far fewer users, and has received far less large-scale funding than Python, currently the most popular programming language among scientists. The fact that Julia matches Python's library ecosystem, and in some fields surpasses it, is a testament to 1) how easy it is to develop, or contribute to, software in Julia, and 2) the amount of software engineering talent that Julia has attracted.

Extensibility/Composability

This aspect, while often ignored in programming language discussions, is crucial in academia. It can make the difference between a scientific work being a cryptic script forgotten on a hard drive for the rest of time, and it being a full package (or part of another already established package) that other scientists can re-use and build on to accelerate their own research. Additionally, good extensibility and composability typically also mean code re-use, which in turn means good maintainability (it is easy to maintain your code base in the long term).

Julia is one of the best tools on the market for extensibility and composability in scientific code. When it comes to composability with software from other languages, Julia allows natively calling C/FORTRAN code. Packages such as PythonCall.jl or RCall.jl allow directly calling code from the respective languages (in fact, PythonCall.jl allows the typical object-oriented syntax to be used in Julia).
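As a minimal sketch of native C calls, the example below calls `strlen` from the C standard library without any wrapper code. It assumes the symbol is visible in the running Julia process, which is the case on typical Linux and macOS installations; other platforms may need the library named explicitly.

```julia
# Call the C function `strlen` directly from Julia (macro form, Julia 1.5+).
# Assumption: the C standard library is already loaded in the process.
len = @ccall strlen("composability"::Cstring)::Csize_t

# The older functional form does the same thing.
len2 = ccall(:strlen, Csize_t, (Cstring,), "composability")
```

The argument and return types are declared at the call site, so no separate interface file or compilation step is involved.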

But the real strength of Julia lies in the composability and extensibility that Julia packages have with each other. Julia has brought in an unprecedented amount of code re-use, enabling packages to easily communicate with and extend each other, all without the boilerplate code and namespace issues one encounters in languages like Python. This is part of the reason that Julia has seen such explosive growth of available packages in practically every scientific field. And the basis of all this composability is Julia's multiple dispatch system.

(proof of claims: https://www.youtube.com/watch?v=kc9HwsxE1OY and https://www.youtube.com/watch?v=2MBD10lqWp8 and https://github.com/Datseris/Zero2Hero-JuliaWorkshop for a hands-on demonstration on composability)
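A toy version of this kind of composability can be sketched in a few lines. Below, a hypothetical `Dual` number type (value plus derivative, i.e. forward-mode automatic differentiation) is defined in one place, and a generic function `f`, written with no knowledge of `Dual`, works with it unchanged. All names are illustrative; real packages are far more complete than this sketch.

```julia
# "Package A": a hypothetical dual number carrying a value and a derivative.
struct Dual
    val::Float64   # function value
    der::Float64   # derivative with respect to the input
end

# Extend existing Base functions with methods for the new type.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# "Package B": generic code that knows nothing about `Dual`.
f(x) = sin(x) * x + x

# Composition: differentiate B's code with A's type, with no glue code.
derivative(g, x) = g(Dual(x, 1.0)).der

f(0.5)               # ordinary floating point evaluation
derivative(f, 0.5)   # the same `f`, now also yielding its derivative
```

Neither "package" had to import, subclass, or wrap anything from the other; the shared function names (`+`, `*`, `sin`) are the whole interface.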

Accessibility/Shareability/Reproducibility

Getting started in Julia is simple! Installing the core language is a simple download-and-click-install for every operating system. Julia features a version multiplexer called juliaup, which makes managing multiple Julia versions a trivial one-liner. Installing packages for the language is equally simple. Julia features a powerful package manager (which is itself a package of the language). Since most Julia packages are written in pure Julia, installing them is also a simple one-line command. When it comes to binary dependencies, Julia has a pioneering pre-built binary system: binaries are built and stored for every supported system and platform combination, and are automatically installed along with any package that has binary dependencies. No more spending weeks to just install your software!

Sharing Julia projects is equally simple due to the strong package manager. A Julia environment (which is just two text files, Project.toml and Manifest.toml) carries a list of all the packages used, and their dependencies, all the way down to the exact git-commit of the used package. One can share this environment and the associated scripts, and the receiving user can instantiate it, reproducing the same environment. This runs the same code with the same versions, hence yielding the same output.
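On the receiving side, reproducing a shared environment can be sketched as follows. The helper name `reproduce_environment` and the folder path are hypothetical; `Pkg.activate` and `Pkg.instantiate` are the standard package manager calls.

```julia
using Pkg

# `project_dir` is a hypothetical folder received from a colleague. It holds
# their scripts plus the two environment files: Project.toml (direct
# dependencies) and Manifest.toml (exact versions of everything).
function reproduce_environment(project_dir)
    Pkg.activate(project_dir)      # switch to the shared environment
    Pkg.instantiate()              # install exactly what the manifest records
    return Base.active_project()   # path of the now-active Project.toml
end

# Usage (not run here, since it needs the shared folder and network access):
# reproduce_environment("path/to/shared/project")
```

After this, running the shared scripts uses the same package versions the original author had.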

Detailed advantages of Julia

This section goes through specific advantages of Julia in a bullet-point list. It also gives references and provides further reading resources.

  1. It solves the two language problem: it is a dynamic and interactive language that allows the real-time scientific exploration typically done in interpreted languages like Python, but still offers the performance of static low-level languages such as C. Julia works by compiling to machine code, and hence all basic programming concepts such as iteration, broadcasting, and functions as arguments are fast by themselves. Hence, you never have to re-write Julia code in another language to make it faster! This way you spend less time writing (or re-writing) code and more time progressing your work. It also means that you don't have to be proficient in two programming languages to get involved with the development of a library.

  2. It solves the two cultures problem: Julia makes it easy for the "normal" scientist to also be a "software developer", which addresses the two cultures problem in scientific computing. This is partly due to point 1, that Julia allows you to do everything in one language and in one slim codebase. However, it is also because it is so damn easy in Julia to take your existing code base and make it into a formal publishable package, thanks to strong package manager support.

  3. It occupies the "sweet spot" of high performance and simple code in a global comparison between all programming languages:

    [Figure: comparison of programming languages by speed vs. code size]

    This figure was created by developers of the Chapel language, which does not particularly target academic usage. The image comes courtesy of a public post: https://twitter.com/ChapelLanguage/status/1623389242822111232 .

  4. Its syntax is intuitive and as close to math as possible: The combination of high-level syntax, Unicode, and code that is simple to reason about makes the code faster to write and read. Additionally, the modern Julia syntax parser eliminates the need for many "decorators", such as ending lines with ; or requiring indentation to denote code blocks, as it understands automatically where commands start and end.

  5. Multiple dispatch: This is the core programming paradigm of Julia and is used together with functional programming. In our opinion it is the most suitable paradigm to implement scientific thought in code because it parallels scientific thinking: a "process" (function) does not belong to any particular data structure. Multiple dispatch and the exponential expressive power it brings are showcased well in this talk by Stefan Karpinski.

  6. Unprecedented code re-use and inter-package communication. This is a direct consequence of multiple dispatch, and it is a unique property of Julia that has not been seen in other programming languages. In short, Julia packages can use and extend other packages very easily (most of the time for free!), without boilerplate or glue code. Due to this, most packages re-use existing code and share common interfaces with other packages. For example, in this talk Chris Rackauckas highlights how in Julia developing for machine learning is the same as developing for any other standard situation, which is a big reason why the open scientific machine learning community in Julia matches up to Python equivalents like PyTorch and well exceeds them on the differential equations side. To bring the point home even further, see this presentation by Kristoffer Carlsson and Fredrik Bagge Carlson for an insane showcase of the power of the multiple dispatch system. It shows how a user gets, for free, trigonometric functions for real numbers, matrices, error propagation, symbolic dynamics, and automatic differentiation, all with the base sin function and only 10 lines of code, and all while solving a differential equation that includes all these possible types in the sin function.

  7. Julia is written in Julia (from a practical academic's point of view). This is part of the reason why Julia solves the two language problem, and it comes with even more advantages.

    • Typical user code isn't really different from Julia's very own base code, all the way down to basic arithmetic. This means that understanding the source code of others' packages, or even of Julia itself, is straightforward. Hence, it is also straightforward to improve an existing code base via a code contribution.
    • The above leads to the natural consequence that a typical Julia user is already 90% of the way to being a Julia package developer. Julia's strong package tooling suite further makes this easier, which explains the explosive growth of Julia despite the lack of funding (versus e.g., Python).
    • Most basic Julia types are used almost everywhere, and even where they aren't, a front-end user wouldn't care, due to multiple dispatch. To give an example: a Python user would have to use e.g., array types from PyTorch to implement performant advanced algorithms, especially for large datasets. However, if a performant version of a function/operation the user needs, like e.g., the gamma function or some algorithm that operates on arrays, is not implemented for this "special" array type, that user is doomed. They will most likely not understand how a package like PyTorch implements numerical schemes well enough to add their own version of what they need. Instead, they will have to convert to a "normal" Python array, at the price of a slowdown in performance, and then convert back again to the "fast" array version. In Julia such things don't happen, because the "fast" array version is the "standard" array version, and even if not, all array types are anyway part of the same abstract interface due to multiple dispatch.
    • As a consequence, Python users are "forced" to find existing implementations of algorithms/functionalities in these Python packages like PyTorch/NumPy, and are "discouraged" from writing their own versions (writing a Runge-Kutta solver in Python was one of the biggest mistakes I've made!). Julia users instead could write their own low-level code, which improves their algorithmic/programming skills, gives them better understanding of how the algorithm works, and gives them more flexibility over it as well.

  8. Julia's package ecosystem is already top-of-the-class in some scientific disciplines. Even though Julia is very new and has a relatively small user base (StackOverflow results show Python usage at about 50%, Julia at about 2%), in many disciplines Julia's ecosystem is at least as good as Python's, while in some others it is even better. I can only speak from experience, and from my perspective these are the ecosystems for nonlinear dynamics & complex systems, differential equations & scientific machine learning, machine learning and automatic differentiation, statistics (especially Distributions.jl and OnlineStats.jl), interactive plotting, and even a scientific project assistant software.

  9. Developer communities around seemingly every area of science. Going beyond the above mentioned developer communities, Julia also appears to have a developer community around seemingly every area of science. For example

  10. Interoperability with other languages: C is directly and natively callable from Julia. Python is callable from Julia with the same syntax as normal object-oriented Python code via PythonCall.jl. This means that you can use practically any Python package in Julia, most of the time without even changing the syntax of the Python code. R, FORTRAN, etc., are similarly easy to call.

  11. Exceptionally strong integrated package manager: Julia's package manager is just another package. It is flexible and powerful, leading to fewer ambiguities than in other languages. On top of it, a strong binary shipping system is built. This all means that everything runs everywhere: no makefile nonsense, no spending weeks figuring out how to install things, no worries whether your program will be able to run on Windows. Everything is a 1-click install.

  12. Welcoming and responsive community: My experience using Julia for 6+ years is that it has one of the most welcoming and responsive communities I have encountered. New questions asked on the official Julia Discourse forum or Slack channels consistently get answers within minutes. This means that there is no real reason to worry that your questions won't get answers due to the relatively smaller community of Julia versus e.g., Python.

  13. Many large-scale projects and organizations have already adopted Julia: For example, the US federal government uses Julia for its cost-benefit analysis of climate change (https://www.mimiframework.org/). A blend of MIT, Caltech, and JPL scientists use Julia to create a brand new Earth System Model (https://clima.caltech.edu/). NASA switched from MATLAB to Julia for its Launch Service simulations, resulting in a 15,000x performance acceleration and an overall more flexible code base that is also easier to learn (https://www.youtube.com/watch?v=tQpqsmwlfY0). Moderna (pharmaceutical company) uses PumasAI, software based on Julia, DifferentialEquations.jl, and other Julia packages.

  14. Easy installs and pre-built binary dependencies: In Julia, the days when you would have to spend weeks, supervised by the senior postdoc of the group, just to install a piece of software are long gone. Since most Julia packages are written in Julia, installing anything is a trivial 1 line of code. But even in cases of binary dependencies, Julia has made things rather easy! It offers persistent binary artifact storage via Yggdrasil, with binaries built for every supported platform combination using BinaryBuilder.jl (see the official blog post for more details).

But why do you try so hard to convince people?

This repository is not funded by Julia-related companies, or by anything else for that matter! We try to convince people because we genuinely believe that Julia can accelerate scientific progress and increase the openness of code in academia. It's a better future for everyone :)

(Full disclosure: authors of this post also develop packages for the Julia programming language. More Julia users means more potential users for these packages, which means more potential contributors for these packages. However, the potential increase in contributors is such a low probability event for a given "conversion" to Julia, that it cannot form a basis for trying to convince people in the first place.)

License: Creative Commons Zero v1.0 Universal