Julia is a relative newcomer to the scientific computing scene, a field currently dominated by the likes of R and Python. It’s a curious mix of high performance (often close to C) and decided user-friendliness, with a syntax strongly reminiscent of MATLAB. Despite its youth (1.0 is still somewhere on the horizon) it comes quite batteries-included, and there are a surprising number of libraries already available. So I’m going to go over some first steps in Julia, aimed at people with limited programming experience.
The language is freely available over at http://julialang.org/downloads/, and it is making its way into some Linux distributions’ repos, so check here for some platform-specific instructions.
The REPL (Read-Eval-Print Loop), or interactive session, is started via the julia command in a Linux terminal, or by launching the installed Julia application on Windows. Provided all goes well, you should be greeted with something resembling the following:
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.6 (2016-06-19 17:16 UTC)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |  x86_64-pc-linux-gnu

julia>
This should be familiar to those with some Python experience: typing simple expressions will evaluate them in a similar manner to Python’s IDLE. It’s also where Julia’s built-in package manager operates, for example installing external libraries with the Pkg.add command:
julia> print("Hello World")
Hello World
julia> Pkg.add("DataFrames")
Scripts can be executed via the session:
Or straight from the command line if the Julia binary is present on your path:
$ julia path/to/script.jl
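For the in-session route, here is a minimal sketch using Julia's include function, which evaluates a file in the current session and returns the value of its last expression. The temporary script and its contents are made up purely so the example is self-contained; in practice you would pass the path to your own script.

```julia
# Create a throwaway script in a temporary directory (hypothetical content).
script_path = joinpath(mktempdir(), "script.jl")
open(script_path, "w") do io
    write(io, "answer = 6 * 7\n")
end

# Evaluate the script inside the running session.
# include returns the value of the last expression in the file.
result = include(script_path)
println(result)
```

At the REPL this would normally just be a single include call with the path to the script.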
I’m not going to run over all the syntax, as there are plenty of fantastic documents already out there, such as the official introduction. Besides, one of the benefits of Julia is its legibility, so I’ll go over some simple yet useful examples. Here’s a simple script that will create a plot:
using DataFrames
using Gadfly
download("http://apps.who.int/gho/athena/data/xmart.csv?target=GHO/WHS4_117&profile=crosstable&filter=COUNTRY:*&x-sideaxis=COUNTRY&x-topaxis=GHO;YEAR&x-collapse=true", "tmp.csv")
df = readtable("tmp.csv");
df = df[df[:Hepatitis_B_HepB3_immunization_coverage_among_1_year_olds_2014] .< 60, :];
p = plot(df, x=:Country, y=:Hepatitis_B_HepB3_immunization_coverage_among_1_year_olds_2014, Guide.ylabel("Coverage (%)"), Geom.bar)
draw(SVG("myplot.svg", 25cm, 15cm), p)
rm("tmp.csv")
Now on to the explanation. The top two lines declare that we are using two packages that other people have written and made available. The using statement is similar to Python’s import or R’s library() commands for our purposes (N.B. Julia does also have an import keyword, but there is a distinction between using and import). DataFrames unsurprisingly provides the DataFrame type, again very similar to those found in R and Python’s pandas. Gadfly is a plotting library based on ggplot2 for R, and allows for the easy creation of good-looking plots. Both of these packages need to have been installed beforehand via the Pkg.add command.
The download function simply accesses the file at the given URL and saves it to the provided location, tmp.csv. In this case the URL is something I looked up by exploring the WHO Data Repository’s Hepatitis Immunization Data.
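To make the difference between using and import a little more concrete, here is a small sketch. The Shapes module and its two functions are hypothetical and exist only for this illustration, and the snippet is written against Julia 1.x syntax (note the leading dot for a locally defined module), which differs in places from the 0.4 release shown earlier.

```julia
# Hypothetical module for illustration only; not part of any package.
module Shapes
export area                      # only `area` is exported

area(r) = pi * r^2
perimeter(r) = 2 * pi * r        # deliberately not exported
end

using .Shapes                    # brings the module and its exported names into scope
unit_area = area(1.0)            # exported, so callable directly
unit_perimeter = Shapes.perimeter(1.0)   # unexported, needs the module prefix

import .Shapes: perimeter        # import one specific name explicitly
also_perimeter = perimeter(1.0)  # now callable without the prefix
```

Roughly speaking, using pulls in everything a module chooses to export, while import only brings in the names you ask for.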
df = readtable("tmp.csv");
df = df[df[:Hepatitis_B_HepB3_immunization_coverage_among_1_year_olds_2014] .< 60, :];
These lines start by reading the downloaded data into the dataframe df using readtable. This function intelligently parses the CSV file, identifying the header row and applying its names to the columns of the data, stripping out awkward characters and replacing spaces with underscores. The :ColName way of indexing follows Julia conventions (a leading colon makes a Symbol), and is a useful pattern across many tools in Julia.
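As a rough illustration of the colon notation on its own, without any packages, the sketch below uses Symbols as keys into a plain Dict. The column names and values are placeholders, not real data, and a Dict is only a stand-in for a DataFrame here. The Symbol constructor is spelled as in Julia 1.x.

```julia
# Hypothetical column data standing in for a real table.
columns = Dict(
    :Country  => ["Country A", "Country B", "Country C"],
    :Coverage => [45, 92, 58],
)

key = :Coverage                  # a Symbol literal
same_key = Symbol("Coverage")    # the same Symbol built from a string
coverage = columns[key]          # look up a column by its Symbol

println(typeof(key))
```

Because Symbols are cheap to compare, they turn up as names and keys all over Julia code, which is why the same pattern appears again in the plotting call.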
The second line creates a subset of the data using logical indexing and an element-wise comparison (denoted by the . before the <) to select the countries where immunization coverage is less than 60%. The trailing colon selects all columns of the rows where the preceding condition is true. Anyone with previous MATLAB experience may be experiencing flashbacks/déjà vu at this point, and for good reason, as Julia shares many syntax and style elements with MATLAB, including the controversial 1-based indexing.
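The same filtering idea can be tried on plain arrays, with no packages installed. This is only a sketch with made-up placeholder values, not the WHO data, but the element-wise comparison, the logical mask, the trailing colon and the 1-based indexing all behave as described above.

```julia
# Placeholder values for illustration; not taken from the WHO dataset.
countries = ["Country A", "Country B", "Country C", "Country D"]
coverage  = [45, 92, 58, 77]

# Element-wise comparison produces one true/false per entry.
mask = coverage .< 60

# Logical indexing keeps only the entries where the mask is true.
low_coverage = countries[mask]

# Two-dimensional case: the row condition comes first,
# and the trailing colon keeps every column of the selected rows.
data = [45 1; 92 2; 58 3; 77 4]
selected_rows = data[data[:, 1] .< 60, :]

# Indexing starts at 1, as in MATLAB.
first_country = countries[1]
```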
The plot and draw commands follow Gadfly’s API, so I’ll only skim the specifics. As before, the column names can be addressed using the colon notation, and the final parameters specify some formatting for the plot, changing the y-axis label and specifying that it is a bar chart. After specifying the image format and size, the plot is saved to a file, which can be viewed in any compatible viewer. The final rm call is simply cleaning up after ourselves.
I’ve only been toying with Julia so far, so I don’t claim to know the ins and outs of the language, but in general I’ve found it enjoyable to experiment with and very reminiscent of Python and MATLAB. For those of you struggling to give up established workflows in Python: well, you don’t have to. Julia makes it ludicrously simple to call Python libraries and functions through the PyCall package. Furthermore, there is already solid support for Jupyter notebooks.