Contact: Nicolas Fauchereau
--
- The Anaconda python distribution
- Installation of additional libraries
- Running the IPython notebooks
- Links to the static version of the notebooks
For this tutorial, I recommend installing the Anaconda Python distribution. It is a completely free, enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing. It includes the Python interpreter itself, the Python standard library, and a set of packages exposing data structures and methods for data manipulation, scientific computing and visualization. In particular it provides Numpy, Scipy, Pandas, Matplotlib, scikit-learn and statsmodels, i.e. all the main packages we will be using during the tutorial. The full list of packages is available at:
http://docs.continuum.io/anaconda/pkgs.html
The Anaconda Python distribution must be downloaded for your platform from the Anaconda download page.
Once you have installed Anaconda, you can update to the latest compatible versions of all the pre-installed packages by running, in a terminal:
$ conda update conda
and then:
$ conda update anaconda
You also need to install pip to install packages from the Python Package Index.
$ conda install pip
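To confirm that the installation worked, you can check from Python that the main packages import correctly. The snippet below is a minimal sketch, not part of the tutorial material: the function name `package_versions` is my own, and `sklearn` is the import name of scikit-learn.

```python
import importlib

# Import names of the main packages used in the tutorial.
# Note: scikit-learn is imported as "sklearn".
CORE_PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "sklearn", "statsmodels"]


def package_versions(names):
    """Map each import name to its version string, or None if the import fails."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed (or cannot be imported).
            versions[name] = None
        else:
            # Most packages expose __version__; fall back to "unknown" otherwise.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in package_versions(CORE_PACKAGES).items():
        print(f"{name:12s} {version if version else 'NOT INSTALLED'}")
```

Any package reported as not installed can usually be added with `conda install`, followed by the package name.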
While we might not have the time to cover them in depth during the tutorial, I would recommend that you have a look at a few extra libraries.
Basemap is a graphics library for plotting (static, publication quality) geographical maps (see http://matplotlib.org/basemap/). Basemap is available directly in Anaconda through the conda package manager. Install it with:
$ conda install basemap
Bokeh is a new interactive plotting library developed by the team behind Anaconda, so it can be installed with conda (if it is not already installed):
$ conda install bokeh
seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. You should be able to install it with pip:
$ pip install seaborn
or (if you want the bleeding edge version):
$ pip install git+https://github.com/mwaskom/seaborn.git#egg=seaborn
You may have to install two additional libraries for seaborn: husl and moss. If you experience failures during seaborn installation or when trying to import it, try:
$ pip install husl
$ pip install moss
mpld3 aims at bringing matplotlib to the browser. It has been developed by Jake VanderPlas. It is also installable with pip:
$ pip install mpld3
bearcart has been developed by Rob Story and provides an interface to the rickshaw JavaScript library. It is also installable with pip:
$ pip install bearcart
folium has also been developed by Rob Story, to provide an interface to the leaflet.js JavaScript mapping library. Install with:
$ pip install folium
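If you are not sure which of these extra libraries are already present, the sketch below reports the missing ones along with the install command given above. It is only an illustration: the helper names `is_installed` and `missing_libraries` are my own, and the import names are assumed to match the package names, except Basemap, which is imported as `mpl_toolkits.basemap`.

```python
import importlib.util

# Install commands as given in the text above, keyed by import name.
# Basemap is imported as "mpl_toolkits.basemap".
OPTIONAL_LIBRARIES = {
    "mpl_toolkits.basemap": "conda install basemap",
    "bokeh": "conda install bokeh",
    "seaborn": "pip install seaborn",
    "mpld3": "pip install mpld3",
    "bearcart": "pip install bearcart",
    "folium": "pip install folium",
}


def is_installed(import_name):
    """Return True if `import_name` can be located, without importing it fully."""
    try:
        return importlib.util.find_spec(import_name) is not None
    except (ImportError, ValueError):
        # Raised when the parent package of a dotted name is missing,
        # or when an already-loaded module carries no spec.
        return False


def missing_libraries(libraries):
    """Return the subset of `libraries` whose import name cannot be found."""
    return {name: cmd for name, cmd in libraries.items() if not is_installed(name)}


if __name__ == "__main__":
    for name, command in missing_libraries(OPTIONAL_LIBRARIES).items():
        print(f"{name} is missing, install with: {command}")
```

The check only locates each library and does not import it, so a library that is present but broken would still be reported as installed.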
The material of the tutorial is in the form of IPython notebooks. In a nutshell an IPython notebook is a web-based (i.e. running in the browser) interactive computational environment where you can combine Python code execution, text, mathematics, plots and rich media into a single document, which makes it an ideal medium for teaching and exploring.
After uncompressing the archive of the repo (or after cloning it with git), navigate to the corresponding directory (containing the *.ipynb files) and type:
$ ipython notebook
That should bring up the IPython notebook dashboard, and you should be ready to go!
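If the dashboard comes up empty, you are probably not in the directory that contains the notebooks. A quick way to check is to list the *.ipynb files from Python. This is a minimal sketch using only the standard library, and the function name `list_notebooks` is my own:

```python
from pathlib import Path


def list_notebooks(directory="."):
    """Return the sorted names of the *.ipynb files found directly in `directory`."""
    return sorted(path.name for path in Path(directory).glob("*.ipynb"))


if __name__ == "__main__":
    notebooks = list_notebooks()
    if notebooks:
        print("Notebooks found in the current directory:")
        for name in notebooks:
            print("  " + name)
    else:
        print("No *.ipynb files here; change to the tutorial directory first.")
```

The search is not recursive, so run it from the directory that holds the notebooks rather than from a parent directory.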
Below are links to the static, HTML-rendered version of the tutorial notebooks (thanks to http://nbviewer.ipython.org/):
- Introduction, resources and acknowledgments
- IPython notebook intro
- Numpy
- Scipy
- Matplotlib
- Pandas
- Statistical modelling and Machine Learning
The notebook on the IPython notebook widgets and the one on creating interactive plots in the browser are not listed above, because they need to be run to be useful.