A template for ML projects striking a balance between reproducibility and rapid iteration.
-
Python 3.5
-
Git
-
Tensorflow 1.5
Important: When commiting, remember to be in the virtual environment, for hooks to work.
Note: Colab still runs on different versions of everything, but this should be a good approximation (2018/02/27).
Make sure there is no venv/
directory in your repository. If there is, remove it.
Run the following commands:
./setup/create_venv.sh
source venv/bin/activate
Important: For all commands here, we assume you are source into
the virtual environment: source venv/bin/activate
This template is to be used for machine learning research projects.
As such, we assume most development will happen in a Jupyter notebook
on Colab. These should be periodically
downloaded and committed into the notebooks/
folder.
To upload them to Colab automatically, follow the instructions in
resources/google_drive_upload/README.adoc
and then run
./resources/google_drive_upload/refresh.sh
WARNING: This deletes the remote (GDrive) versions of the notebooks from GDrive!
TODO: Create a notebook "download and commit" script in addition to this. (#17)
Once the notebooks are ready and stable enough, they should be converted into a Python
script, modularized, and moved into the models/
, features/
, resources/
, and
flags/
folder, respectively.
The following sections describe the ways you can work with this template.
Work on Colab, save notebooks in the notebooks/
directory.
These can also be worked on locally using Jupyter.
In the project root directory, you can run either:
-
jupyter notebook
, -
or
jupyter lab
.
To run Tensorboard locally alongside these, just run it with --logdir data_out/
.
To run Tensorboard in the notebook server directly, go to the specified log directory, and select New > Tensorboard. A pop-up window will open (make sure to enable those in your browser).
Note: Only do this once, in the data_out/
folder.
If you are on Colab, add the following cell to your notebook, ideally in a "section":
# noqa
import os
if COLAB:
%cd /content
ROOT_DIR = '/content'
REPO_DIR = os.path.join(ROOT_DIR, 'ml_project_template')
LOG_DIR = os.path.join(REPO_DIR, 'data_out')
if not os.path.isdir(REPO_DIR):
!git clone https://github.com/oskopek/ml_project_template.git
if not os.path.isdir(LOG_DIR):
os.makedirs(LOG_DIR)
%cd 'ml_project_template'
%ls
import resources.colab_utils.tboard as tboard
# will install `ngrok`, if necessary
# will create `log_dir` if path does not exist
tboard.launch_tensorboard(bin_dir=REPO_DIR, log_dir=LOG_DIR)
else:
wd = %pwd
print('Current directory:', wd)
if wd.endswith('notebooks'):
%cd ..
A URL will be in the output, use that to connect to Tensorboard in your browser. The URL should only work from your machine (localhost).
After you have build a Docker image using:
make build-cpu
or make build-gpu
(or pulling one from the remote Docker hub),
you can use the Docker wrapper:
-
Jupyter on Docker:
./run_docker.sh PASSWORD jupyter
.-
To execute a specific notebook and print its output to stdout, use:
./run_docker.sh PASSWORD notebook NOTEBOOK_FILE
-
Do note that
NOTEBOOK_FILE
is a path relative to the repository root and must also be present in the image!
-
-
For both of these commands,
PASSWORD
is the password you want to set for the Jupyter web interface. -
You can access it at http://localhost:8888/.
-
Same applies to
./run_docker.sh PASSWORD lab
.
-
-
Python models on Docker:
./run_docker.sh model MODEL_MODULE FLAG_FILE
-
For example:
./run_docker.sh model models.mnist_gan_graph flags/gan.json
-
Do note that
MODEL_MODULE
andFLAG_FILE
are paths relative to the repository root and must also be present in the image! -
This will automatically run Tensorboard on http://localhost:6006/
-
Without the wrapper:
docker run IMG jupyter # runs Jupyter
docker run IMG lab # runs JupyterLab
docker run IMG notebook notebooks/test.ipynb # Runs the notebook
docker run IMG model models.mnist_gan_graph flags/gan.json # Runs the MNIST GAN graph model with flags from the specified file
You can also use the above commands without using Docker, by invoking the run script directly:
./docker/run.sh jupyter # runs Jupyter
./docker/run.sh lab # runs JupyterLab
./docker/run.sh notebook notebooks/test.ipynb # Runs the notebook
./docker/run.sh model models.mnist_gan_graph flags/gan.json # Runs the MNIST GAN graph model with flags from the specified file
-
data_in/
— input data and associated scripts/configs -
data_out/
— output data and logs + associated scripts/configs -
docker/
— setup and configs for running stuff inside and outside of Docker -
features/
— feature preprocessing and normalization Python code + configs -
flags/
— command line flags, model parameters, etc. -
models/
— scripts defining the models + hyperparameters -
notebooks/
— data exploration and other rapid development notebooks-
Models from here should eventually be promoted into
models/
-
-
resources/
— Python utilities -
setup/
— environment setup and verification scripts in Python/Bash -
venv/
— the (local) Python virtual environment