oskopek / ml_project_template

A template for ML projects striking a balance between reproducibility and rapid iteration.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ml_project_template

A template for ML projects striking a balance between reproducibility and rapid iteration.

Requirements and versions:

  • Python 3.5

  • Git

  • Tensorflow 1.5

Important: When commiting, remember to be in the virtual environment, for hooks to work.

Note: Colab still runs on different versions of everything, but this should be a good approximation (2018/02/27).

Setup

Make sure there is no venv/ directory in your repository. If there is, remove it. Run the following commands:

./setup/create_venv.sh
source venv/bin/activate

Important: For all commands here, we assume you are source into the virtual environment: source venv/bin/activate

Workflow

This template is to be used for machine learning research projects. As such, we assume most development will happen in a Jupyter notebook on Colab. These should be periodically downloaded and committed into the notebooks/ folder.

To upload them to Colab automatically, follow the instructions in

resources/google_drive_upload/README.adoc

and then run

./resources/google_drive_upload/refresh.sh

WARNING: This deletes the remote (GDrive) versions of the notebooks from GDrive!

TODO: Create a notebook "download and commit" script in addition to this. (#17)

Once the notebooks are ready and stable enough, they should be converted into a Python script, modularized, and moved into the models/, features/, resources/, and flags/ folder, respectively.

The following sections describe the ways you can work with this template.

Jupyter notebooks

Work on Colab, save notebooks in the notebooks/ directory. These can also be worked on locally using Jupyter. In the project root directory, you can run either:

  • jupyter notebook,

  • or jupyter lab.

To run Tensorboard locally alongside these, just run it with --logdir data_out/.

Tensorboard

To run Tensorboard in the notebook server directly, go to the specified log directory, and select New > Tensorboard. A pop-up window will open (make sure to enable those in your browser).

Note: Only do this once, in the data_out/ folder.

If you are on Colab, add the following cell to your notebook, ideally in a "section":

# noqa
import os
if COLAB:
    %cd /content
    ROOT_DIR = '/content'
    REPO_DIR = os.path.join(ROOT_DIR, 'ml_project_template')
    LOG_DIR = os.path.join(REPO_DIR, 'data_out')

    if not os.path.isdir(REPO_DIR):
        !git clone https://github.com/oskopek/ml_project_template.git
    if not os.path.isdir(LOG_DIR):
        os.makedirs(LOG_DIR)
    %cd 'ml_project_template'
    %ls

    import resources.colab_utils.tboard as tboard
    # will install `ngrok`, if necessary
    # will create `log_dir` if path does not exist
    tboard.launch_tensorboard(bin_dir=REPO_DIR, log_dir=LOG_DIR)
else:
    wd = %pwd
    print('Current directory:', wd)
    if wd.endswith('notebooks'):
        %cd ..

A URL will be in the output, use that to connect to Tensorboard in your browser. The URL should only work from your machine (localhost).

Docker / Custom runner

After you have build a Docker image using: make build-cpu or make build-gpu (or pulling one from the remote Docker hub), you can use the Docker wrapper:

  • Jupyter on Docker: ./run_docker.sh PASSWORD jupyter.

    • To execute a specific notebook and print its output to stdout, use: ./run_docker.sh PASSWORD notebook NOTEBOOK_FILE

      • Do note that NOTEBOOK_FILE is a path relative to the repository root and must also be present in the image!

    • For both of these commands, PASSWORD is the password you want to set for the Jupyter web interface.

    • You can access it at http://localhost:8888/.

    • Same applies to ./run_docker.sh PASSWORD lab.

  • Python models on Docker: ./run_docker.sh model MODEL_MODULE FLAG_FILE

    • For example: ./run_docker.sh model models.mnist_gan_graph flags/gan.json

    • Do note that MODEL_MODULE and FLAG_FILE are paths relative to the repository root and must also be present in the image!

    • This will automatically run Tensorboard on http://localhost:6006/

Without the wrapper:

docker run IMG jupyter # runs Jupyter
docker run IMG lab # runs JupyterLab
docker run IMG notebook notebooks/test.ipynb # Runs the notebook
docker run IMG model models.mnist_gan_graph flags/gan.json # Runs the MNIST GAN graph model with flags from the specified file

Using the run script without Docker

You can also use the above commands without using Docker, by invoking the run script directly:

./docker/run.sh jupyter # runs Jupyter
./docker/run.sh lab # runs JupyterLab
./docker/run.sh notebook notebooks/test.ipynb # Runs the notebook
./docker/run.sh model models.mnist_gan_graph flags/gan.json # Runs the MNIST GAN graph model with flags from the specified file

Directory structure

  • data_in/ — input data and associated scripts/configs

  • data_out/ — output data and logs + associated scripts/configs

  • docker/ — setup and configs for running stuff inside and outside of Docker

  • features/ — feature preprocessing and normalization Python code + configs

  • flags/ — command line flags, model parameters, etc.

  • models/ — scripts defining the models + hyperparameters

  • notebooks/ — data exploration and other rapid development notebooks

    • Models from here should eventually be promoted into models/

  • resources/ — Python utilities

  • setup/ — environment setup and verification scripts in Python/Bash

  • venv/ — the (local) Python virtual environment

Formatting

Run: ./setup/clean.sh. A Git hook will tell you if any files are misformatted before committing.

About

A template for ML projects striking a balance between reproducibility and rapid iteration.

License:MIT License


Languages

Language:Python 50.2%Language:Jupyter Notebook 44.5%Language:Shell 4.7%Language:Makefile 0.5%