Caliban

Caliban is a tool for developing research workflows and notebooks in an isolated Docker environment and submitting those isolated environments to Google Compute Cloud.

Caliban makes it astonishingly easy to develop and execute code locally, and then ship the exact same code up to a Cloud environment for execution on Big Iron machines.

To get started:

see the Installation section below, then
visit the short tutorial at the "Getting Started" section.
Next steps for exploration are outlined at "What Next?", and
the Overview provides some more flavor on the various subcommands that Caliban provides.

Full documentation for Caliban lives at Read The Docs.

“Be not afeard; the isle is full of noises,
Sounds, and sweet airs, that give delight and hurt not.
Sometimes a thousand twangling instruments
Will hum about mine ears; and sometime voices,
That, if I then had waked after long sleep,
Will make me sleep again: and then, in dreaming,
The clouds methought would open, and show riches
Ready to drop upon me; that, when I waked,
I cried to dream again.”

-- Shakespeare, The Tempest

Installation and Prerequisites

Caliban lives on PyPI, so installation is as easy as:

pip install -U caliban

If you want to make Caliban available globally, we recommend installing via pipx. Get pipx installed, and then run:

pipx install caliban

To run any commands, you'll need to install Docker, and make sure your Python version is >= 3.7.0:

$ python --version
Python 3.7.7
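
The two prerequisites above can also be checked from Python. This is a minimal sketch, not part of Caliban: the helper names python_ok and docker_on_path are made up for illustration, and finding a docker executable on your PATH does not prove that the Docker daemon is running.

```python
import shutil
import sys

# Minimum Python version stated above.
MIN_PYTHON = (3, 7, 0)


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: this interpreter) meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:3]) >= minimum


def docker_on_path():
    """Return True if a `docker` executable is found on PATH."""
    return shutil.which("docker") is not None


if __name__ == "__main__":
    print("Python >= 3.7.0:", python_ok())
    print("docker on PATH:", docker_on_path())
```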

On a Mac, install python and Docker, and check if your installation is working by navigating to some empty folder and running:

$ caliban --help
usage: caliban [-h] [--helpfull] [--version]
               {shell,notebook,build,run,cloud,cluster,status,stop,resubmit}
               ...

Our more detailed Getting Started documentation has instructions for Linux boxes, nvidia-docker setup and Google Cloud credential configuration. Armed with these tools you'll be able to run scripts locally using your GPU or submit caliban-dockerized jobs to Cloud.

Now that you have Caliban installed:

see the Getting Started section below, or
read the Overview for a discussion of Caliban's subcommands.

Overview

Caliban provides several subcommands that you run inside some project directory on your machine, including:

caliban shell generates a Docker image containing any dependencies you've declared in a requirements.txt and/or setup.py in the directory and opens an interactive shell in that directory. The caliban shell environment is ~identical to the environment that will be available to your code when you submit it to AI Platform; the difference is that your current directory is live-mounted into the container, so you can develop interactively.
caliban notebook starts a Jupyter notebook or lab instance inside of a docker image containing your dependencies; the guarantee about an environment identical to AI Platform applies here as well.
caliban run packages your directory's code into the Docker image and executes it locally using docker run. If you have a GPU, the instance will attach to it by default - no need to install the CUDA toolkit. The docker environment takes care of all that. This environment is truly identical to the AI Platform environment. The docker image that runs locally is the same image that will run in AI Platform.
caliban cloud allows you to submit jobs to AI Platform that will run inside the same docker image you used with caliban run. You can submit hundreds of jobs at once. Any machine type, GPU count, and GPU type combination you specify will be validated client side, so you'll see an immediate error with suggestions, rather than having to debug by submitting jobs over and over.
caliban build builds the docker image used in caliban cloud and caliban run without actually running the container or submitting any code.
caliban cluster creates GKE clusters and submits jobs to GKE clusters.

Getting Started

This first example will show you how to use Caliban to run a short script inside of a Caliban-generated Docker container, then submit that script to AI Platform.

Make a new project folder and create a small script:

mkdir project && cd project
echo "import platform; print(f\"Hello, World, from a {platform.system()} machine.\")" > hello.py

Run the script with your local Python executable:

$ python hello.py
Hello, World, from a Darwin machine.
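
For readability, the one-line script can also be written out as a regular file. The sketch below should print the same message; the greeting function is a name introduced here for illustration only.

```python
import platform


def greeting(system_name=None):
    """Build the message printed by hello.py."""
    if system_name is None:
        # platform.system() returns e.g. "Darwin" on macOS or "Linux" in a container.
        system_name = platform.system()
    return f"Hello, World, from a {system_name} machine."


if __name__ == "__main__":
    print(greeting())
```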

Use Caliban to run the same script inside a Docker container:

caliban run --nogpu hello.py
...elided...

I0611 15:12:44.371632 4389141952 docker.py:781] Running command: docker run --ipc host 58a1a3bf6145
Hello, World, from a Linux machine.
I0611 15:12:45.000511 4389141952 docker.py:738] Job 1 succeeded!

Change a single word to submit the same script to Google's AI Platform:

caliban cloud --nogpu hello.py

(For this last step to work, you'll need to set up a Google Cloud account by following these instructions).
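
If you drive Caliban from a script, the one-word difference between the two commands is easy to see. A minimal sketch: caliban_command is a hypothetical helper that only builds the argument list for the two invocations shown above. It does not run anything and does not cover any other Caliban options.

```python
def caliban_command(mode, script, gpu=False):
    """Return the argv list for `caliban run` or `caliban cloud`.

    Only the two modes and the --nogpu flag used in this tutorial are covered.
    """
    if mode not in ("run", "cloud"):
        raise ValueError(f"unsupported mode: {mode!r}")
    argv = ["caliban", mode]
    if not gpu:
        argv.append("--nogpu")
    argv.append(script)
    return argv


if __name__ == "__main__":
    for mode in ("run", "cloud"):
        print(" ".join(caliban_command(mode, "hello.py")))
```

Passing the resulting list to subprocess.run(argv, check=True) would execute the command, provided Caliban is installed.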

Slightly Expanded

This next example shows you how to do interactive development using caliban shell. Once you get your script working, you can use caliban run and caliban cloud on the script, just like above.

Run the following command in the project folder you created earlier:

caliban shell --nogpu

If this is your first command, you'll see quite a bit of activity as Caliban downloads its base image to your machine and builds your first container. After this passes, you should see Caliban's terminal:

I0611 12:33:17.551121 4500135360 docker.py:911] Running command: docker run --ipc host -w /usr/app -u 735994:89939 -v /Users/totoro/code/example:/usr/app -it --entrypoint /bin/bash -v /Users/totoro:/home/totoro ab8a7d7db868
   _________    __    ________  ___    _   __  __  __
  / ____/   |  / /   /  _/ __ )/   |  / | / /  \ \ \ \
 / /   / /| | / /    / // __  / /| | /  |/ /    \ \ \ \
/ /___/ ___ |/ /____/ // /_/ / ___ |/ /|  /     / / / /
\____/_/  |_/_____/___/_____/_/  |_/_/ |_/     /_/ /_/

You are running caliban shell as user with ID 735994 and group 89939,
which should map to the ID and group for your user on the Docker host. Great!

[totoro@6a9b28990757 /usr/app]$

You're now living in an isolated Docker container, running Linux, with a clean virtual environment available. Your home directory and the folder where you ran the command are both live-mounted into the container, so any changes you make to either of those directories will be reflected immediately.
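
The banner says the container user should map to your user on the Docker host. One way to check, sketched here with the standard library only, is to run the same snippet on the host and inside the shell and compare the output. user_spec is a name made up for this example, and os.getuid and os.getgid are available on Unix-like systems only.

```python
import os


def user_spec():
    """Return "uid:gid" for the current process, the same shape as docker's -u argument."""
    return f"{os.getuid()}:{os.getgid()}"


if __name__ == "__main__":
    print(user_spec())
```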

Type C-d to exit the container.

Create a new file called requirements.txt in the folder and add the tensorflow dependency, then run caliban shell again:

echo tensorflow >> requirements.txt
caliban shell --nogpu

You should see more activity as caliban builds a new container with tensorflow installed. This time, inside your container, run python and check that tensorflow is installed:

$ python
Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'2.2.0'
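
The same check can be done without an interactive session. A minimal sketch using only the standard library: is_installed is a hypothetical helper, and it reports whether a module can be found, not whether it imports without errors.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module named `module_name` can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("tensorflow installed:", is_installed("tensorflow"))
```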

Any code you write in this folder will be accessible in the shell. If it works in caliban shell, you can be almost certain that your code will execute in a Cloud environment, with potentially many GPUs attached and much larger machines available.

And you didn't have to write a single Dockerfile!

What next?

Next, you might want to explore:

Triggering hundreds of jobs from an experiment config
Submitting jobs to Google Cloud with caliban cloud
Working in a Jupyter notebook in the same, isolated environment where your production code will run via caliban notebook
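
To give a feel for the first item, here is a sketch of how a small set of parameter lists fans out into many jobs. It assumes a dictionary mapping flag names to lists of values and expands it with a Cartesian product. This is an illustration only, not Caliban's own config format or code, and the flag names learning_rate and batch_size are invented.

```python
import itertools


def expand_grid(config):
    """Expand {"flag": [v1, v2, ...]} into one dict per combination of values."""
    keys = list(config)
    value_lists = [config[key] for key in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


if __name__ == "__main__":
    # Hypothetical flags; 3 x 2 = 6 combinations.
    grid = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64]}
    for job in expand_grid(grid):
        print(job)
```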

There is a lot to explore. Head over to Caliban's documentation site and check out the links on the sidebar.

If you find anything confusing, please feel free to create an issue on our GitHub Issues page, and we'll get you sorted out.

Disclaimer

This is a research project, not an official Google product. Expect bugs and sharp edges. Please help by trying out Caliban, reporting bugs, and letting us know what you think!

Contributing

Please refer to our Contributor's Guide for information on how to get started contributing to Caliban.

Citing Caliban

If Caliban helps you in your research, please consider citing the repository:

@software{caliban2020github,
  author = {Vinay Ramasesh and Sam Ritchie and Ambrose Slone},
  title = {{Caliban}: Docker-based job manager for reproducible workflows},
  url = {http://github.com/google/caliban},
  version = {0.1.0},
  year = {2020},
}

In the above BibTeX entry, names are in alphabetical order, the version number is intended to be that of the latest tag on GitHub, and the year corresponds to the project's open-source release.

License

Licensed under the Apache License, Version 2.0.
