jsacco1 / bioconductor_docker

Docker Containers for Bioconductor - NEW!

Home Page: https://bioconductor.org/help/docker/

Docker containers for Bioconductor

Docker packages software into self-contained environments, called containers, that include the dependencies the software needs to run. Containers can run on any operating system, including Windows and Mac (using modern Linux kernels), via the Docker engine.

Containers can also be deployed in the cloud using Amazon Elastic Container Service or Google Kubernetes Engine.

Quick start

  1. Install Docker

  2. Run container with Bioconductor and RStudio

     docker run \
     	-e PASSWORD=bioc \
     	-p 8787:8787 \
     	bioconductor/bioconductor_docker:devel
    

    This command runs the Docker container bioconductor/bioconductor_docker:devel on your local machine.

    RStudio will be available in your web browser at http://localhost:8787. The username is always rstudio. The password in the command above is bioc, but it can be set to anything except rstudio. 8787 is the port mapped between the Docker container and your host machine.

    The user is logged in as the rstudio user by default.
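The quick start command can be wrapped in a small shell function that rejects the one password the image does not accept. This is only a sketch: the function name bioc_rstudio and its argument order are hypothetical, and the docker run flags are the ones shown above.

```shell
# Hypothetical helper (not part of the Bioconductor images):
# start RStudio Server from a Bioconductor image.
# Usage: bioc_rstudio PASSWORD [TAG] [HOST_PORT]
bioc_rstudio() {
    password=$1
    tag=${2:-devel}
    port=${3:-8787}

    # The image does not accept "rstudio" as a password, so fail early.
    if [ -z "$password" ] || [ "$password" = "rstudio" ]; then
        echo "error: choose a non-empty password other than 'rstudio'" >&2
        return 1
    fi

    # Map the chosen host port to port 8787 inside the container.
    docker run \
        -e PASSWORD="$password" \
        -p "$port":8787 \
        "bioconductor/bioconductor_docker:$tag"
}
```

For example, bioc_rstudio bioc starts the devel image on http://localhost:8787, and bioc_rstudio bioc devel 8080 serves it on port 8080 instead.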

Why use Containers

With Bioconductor containers, we hope to enhance

  • Reproducibility: If you run some code in a container today, you can run it again in the same container (with the same tag) years later and know that nothing in the container has changed. You should always take note of the tag you used if you think you might want to reproduce some work later.

  • Ease of use: With one command, you can be running the latest release or devel Bioconductor. No need to worry about whether packages and system dependencies are installed.

  • Convenience: Easily start a fresh R session with no packages installed for testing. Quickly run an analysis with package dependencies not typical of your workflow. Containers make this easy.

Our aim is to provide up-to-date containers for the current release and devel versions of Bioconductor, and some older versions. Bioconductor's Docker images are stored in Docker Hub; the source Dockerfile(s) are on GitHub.

Our release and devel images are based on the Rocker Project's rocker/rstudio image and are built when a Bioconductor release occurs.

Goals for new container architecture

Our key goals in migrating to a new set of Docker containers are to

  • keep the images shipped by the Bioconductor team at a manageable size.

  • make the images easy to extend, so developers can inherit from a single image to build their own Docker image.

  • make the images easy to maintain, by streamlining the Docker inheritance chain.

  • adopt a "best practices" outline so that new community-contributed Docker images get reviewed and follow standards.

  • adopt a deprecation policy and life cycle for images similar to those for Bioconductor packages.

  • replicate the Linux build machine (malbec2) on the bioconductor/bioconductor_docker:devel image as closely as possible. While this is not fully possible just yet, maintainers can use this image as a debugging tool to reproduce errors seen on the Bioconductor Linux build machine.

Current Containers

For each supported version of Bioconductor, we provide

  • bioconductor/bioconductor_docker:RELEASE_X_Y

  • bioconductor/bioconductor_docker:devel

Bioconductor's Docker images are stored in Docker Hub; the source Dockerfile(s) are on GitHub.
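The RELEASE_X_Y pattern can be derived mechanically from a Bioconductor version number. The sketch below assumes a hypothetical helper named bioc_image; the version 3.10 is only an illustration of the pattern, and the function does not check that a given tag actually exists on Docker Hub.

```shell
# Hypothetical helper: map "devel" or a Bioconductor version such as
# "3.10" to an image reference that follows the RELEASE_X_Y tag pattern.
bioc_image() {
    case $1 in
        devel)
            echo "bioconductor/bioconductor_docker:devel" ;;
        [0-9]*.[0-9]*)
            # Replace the dot with an underscore: 3.10 -> RELEASE_3_10
            echo "bioconductor/bioconductor_docker:RELEASE_$(echo "$1" | tr . _)" ;;
        *)
            echo "error: expected 'devel' or a version like 3.10" >&2
            return 1 ;;
    esac
}
```

It can then be combined with docker pull, for example docker pull "$(bioc_image 3.10)".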

Deprecation Notice

For previous users of Docker containers for Bioconductor, please note that we are deprecating the following images. These images were maintained by the Bioconductor Core team and by the community.

Legacy Containers

These images are no longer maintained or updated. They remain available should a user choose to use them, but the Bioconductor Core team no longer supports them.

Bioconductor Core Team: bioc-issue-bot@bioconductor.org

Steffen Neumann: sneumann@ipb-halle.de, Maintained as part of the "PhenoMeNal, funded by Horizon2020 grant 654241"

Laurent Gatto: lg390@cam.ac.uk

RGLab: wjiang2@fredhutch.org

First iteration containers

  • bioconductor/devel_base
  • bioconductor/devel_core
  • bioconductor/devel_flow
  • bioconductor/devel_microarray
  • bioconductor/devel_proteomics
  • bioconductor/devel_sequencing
  • bioconductor/devel_metabolomics
  • bioconductor/release_base
  • bioconductor/release_core
  • bioconductor/release_flow
  • bioconductor/release_microarray
  • bioconductor/release_proteomics
  • bioconductor/release_sequencing
  • bioconductor/release_metabolomics

Reason for deprecation

The new Bioconductor Docker image bioconductor/bioconductor_docker makes it easy to install any package the user chooses, since all the system dependencies are built into this new image. The previous images did not have all the system dependencies built in. Packages can now be installed with:

BiocManager::install(c("package_name", "package_name"))

Other reasons for deprecation:

  • The chain of inheritance of Docker images was too complex and hard to maintain.

  • The images were hard to extend because there were multiple flavors.

  • The naming convention made the images harder to use.

  • Images that were not maintained were not deprecated.

Reporting Issues

Please report issues with the new set of images on GitHub Issues or the Bioc-devel mailing list.

These issues can be questions about anything related to this piece of software, such as usage, extending Docker images, enhancements, and bug reports.

Using the containers

A well organized guide to popular docker commands can be found here. For convenience, below are some commands to get you started. The following examples use the bioconductor/bioconductor_docker:devel image.

Note: you may need to prepend sudo to all docker commands, but try them without it first.

Prerequisites: On Linux, you need Docker installed and on Mac or Windows you need Docker Toolbox installed and running.

List which Docker images are available locally
docker images
List running containers
docker ps
List all containers
docker ps -a
Resume a stopped container
docker start <CONTAINER ID>
Shell into a running container
docker exec -it <CONTAINER ID> /bin/bash
Shutdown container
docker stop <CONTAINER ID>
Delete container
docker rm <CONTAINER ID>
Delete image
docker rmi bioconductor/bioconductor_docker:devel
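The stop, delete container, and delete image steps above can be chained into one cleanup routine. This is a sketch under assumptions: the function name bioc_cleanup is hypothetical, and it relies on docker ps with the -q flag and the ancestor filter to find every container created from the image.

```shell
# Hypothetical helper: stop and remove every container created from an
# image, then remove the image itself.
# Usage: bioc_cleanup [IMAGE]
bioc_cleanup() {
    image=${1:-bioconductor/bioconductor_docker:devel}

    # IDs of all containers (running or stopped) created from the image.
    ids=$(docker ps -a -q --filter "ancestor=$image")

    if [ -n "$ids" ]; then
        # Word splitting is intended here: one argument per container ID.
        docker stop $ids
        docker rm $ids
    fi

    docker rmi "$image"
}
```

Because this deletes containers along with any uncommitted changes inside them, it is best run only once nothing in those containers needs to be kept.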

Running the container

The commands above are helpful, but the basics of running a Bioconductor Docker container are pulling the public image and running the container.

Get a copy of the public Docker image
docker pull bioconductor/bioconductor_docker:devel
To run RStudio Server:
docker run -e PASSWORD=<password> \
	-p 8787:8787 \
	bioconductor/bioconductor_docker:devel

You can then open a web browser pointing to your docker host on port 8787. If you're on Linux and using default settings, the docker host is 127.0.0.1 (or localhost), so the full URL to RStudio would be http://localhost:8787. If you are on Mac or Windows and running Docker Toolbox, you can determine the docker host with the docker-machine ip default command.

In the above command, -e PASSWORD= sets the RStudio password and is required by the RStudio Docker image. It can be whatever you like, except that it cannot be rstudio. Log in to RStudio with the username rstudio and the password you specified.

If you want to run RStudio as a user on your host machine, in order to read/write files in a host directory, please read this.

NOTE: If you forget to add the tag devel or RELEASE_X_Y while using the bioconductor/bioconductor_docker image, it will automatically use the latest tag which points to the latest RELEASE version of Bioconductor.

To run R from the command line:
docker run -it --user rstudio bioconductor/bioconductor_docker:devel R
To open a Bash shell on the container:
docker run -it --user rstudio bioconductor/bioconductor_docker:devel bash

Note: The docker run command is very powerful and versatile. For full documentation, type docker run --help or visit the help page.
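Besides the interactive R and Bash sessions above, the same pattern can run an R script non-interactively. The following is a minimal sketch, assuming a hypothetical helper named bioc_rscript, a script that lives in the current directory, and an arbitrary mount point /work inside the container. The flags --rm, -v, and -w are standard docker run options.

```shell
# Hypothetical helper: run an R script from the current directory inside
# the container, then discard the container.
# Usage: bioc_rscript SCRIPT [IMAGE]
bioc_rscript() {
    script=$1
    image=${2:-bioconductor/bioconductor_docker:devel}

    # -v mounts the current directory at /work (an arbitrary path chosen
    # for this sketch) and -w makes it the working directory.
    docker run --rm \
        --user rstudio \
        -v "$PWD":/work \
        -w /work \
        "$image" Rscript "$script"
}
```

For example, bioc_rscript analysis.R runs a script named analysis.R against the devel image. Whether the rstudio user may write into the mounted directory depends on the file permissions on the host.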

Mounting Additional Volume

One such option for docker run is -v, which mounts an additional volume into the Docker container. This is useful for, say, mounting a local R library directory for use in the container. The path in the container that should be mapped to a local R library directory is /usr/local/lib/R/host-site-library.

The following example mounts my locally installed packages at this container directory. That path is automatically included in R's .libPaths() in the container, so all of my locally installed packages are available for use.

  • Running it interactively,

      docker run \
      	-v /home/my-devel-library:/usr/local/lib/R/host-site-library \
      	-it \
      	--user rstudio \
      	bioconductor/bioconductor_docker:devel
    

    Without the --user rstudio option, the container starts and you are logged in as the root user.

    The -it flag gives you an interactive tty (shell/terminal) to the docker container.

  • Running it with RStudio interface

      docker run \
      	-v /home/my-devel-library:/usr/local/lib/R/host-site-library \
      	-e PASSWORD=password \
      	-p 8787:8787 \
      	bioconductor/bioconductor_docker:devel
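
One way to confirm that the mounted directory is picked up is to print .libPaths() from a throwaway container. The sketch below assumes a hypothetical helper named bioc_libpaths; the container path is the one documented above, and the host directory is whatever library directory you pass in.

```shell
# Hypothetical helper: mount a host R library directory and print the
# library paths that R sees inside the container.
# Usage: bioc_libpaths HOST_LIBRARY_DIR
bioc_libpaths() {
    lib_dir=$1

    if [ ! -d "$lib_dir" ]; then
        echo "error: '$lib_dir' is not a directory" >&2
        return 1
    fi

    # Resolve to an absolute path so the bind mount does not depend on
    # how Docker interprets relative paths.
    lib_dir=$(cd "$lib_dir" && pwd)

    docker run --rm \
        -v "$lib_dir":/usr/local/lib/R/host-site-library \
        --user rstudio \
        bioconductor/bioconductor_docker:devel \
        Rscript -e '.libPaths()'
}
```

If the mount works as described, /usr/local/lib/R/host-site-library should appear among the printed paths.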
    

Modifying the images

There are two ways to modify these images:

  1. Making changes in a running container and then committing them using the docker commit command.

    docker commit

  2. Using a Dockerfile to declare the changes you want to make.

The second way is the recommended way. Both ways are documented here.
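For completeness, the first way might look like the sketch below. The helper name bioc_snapshot and the commit message are hypothetical; docker commit takes a container ID or name followed by the name of the new image, and -m attaches a message to it.

```shell
# Hypothetical helper: save the current state of a container as a new
# image using docker commit.
# Usage: bioc_snapshot CONTAINER NEW_IMAGE[:TAG]
bioc_snapshot() {
    container=$1
    new_image=$2

    if [ -z "$container" ] || [ -z "$new_image" ]; then
        echo "usage: bioc_snapshot CONTAINER NEW_IMAGE[:TAG]" >&2
        return 2
    fi

    docker commit \
        -m "Manual changes on top of a Bioconductor image" \
        "$container" "$new_image"
}
```

An image created this way carries no record of how its changes were made, which is one reason the Dockerfile route is the recommended one.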

Example 1:

My goal is to add the Python package 'tensorflow' and to install a Bioconductor package called 'scAlign' on top of the base Docker image, i.e. bioconductor/bioconductor_docker:devel.

As a first step, my Dockerfile should inherit from the bioconductor/bioconductor_docker:devel image and build from there. Since all Docker images are Linux environments, and this container is specifically 'Debian', I need some knowledge of how to install libraries on Linux machines.

Your new Dockerfile can contain the following commands:

# Docker inheritance
FROM bioconductor/bioconductor_docker:devel

# Update the apt-get package index
RUN apt-get update \
	## Install the python package tensorflow
	&& pip install tensorflow \
	## Remove files in '/var/cache/' and '/var/lib/'
	## to remove side-effects of apt-get update
	&& apt-get clean \
	&& rm -rf /var/lib/apt/lists/*

# Install required Bioconductor package
RUN R -e 'BiocManager::install("scAlign")'

This Dockerfile can be built with the following command (you can name the image however you want):

docker build -t bioconductor_docker_tensorflow:devel .

This lets you use the Docker image with 'tensorflow' and the scAlign package installed.

docker run -p 8787:8787 -e PASSWORD=bioc bioconductor_docker_tensorflow:devel
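Before relying on a freshly built image, it can be worth checking that the R package really is installed. This is a minimal sketch with a hypothetical helper named bioc_has_package; it uses requireNamespace() and stopifnot() from base R so that the container exits with a non-zero status when the package is missing.

```shell
# Hypothetical helper: succeed only if an R package can be loaded in
# the given image.
# Usage: bioc_has_package IMAGE PACKAGE
bioc_has_package() {
    image=$1
    pkg=$2

    # stopifnot() makes Rscript exit with an error when the package
    # namespace cannot be loaded.
    docker run --rm "$image" \
        Rscript -e "stopifnot(requireNamespace('$pkg', quietly = TRUE))"
}
```

For the image built above, the call would be bioc_has_package bioconductor_docker_tensorflow:devel scAlign.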

Example 2:

My goal is to add all the infrastructure required to compile vignettes and knit documents into PDF files. My Dockerfile for this requirement looks like the following:

# This docker image has LaTeX to build the vignettes
FROM bioconductor/bioconductor_docker:devel

# Update apt-get
RUN apt-get update \
	&& apt-get install -y --no-install-recommends apt-utils \
	&& apt-get install -y --no-install-recommends \
	texlive \
	texlive-latex-extra \
	texlive-fonts-extra \
	texlive-bibtex-extra \
	texlive-science \
	texi2html \
	texinfo \
	&& apt-get clean \
	&& rm -rf /var/lib/apt/lists/*

## Install BiocStyle
RUN R -e 'BiocManager::install("BiocStyle")'

This Dockerfile can be built with the following command:

docker build -t bioconductor_docker_latex:devel .

This will let you use the docker image as needed to build and compile vignettes for packages.

docker run -p 8787:8787 -e PASSWORD=bioc bioconductor_docker_latex:devel

Singularity

The latest bioconductor/bioconductor_docker images are available on Singularity Hub as well. Singularity is a container runtime just like Docker, and Singularity Hub is the host registry for Singularity containers.

You can find the Singularity container collection at https://singularity-hub.org/collections/3955.

These images are particularly useful on compute clusters, where running them does not require admin access. You need to have the singularity module installed. See https://singularity.lbl.gov/docs-installation (contact your IT department when in doubt).

If Singularity is installed on your machine or cluster, the steps are:

Inspect available modules

module available

If Singularity is available,

module load singularity

Please check this link for specific usage instructions relevant to Singularity containers: https://singularity-hub.org/collections/3955/usage
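As an alternative to the Singularity Hub collection, Singularity can also build a local image file directly from Docker Hub through a docker:// URI. The sketch below is not the workflow described at the link above. It assumes Singularity 3.x or later, where singularity pull accepts an output file name followed by the URI, and the helper name bioc_singularity_r and the .sif file name are hypothetical.

```shell
# Hypothetical helper: convert a Bioconductor Docker image to a local
# Singularity image file (if not already present) and start R in it.
# Usage: bioc_singularity_r [TAG]
bioc_singularity_r() {
    tag=${1:-devel}
    sif="bioconductor_docker_${tag}.sif"

    # Pull and convert the Docker image only if the .sif file is missing.
    if [ ! -f "$sif" ]; then
        singularity pull "$sif" \
            "docker://bioconductor/bioconductor_docker:$tag" || return 1
    fi

    singularity exec "$sif" R
}
```

On a cluster, this would typically be run after module load singularity.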

Acknowledgements

Thanks to the rocker project for providing the R/RStudio Server containers upon which ours are based.

License: Artistic License 2.0