hute37 / data-science-devcontainers

(GPU accelerated) Multi-arch (linux/amd64, linux/arm64/v8) Data Science Dev Containers for R, Python and Julia

[CUDA-enabled] Data Science Development Containers

[GPU accelerated] Multi-arch (linux/amd64, linux/arm64/v8) Data Science Dev Containers:

  • [CUDA] Julia base, pubtools
  • [CUDA] Python base, scipy
  • [CUDA] R base, tidyverse, verse, geospatial, qgisprocess

Dev Containers considered stable for

  • Julia versions ≥ 1.7.3
  • Python versions ≥ 3.10.5
  • R versions ≥ 4.2.0

Parent images

Extended to match the [CUDA-enabled] JupyterLab docker stacks, except that

  • GPU accelerated Dev Containers are based on the NVIDIA CUDA runtime flavoured image.
    • The JupyterLab docker stacks are based on the NVIDIA CUDA devel flavoured image.
  • Dev Containers' Oh My Zsh uses the devcontainers theme + default font.
    • The JupyterLab docker stacks' Oh My Zsh uses Powerlevel10k theme + MesloLGS NF font.

Features

  • JupyterLab: A web-based interactive development environment for Jupyter notebooks, code, and data.
  • Git: A distributed version-control system for tracking changes in source code.
  • Git LFS: A Git extension for versioning large files.
  • GRASS GIS: A free and open source Geographic Information System (GIS).
    ℹ️ R qgisprocess
  • Orfeo Toolbox: An open-source project for state-of-the-art remote sensing.
    ℹ️ R qgisprocess (amd64 only)
  • Julia [1]: A high-level, high-performance dynamic language for technical computing.
  • Pandoc: A universal markup converter.
  • Python: An interpreted, object-oriented, high-level programming language with dynamic semantics.
  • QGIS: A free, open source, cross platform (lin/win/mac) geographical information system (GIS).
    ℹ️ R qgisprocess
  • Quarto: A scientific and technical publishing system built on Pandoc.
    ℹ️ Julia pubtools, Python scipy, R verse+
  • R [1]: A language and environment for statistical computing and graphics.
  • SAGA GIS: A Geographic Information System (GIS) software with immense capabilities for geodata processing and analysis.
    ℹ️ R qgisprocess
  • TinyTeX: A lightweight, cross-platform, portable, and easy-to-maintain LaTeX distribution based on TeX Live.
    ℹ️ Julia pubtools, Python scipy, R verse+
  • Zsh: A shell designed for interactive use, although it is also a powerful scripting language.

👉 See the Version Matrices for detailed information.

Pre-installed extensions

Prerequisites

Dev Containers require either Docker or Podman [2] to be installed. CUDA-enabled versions additionally require:

  • NVIDIA GPU
  • NVIDIA Linux driver
  • NVIDIA Container Toolkit

ℹ️ The host running the GPU accelerated Dev Containers only requires the NVIDIA driver; the CUDA toolkit does not have to be installed.
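
A quick way to see which of these prerequisites are present on a host is to look for their command-line tools. The following is a minimal sketch, not part of this repository: the function names `have` and `check_prereqs` are made up for illustration, and it assumes that `nvidia-smi` (shipped with the NVIDIA driver) and `nvidia-ctk` (shipped with the NVIDIA Container Toolkit) are on the `PATH` when the respective component is installed.

```shell
# Return success if the given command is available on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Hypothetical helper: print one status line per prerequisite.
check_prereqs() {
  # A container engine is always required: Docker or Podman.
  if have docker; then
    echo "engine: docker"
  elif have podman; then
    echo "engine: podman"
  else
    echo "engine: missing"
  fi

  # Only needed for the CUDA-enabled Dev Containers.
  if have nvidia-smi; then echo "driver: found"; else echo "driver: missing"; fi
  if have nvidia-ctk; then echo "toolkit: found"; else echo "toolkit: missing"; fi
}
```

Running `check_prereqs` prints three lines. It only detects the tools and does not verify that a GPU is actually usable from inside a container.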

Install

Codespaces require no installation, but do not currently offer machines with NVIDIA GPUs.

Docker

To install Docker, follow the instructions for your platform:

Podman

To install Podman, follow the instructions for your platform:

CUDA

To install the NVIDIA Container Toolkit, follow the instructions for your platform:

Usage

The 'Default project configuration'/'Main Dev Container' is meant for working on this repository itself.

Every other configuration is a custom Data Science Dev Container that behaves in a unique way:

  1. Default mount [3]:
    • source: empty directory
    • target: /home/vscode
    • type: volume
  2. Codespace only mount:
    • source: root of this repository
    • target: /workspaces/<repository-name>
    • type: ?
  3. Default path: /home/vscode
  4. Default user [4]: vscode
    • uid: 1000 (auto-assigned)
    • gid: 1000 (auto-assigned)
  5. Lifecycle scripts:
    • onCreateCommand: home directory setup
    • postStartCommand: Codespace only: Silently remove all unused images and all build cache
    • postAttachCommand: Codespace only: Check for Dev Container updates

To disable the postStartCommand or the postAttachCommand, comment out line 8 in ~/.local/bin/dockerSystemPrune.sh or ~/.local/bin/checkForUpdates.sh, respectively.
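
Commenting out that line can also be scripted. The sketch below is an illustration only: `comment_out_line` is a hypothetical helper, not something shipped in the containers. It prefixes the chosen line with `# ` unless it is already commented, and writes the result back into the same file so that the file's permissions are kept.

```shell
# Hypothetical helper: comment out one line (default: line 8) of a script.
comment_out_line() {
  file=$1
  line=${2:-8}
  tmp=$(mktemp) || return 1
  # Prefix the line with '# ' only if it does not already start with '#'.
  # Writing back with cat keeps the original file's mode and ownership.
  sed "${line}s/^\([^#]\)/# \1/" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, `comment_out_line ~/.local/bin/dockerSystemPrune.sh 8` would disable the postStartCommand. Running it a second time leaves the file unchanged.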

Codespace

  1. Click the <> Code button, then click the Codespaces tab.
    A message is displayed at the bottom of the dialog telling you who will pay for the codespace.
  2. Create your codespace after configuring advanced options:
    • Configure advanced options
      To configure advanced options for your codespace, such as a different machine type or a particular devcontainer.json file:
      • At the top right of the Codespaces tab, select ... and click New with options....
      • On the options page for your codespace, choose your preferred options from the dropdown menus.
      • Click Create codespace.

Source: Creating a codespace for a repository, GitHub Docs

To open your codespace in JupyterLab:

  1. Execute
    jupyter-lab \
      --ServerApp.allow_origin='*' \
      --ServerApp.cookie_options="{'SameSite': 'None', 'Secure': True}" \
      --ServerApp.tornado_settings="{'headers':{'Content-Security-Policy':\"frame-ancestors 'self' https://*.github.dev\", 'Access-Control-Allow-Headers': 'accept, content-type, authorization, x-xsrftoken, x-github-token'}}" \
      --notebook-dir=/home/vscode \
      --no-browser
  2. Ctrl+click on one of the URLs shown in the Terminal.

ℹ️ Opening your codespace in JupyterLab according to the GitHub Docs sets the default path to /workspaces/<repository-name>, which you cannot navigate out of.

A 'Full Rebuild Container' resets the home directory!
ℹ️ This is never necessary unless you want exactly that.

Local/'Remote SSH'

Use the Dev Containers: Reopen in Container command from the Command Palette (F1, ⇧⌘P (Windows, Linux Ctrl+Shift+P)).

To start JupyterLab:

  1. Execute
    jupyter-lab
  2. Ctrl+click on one of the URLs shown in the Terminal.
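
Because the Codespace and local invocations differ only in the extra server flags, one wrapper can cover both cases. This is a sketch under assumptions, not part of the repository: `start_jupyterlab` and the `JUPYTER_LAB_BIN` override are hypothetical names, and it relies on GitHub Codespaces setting `CODESPACES=true` in the container environment. The flags are the ones from the Codespace instructions, with the cookie attribute written in its standard spelling `SameSite`.

```shell
# Hypothetical wrapper: add the Codespace-specific server flags only when
# running inside GitHub Codespaces; any extra arguments are passed through.
start_jupyterlab() {
  if [ "${CODESPACES:-}" = "true" ]; then
    set -- \
      --ServerApp.allow_origin='*' \
      --ServerApp.cookie_options="{'SameSite': 'None', 'Secure': True}" \
      --ServerApp.tornado_settings="{'headers':{'Content-Security-Policy':\"frame-ancestors 'self' https://*.github.dev\", 'Access-Control-Allow-Headers': 'accept, content-type, authorization, x-xsrftoken, x-github-token'}}" \
      --notebook-dir=/home/vscode \
      --no-browser \
      "$@"
  fi
  # JUPYTER_LAB_BIN is an assumed override hook (e.g. for a dry run with echo).
  "${JUPYTER_LAB_BIN:-jupyter-lab}" "$@"
}
```

Calling `start_jupyterlab` then behaves like plain `jupyter-lab` locally, while `JUPYTER_LAB_BIN=echo start_jupyterlab` prints the arguments that would be used without starting a server.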

Similar project

What makes this project different:

  1. Multi-arch: linux/amd64, linux/arm64/v8
  2. Base image: Debian instead of Ubuntu
  3. IDE: JupyterLab next to VS Code
  4. Just Python – no Conda / Mamba

CUDA-enabled versions:

  1. Derived from nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
  2. TensorRT and TensorRT plugin libraries

Contributing

PRs accepted.

This project follows the Contributor Covenant Code of Conduct.

License

MIT © 2023 b-data GmbH

Footnotes

  1. Depending on which Dev Container is selected.

  2. See issue https://github.com/b-data/data-science-devcontainers/issues/1 about limitations.

  3. See issue https://github.com/b-data/data-science-devcontainers/issues/2 about changing the mount type.

  4. See issue https://github.com/b-data/data-science-devcontainers/issues/3 about running as root.
