trustbit / nix-data-science-vm

launch a secure and flexible work environment for data scientists in a cloud

Trustbit Nix Data Science VM

This repository contains sample code for the Trustbit Data Science VM. It demonstrates how to quickly launch a secure and flexible work environment for data scientists in the cloud. This lets them get started on a project in a familiar and powerful environment that can be tailored to their needs.

Why is this important?

  1. Data scientists are more productive when they can use the latest tools, code and frameworks. Productive data scientists are more satisfied with their jobs and hence more likely to stay around longer.
  2. A frequent bottleneck in daily productivity is not having enough processing power to train or evaluate a model, or sometimes not having access to dedicated hardware such as NVIDIA GPUs, Google TPUs or Tenstorrent AI chips. By using the cloud, we give data scientists the ability to scale out their work environment to handle larger workloads when needed.

This setup uses Terraform and Nix to provide a pre-configured, web-driven development environment that integrates directly with Google Cloud. For small-scale deployments, it is comparable in speed to a classical VM setup based on Docker or bash scripts. It also adds the flexibility of installing new dependencies (CUDA, Python, native binaries) with the ability to roll back changes.

For large-scale deployments (more than 5 workspaces), this setup is more secure and convenient to maintain. Why is this important?

While data scientists like to bring their own tools and dependencies into projects, this can quickly escalate into a maintenance nightmare for the operations people. The larger the department, the higher the chance of time-consuming problems for both sides.

Check out this tweet by François Chollet (Twitter bio: "Deep learning @google. Creator of Keras. Author of 'Deep Learning with Python'."):

Been working with Python for 13 years and I still occasionally end up with a hopelessly borked environment where I have to actually nuke and reinstall the Python interpreter. And yes, I use virtualenv

— François Chollet (@fchollet) January 24, 2023

By using Nix, we turn data science VMs into reproducible, declarative and reliable systems. Data scientists can safely apply their own configurations, while Ops can rebuild the entire fleet with the latest updates and security patches.

This approach allows data scientists to be creative with their work VM, while allowing operations to maintain a fleet of diverse VMs in a uniform and reliable manner.

The availability of a powerful and flexible work environment not only makes people more productive, but is also an attractive perk to bring up in hiring interviews.

This approach could be extended further with reproducible, declarative and reliable project dependencies that are shared between collaborators, operations and maintenance. The current VM is designed to support that (via Nix flakes), but that is a subject for another sample Trustbit solution.

Setup content:

OpenVSCode Server is a service that you can run on a remote development machine, such as a Google Cloud compute instance. It lets you connect securely to that remote machine from anywhere through a web browser, without requiring SSH:

[Screenshot: vscode]

In this setup, an nginx web server handles the HTTP and SSL connections, with Let's Encrypt as the SSL certificate provider:

[Screenshot: vscode]

  • Terraform code that creates the Google Cloud resources.
  • Nix DSL code that sets up the compute server and its services.

Terraform creates the following resources:

VM instance

  • based on NixOS 22.11 (Raccoon)
  • fully compatible with Google Cloud
  • contains Direnv and Flake support out of the box
  • contains an nginx server
  • contains an OpenVSCode server
  • contains the SSL configuration
  • contains dev tools such as git, docker, etc.

NixOS

NixOS is a Linux distribution built around the Nix package manager, which solves package and configuration management problems in its own unique way. Installing a system running a conventional Linux distribution usually involves installing the distribution itself, then installing additional custom packages, modifying configuration files and so on, which is often a tedious, time-consuming and error-prone process.

There are three ways to install an application on NixOS:

  • As a system package, through the imperative package manager
  • In ephemeral shell environments
  • In reusable, reproducible shell environments

This is an example of how to install Go in an ephemeral shell environment:

[Screenshots: vscode]
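
The ephemeral case can be sketched in a few lines of shell. This is a minimal sketch, assuming the `nix-shell` command that ships with Nix; the helper name `with_pkg` is made up for illustration and is not part of this repository. `nix-shell -p` starts a temporary environment in which the named package is on the PATH, and `--run` executes one command there and exits.

```shell
# Run one command inside a throwaway environment that provides a package.
# Usage: with_pkg <nixpkgs attribute> <command> [args...]
# Note: "$*" joins the arguments with spaces, so arguments that contain
# spaces would need extra quoting.
with_pkg() {
  pkg=$1
  shift
  nix-shell -p "$pkg" --run "$*"
}

# Example, on a machine with Nix installed:
#   with_pkg go go version
```

Once the command returns, Go is no longer on the PATH of the calling shell, which is what makes the environment ephemeral.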

Or let's install Python 3 in a reusable, reproducible shell environment. For this example we take a dummy repository (it also contains simple C++ source code, which we will build too):

[Screenshots: vscode]

As the demonstration above shows, direnv makes tools and dependencies available on the fly.
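
A reusable environment of this kind usually comes down to two small files in the project directory. The sketch below is an assumption about what such files could look like, not a copy of the dummy repository: `init_python_env` is a hypothetical helper, the `shell.nix` content is illustrative, and `use nix` is the direnv standard-library directive that loads it. A flake-based project would use a `flake.nix` and `use flake` instead.

```shell
# Write a shell.nix and an .envrc into a project directory (default: current).
# The file contents are illustrative assumptions.
init_python_env() {
  dir=${1:-.}
  # Declare the tools the project needs; here only Python 3.
  cat > "$dir/shell.nix" <<'EOF'
{ pkgs ? import <nixpkgs> { } }:
pkgs.mkShell {
  packages = [ pkgs.python3 ];
}
EOF
  # Tell direnv to load the environment described by shell.nix.
  echo "use nix" > "$dir/.envrc"
  # direnv ignores an .envrc until it has been approved explicitly.
  if command -v direnv >/dev/null 2>&1; then
    (cd "$dir" && direnv allow)
  fi
}

# Example:
#   init_python_env ~/projects/dummy
```

With direnv hooked into the shell, entering the directory then puts `python3` on the PATH, and leaving it removes it again.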

Next, let's build the C++ app from source. Because the dummy repo contains a flake file, the build process is automated and we only need to run nix build:

[Screenshots: vscode]

How to install the Trustbit Nix Data Science VM

  1. Set up the Terraform and gcloud CLI tools.
  2. Set the GCP project name in the Terraform variables file.
  3. Plan and apply the Terraform configuration from your local environment.
  4. Note: after successfully applying the plan, run terraform state rm google_endpoints_service.telemetry_openapi_service. This is necessary because of how Google Cloud Endpoints is designed.
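
Steps 3 and 4 can be sketched as a small shell function. This is a sketch under assumptions: it is meant to be run from the directory that holds the Terraform code, with the project name already set in the variables file; the function name `deploy_workspace` and the plan file name `tfplan` are made up for illustration. The Terraform subcommands themselves (`init`, `plan -out`, `apply`, `state rm`) are standard.

```shell
# Provision the workspace; each step runs only if the previous one succeeded.
deploy_workspace() {
  # Download the providers and initialise the working directory.
  terraform init &&
  # Step 3: compute the changes and save them to a plan file for review.
  terraform plan -out=tfplan &&
  # Step 3: create the resources exactly as recorded in the plan file.
  terraform apply tfplan &&
  # Step 4: stop tracking the endpoints service in the Terraform state.
  terraform state rm google_endpoints_service.telemetry_openapi_service
}

# Example:
#   cd <directory with the Terraform code> && deploy_workspace
```

Saving the plan to a file and applying that file means the changes that get applied are the ones that were reviewed.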

The following Google Cloud APIs will be enabled. If you run into trouble, you can try enabling them manually:

Important! Google APIs need up to 10 minutes to activate. Please wait 10-15 minutes before the next step.

In a few minutes your work VM will be available in your Google Cloud project and you can access it at: https://workspace.endpoints.[project_name].cloud.goog/
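
Because the VM takes a few minutes to come up, it can be convenient to poll the address instead of reloading the browser. The following is a minimal sketch, assuming `curl` is installed; the function names, the number of attempts and the delay are illustrative choices, and only the URL pattern comes from this README.

```shell
# Build the workspace URL for a given Google Cloud project name.
workspace_url() {
  printf 'https://workspace.endpoints.%s.cloud.goog/\n' "$1"
}

# Poll the URL until the server answers or the attempts run out.
# Usage: wait_for_workspace <project_name> [attempts] [delay_seconds]
wait_for_workspace() {
  url=$(workspace_url "$1")
  attempts=${2:-30}
  delay=${3:-20}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Any HTTP response counts as "up"; curl fails while DNS, the TLS
    # certificate or the web server is not ready yet.
    if curl --silent --output /dev/null --max-time 10 "$url"; then
      echo "workspace is up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "workspace did not answer: $url" >&2
  return 1
}

# Example:
#   wait_for_workspace my-project
```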

Good luck and enjoy a productive work environment!


