tibor-mach / example-rnd-monorepo

DVC & DVCLive for R&D projects

Example for R&D repository structure

Installation

python -m venv .venv
echo "export PYTHONPATH=$PWD" >> .venv/bin/activate
source .venv/bin/activate
pip install -r requirements.txt

Run

1 - Study bio-1023

Example of a workflow using Jupyter Notebooks & DVCLive

Workflow:

  • Navigate to the study directory: cd bio-1023
  • Run the notebooks in JupyterLab: jupyter lab
  • Commit & push results with Git and DVC
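
As a rough illustration of what a notebook cell in this study might log, here is a minimal sketch. The function name `run_training`, the parameter names, and the placeholder loss update are hypothetical and not taken from the repository; only `log_param`, `log_metric`, and `next_step` are DVCLive `Live` methods.

```python
def run_training(live, epochs=3, learning_rate=0.1):
    """Log parameters and a per-epoch metric through a DVCLive `Live` object.

    `live` is expected to behave like `dvclive.Live`; everything else here
    (names, values, the fake loss update) is a placeholder for illustration.
    """
    live.log_param("epochs", epochs)
    live.log_param("learning_rate", learning_rate)

    loss = 1.0
    for _ in range(epochs):
        # Placeholder update standing in for a real training step.
        loss *= 1.0 - learning_rate
        live.log_metric("train/loss", loss)
        live.next_step()  # advance DVCLive's step counter
    return loss


# In a notebook cell, with DVCLive installed:
#   from dvclive import Live
#   with Live() as live:
#       run_training(live)
```

Passing `live` in as an argument keeps the training code independent of how the `Live` object is created, so the same function can be reused across notebooks.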

2 - Study gen-2024

Common DVC pipeline with multiple stages

graph TD;
    data --> train_rf["Random Forest"];
    data --> train_lr["Linear Regression"];
    train_rf --> evaluate;
    train_lr --> evaluate;

Workflow:

  • Navigate to the study directory: cd gen-2024
  • Run the pipeline: dvc exp run
  • Commit & push results with Git and DVC
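
The stage graph above implies an execution order: both training stages need the data, and evaluation needs both models. The sketch below derives such an order with Python's standard `graphlib`. It only illustrates the dependency structure and is not how DVC itself is implemented; treating every node of the diagram as a stage, and the names `STAGE_DEPS` and `execution_order`, are assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each node maps to the set of nodes it depends on, copied from the diagram.
STAGE_DEPS = {
    "data": set(),
    "train_rf": {"data"},
    "train_lr": {"data"},
    "evaluate": {"train_rf", "train_lr"},
}


def execution_order(deps):
    """Return one valid order in which the stages could run."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(STAGE_DEPS)
print(order)  # "data" comes first and "evaluate" last
```

The two training stages do not depend on each other, so either may run before the other.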

Collaboration workflow

1 - Set up DVC remote storage

dvc remote add -d gcp gs://my-bucket

2 - Push artifacts to remote storage

dvc push 

3 - Pull artifacts from remote storage

dvc pull

3.1 - Pull a specific artifact (alternative ways)

dvc pull - when you are inside the repository

Download tracked files or directories from remote storage based on the current dvc.yaml and .dvc files, and make them visible in the workspace.

git checkout a972308              # Commit created after DVC Remote setup
dvc pull bio-1023/data/features.csv

dvc get - when you are outside the repository

Download a file or directory tracked by DVC or by Git into the current working directory.

dvc get https://github.com/mnrozhkov/example-rnd-monorepo \
    bio-1023/data/features.csv \
    -o bio-1023/data/features.csv \
    --rev a972308
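
From Python code, the same file can be read without downloading it into the workspace by using `dvc.api.read`. This is a minimal sketch that assumes DVC is installed and the remote is reachable. The function name `load_features` and the module-level constants are hypothetical; the URL, path, and revision are the ones used in the command above.

```python
import csv
import io

REPO_URL = "https://github.com/mnrozhkov/example-rnd-monorepo"
FEATURES_PATH = "bio-1023/data/features.csv"
REV = "a972308"


def load_features(repo=REPO_URL, path=FEATURES_PATH, rev=REV):
    """Return the tracked CSV as a list of row dictionaries."""
    import dvc.api  # lazy import: requires `pip install dvc`

    # dvc.api.read returns the file contents as text (default mode "r").
    text = dvc.api.read(path, repo=repo, rev=rev)
    return list(csv.DictReader(io.StringIO(text)))
```

Calling `load_features()` fetches the file contents at the given revision; nothing is written to the current directory.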

dvc artifacts get - by registered artifact name & version

dvc artifacts get https://github.com/mnrozhkov/example-rnd-monorepo \
    bio-1023:data-bio-1023 \
    -o bio-1023/data/features.csv \
    --rev v2.0.1
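
A registered artifact can also be resolved from Python. The sketch below assumes a DVC version that provides `dvc.api.artifacts_show`, which is expected to return a dictionary with the artifact's `path` and Git `rev`. The function name `resolve_artifact` is hypothetical; the artifact name, version, and URL are the ones used in the command above.

```python
REPO_URL = "https://github.com/mnrozhkov/example-rnd-monorepo"
ARTIFACT_NAME = "bio-1023:data-bio-1023"
ARTIFACT_VERSION = "v2.0.1"


def resolve_artifact(name=ARTIFACT_NAME, version=ARTIFACT_VERSION, repo=REPO_URL):
    """Return (path, rev) for a registered artifact version."""
    import dvc.api  # lazy import: requires `pip install dvc`

    # Assumption: artifacts_show returns a dict with "path" and "rev" keys.
    info = dvc.api.artifacts_show(name, version=version, repo=repo)
    return info["path"], info["rev"]
```

The returned path and revision can then be passed to `dvc.api.read` or `dvc.api.open` to load the file itself.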
