Bubblbu / sustainable-data-sci

Exploring practices for sustainable (small) data science

Sustainable (Small) Data Science

This document was developed as part of an IT Workshop for the Stanford Graduate School of Education.

Useful tips, tools, and processes

The following list provides an opinionated overview of useful tools and processes that can help to set up a sustainable small data science project. It assumes that individual researchers or small teams will be responsible for the various tasks involved in this kind of undertaking.

These tasks are grouped into three project phases, each of which usually involves different considerations and different engagements with collaborators, readers, and users.

  1. Initiation phase: Setting up project/data structures
  2. Ongoing phase: Development and analysis
  3. Completion phase: Publishing and dissemination

Initiation phase

Project structure

  1. Cookiecutter template
  2. My own project structure (provide examples)
    1. Data, notebooks, scripts, outputs
  3. Separate data work from scholarly articles
    1. Usual published project structure:
      1. Versioned code repository with DOI (GitHub + Zenodo)
      2. Versioned data repository with DOI (Dataverse)
      3. Versioned repository with code/data to reproduce article (GitHub + Zenodo)
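
As a rough illustration of the data, notebooks, scripts, and outputs layout listed above, the following is a minimal sketch that creates those four folders. The folder names come from the list; the function name `scaffold_project`, the one-line folder descriptions, and the per-folder `README.md` files are assumptions made for this example, not part of the author's actual template.

```python
from pathlib import Path

# Folder names follow the list above; the descriptions are assumptions.
FOLDERS = {
    "data": "Raw and processed datasets",
    "notebooks": "Exploratory Jupyter notebooks",
    "scripts": "Reusable, non-interactive code",
    "outputs": "Generated figures, tables, and reports",
}


def scaffold_project(root):
    """Create the project folders under `root` and return their paths."""
    root = Path(root)
    created = []
    for name, description in FOLDERS.items():
        folder = root / name
        # exist_ok=True makes the function safe to run more than once.
        folder.mkdir(parents=True, exist_ok=True)
        # A short README keeps otherwise-empty folders visible in version control.
        readme = folder / "README.md"
        if not readme.exists():
            readme.write_text(f"# {name}\n\n{description}\n", encoding="utf-8")
        created.append(folder)
    return created
```

Calling `scaffold_project("my-project")` would create `my-project/data`, `my-project/notebooks`, `my-project/scripts`, and `my-project/outputs`. A Cookiecutter template covers the same ground with more flexibility; this sketch only shows the shape of the result.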

Ongoing phase

Utility

  1. Progress bars: tqdm
  2. APIs: Postman
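
To show what a tqdm progress bar looks like in practice, here is a minimal sketch that wraps a loop in `tqdm`. The function `process_records` and the squaring step are placeholders invented for this example. The `ImportError` fallback is also an assumption, added so the sketch still runs (without a bar) when tqdm is not installed.

```python
import time

try:
    from tqdm import tqdm  # third-party package: pip install tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm; no bar is shown.
    def tqdm(iterable, **kwargs):
        return iterable


def process_records(records, delay=0.0):
    """Square each record while reporting progress on stderr."""
    results = []
    # Wrapping the iterable is all tqdm needs to draw a progress bar.
    for record in tqdm(records, desc="Processing", unit="record"):
        time.sleep(delay)  # stand-in for slow per-record work
        results.append(record ** 2)
    return results
```

For long-running steps such as API calls or file conversions, a small `delay` per item makes the bar visible, for example `process_records(range(100), delay=0.05)`.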

Development

  1. Dependency management: Poetry, pyenv, pipx
  2. Leverage notebooks and interactive development environments. I can provide a deep dive into my local dev setup, though it may not be relevant for non-Pythonistas.
  3. Serious development in notebooks: nbdev

Research process

  1. GitHub Wiki as a research log
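
A GitHub wiki is itself a Git repository of Markdown files, so a research log can also be updated from a local clone. The following is a minimal sketch, assuming the wiki has already been cloned to a local folder. The function name `append_log_entry`, the page name `Research-Log.md`, and the dated-heading layout are hypothetical choices for this example.

```python
from datetime import date
from pathlib import Path


def append_log_entry(wiki_dir, text, day=None, page="Research-Log.md"):
    """Append a dated entry to a Markdown page in a local wiki clone."""
    day = day or date.today()
    log_path = Path(wiki_dir) / page
    # Each entry gets its own dated heading so the log reads chronologically.
    entry = f"## {day.isoformat()}\n\n{text.strip()}\n\n"
    # Mode "a" creates the page if it does not exist yet.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return log_path
```

After appending an entry, committing and pushing the wiki clone publishes it. That step is left out here because it depends on the local Git setup.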

Completion phase

Collaboration

  1. Turn notebooks into slides: RISE
  2. GitHub Pages

License: Creative Commons Zero v1.0 Universal