Exploring practices for sustainable (small) data science.
This document was developed as part of an IT Workshop for the Stanford Graduate School of Education.
The following list provides an opinionated overview of useful tools and processes that can help to set up a sustainable small data science project. The assumption is that individual researchers or small teams will be responsible for the various tasks involved in this kind of undertaking.
These tasks are grouped into three distinct project phases, each of which usually involves different considerations and different engagements with collaborators, readers, and users:
- Initiation phase: Setting up project/data structures
- Ongoing phase: Development and analysis
- Completion phase: Publishing and dissemination
- Cookiecutter template
- My own project structure (provide examples)
- Data, notebooks, scripts, outputs
- Separate data work from scholarly articles
- Usual published project structure:
- Versioned code repository with DOI (GitHub + Zenodo)
- Versioned data repository with DOI (Dataverse)
- Versioned repository with code/data to reproduce article (GitHub + Zenodo)
- Dependency management: Poetry, pyenv, pipx
- Leverage notebooks and interactive development environments
- I can provide a deep dive into my local dev setup, though it may not be relevant for non-Pythonistas
- Serious development in notebooks: nbdev
- GitHub Wiki as a research log
- Turn notebooks into slides: RISE
- GitHub Pages
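
The "Data, notebooks, scripts, outputs" layout from the initiation phase could be created with a few lines of Python. This is only a minimal sketch, not the project template referred to above: the lowercase folder names, the `create_project_skeleton` helper, and the `.gitkeep` placeholder files are assumptions made for illustration.

```python
from pathlib import Path

# Assumed folder names, based on the "Data, notebooks, scripts, outputs" item.
PROJECT_DIRS = ("data", "notebooks", "scripts", "outputs")


def create_project_skeleton(root):
    """Create the project folders under `root` and return their paths."""
    root = Path(root)
    created = []
    for name in PROJECT_DIRS:
        folder = root / name
        # parents=True also creates `root` itself if it does not exist yet;
        # exist_ok=True makes the function safe to run more than once.
        folder.mkdir(parents=True, exist_ok=True)
        # An empty placeholder file lets Git track otherwise empty folders
        # (the name .gitkeep is a common convention, not a Git feature).
        (folder / ".gitkeep").touch()
        created.append(folder)
    return created
```

Calling `create_project_skeleton("my-project")` (a hypothetical project name) would create the four folders in the current working directory. A Cookiecutter template covers the same ground with more options, so a script like this mainly suits very small projects.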