
Machine Learning Project Checklist

The items on this checklist come from various sources, including Machine Learning Yearning, Full Stack Deep Learning, Building Machine Learning Powered Applications, and my own experience. This is a work in progress, and contributions are welcome: if you have any additions, please submit a PR to this repo.

Before modelling

Project

  • The project has a clear, codified business goal/metric
  • There is a person who is ultimately responsible for the success/failure of the project
  • We have a plan for how to reach a first deployed end product as fast as possible
  • We have decided on how and when to keep the team in sync (daily/weekly standups, retrospectives, planning meetings, etc.)
  • We have assessed how the product will impact stakeholders (e.g. people, society, the world)
  • We have identified relevant regulation and translated it to requirements
  • We have identified requirements related to fairness, accountability, and transparency

Problem understanding

  • We have decided on a single metric on which to rank our models
  • We have clarified the costs of the different kinds of erroneous predictions
  • We have an understanding of what level of performance is "good enough"
  • We know the serving-time constraints on memory usage
  • We know the serving-time constraints on latency
  • We know the serving-time constraints on throughput
  • We know whether we’re doing streaming or batch prediction
  • We understand the current state of ML applied to the problem we’re trying to solve
  • We have an idea of how important model freshness is, i.e. how often we will need to update the model
  • We have domain experts who can help us understand the problem and error modes
  • We know where the model will be deployed (server / client, browser / on device)

Data

  • We have selected dev and test sets that are reflective of the real task we're trying to solve
  • Our dev and test sets are from the same distribution
  • Our dev set is large enough that we can detect the accuracy improvements we care about (see the back-of-the-envelope sketch after this list)
  • We understand how to split the data into train/val/test sets to avoid data leakage (see the split sketch after this list)
  • If we need to collect data, we know how difficult and costly it will be to collect and annotate
  • We have a plan for how to store and version our data, dataset splits, models, and changes to annotations
  • We get a reasonable “ML Test Score”, table 1
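
A back-of-the-envelope way to sanity-check dev set size (a rough sketch, not from the checklist's sources): the standard error of an accuracy estimate on n independent examples is roughly sqrt(p(1 − p)/n), and improvements much smaller than that are hard to distinguish from noise in an unpaired comparison.

```python
# Back-of-the-envelope sketch: is the dev set large enough to detect the improvement we care about?
# The numbers below are illustrative, not recommendations.
from math import sqrt

def accuracy_standard_error(accuracy: float, n_examples: int) -> float:
    """Approximate standard error of an accuracy estimate on n independent examples."""
    return sqrt(accuracy * (1 - accuracy) / n_examples)

# With ~90% accuracy on a 10,000-example dev set the standard error is ~0.3%,
# so an unpaired 0.1% improvement would be well within the noise.
print(accuracy_standard_error(0.90, 10_000))  # ~0.003
```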
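
A minimal sketch of a leakage-aware split, assuming tabular data in a pandas DataFrame with a user_id column (the column name and split sizes are illustrative): splitting by group rather than by row keeps all examples from the same user in the same split.

```python
# Minimal sketch: group-aware train/val/test split so that no user (group) leaks across splits.
# Assumes a pandas DataFrame `df` with a "user_id" column; names and sizes are illustrative.
from sklearn.model_selection import GroupShuffleSplit

def split_by_group(df, group_col="user_id", test_size=0.1, val_size=0.1, seed=42):
    # Carve out the test set by group, so no user appears in both train and test.
    gss = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_val_idx, test_idx = next(gss.split(df, groups=df[group_col]))
    train_val, test = df.iloc[train_val_idx], df.iloc[test_idx]

    # Split the remainder into train and validation, again by group.
    gss_val = GroupShuffleSplit(n_splits=1, test_size=val_size / (1 - test_size), random_state=seed)
    train_idx, val_idx = next(gss_val.split(train_val, groups=train_val[group_col]))
    return train_val.iloc[train_idx], train_val.iloc[val_idx], test
```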

Modelling

  • We have one or several well-thought-out baselines in place (see the baseline sketch after this list). These are not good enough, so there’s an actual need to use ML
  • There’s a metrics webpage where we can compare runs, and its URL is ___________________
  • We can (approximately) reproduce a model if needed (see the reproducibility sketch after this list)
  • We get a reasonable “ML Test Score”, table 2
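
As an illustration of what a baseline can be (a sketch: scikit-learn's DummyClassifier and the synthetic data are stand-ins, and a simple domain heuristic is often an even stronger baseline):

```python
# Minimal sketch of a non-ML baseline to beat before investing in modelling.
# The synthetic data is only there to make the example runnable.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent")  # always predicts the majority class
baseline.fit(X_train, y_train)
print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
```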
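
A minimal sketch of what "approximately reproducible" can mean in practice, assuming the training code lives in a git repository (the helper and file names are illustrative): pin the random seeds and record the code version, data version, and hyperparameters alongside each run.

```python
# Minimal sketch: capture enough state to approximately reproduce a training run.
# File and helper names are illustrative; a full setup would also pin framework seeds
# (e.g. torch.manual_seed) and library versions.
import json
import random
import subprocess

import numpy as np

def set_seeds(seed: int = 0) -> None:
    random.seed(seed)
    np.random.seed(seed)

def record_run(hparams: dict, data_version: str, path: str = "run_metadata.json") -> None:
    metadata = {
        "git_commit": subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip(),
        "data_version": data_version,  # e.g. a dataset hash or a DVC/Git tag
        "hparams": hparams,
    }
    with open(path, "w") as f:
        json.dump(metadata, f, indent=2)

set_seeds(42)
record_run({"lr": 3e-4, "batch_size": 32}, data_version="v1.2.0")
```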

Deployment

  • We have CI (continuous integration) in place
  • We have tests for the full training pipeline
  • We have validation tests
  • We have functionality tests
  • We have unit tests
  • We have CD (continuous delivery) in place
  • We have CT (continuous training) in place
  • We have blue/green deployment in place
  • We can deploy a model in shadow mode (see the shadow-mode sketch after this list)
  • Monitoring in place for memory consumption
  • Monitoring in place for CPU consumption
  • Monitoring in place for latency
  • Monitoring in place for downtime
  • Monitoring in place for requests per second
  • Monitoring in place for prediction confidence over time
  • We have a way of detecting if a model will fail on a given datapoint, and a corresponding fallback (see the fallback sketch after this list)
  • We get a reasonable “ML Test Score”, tables 3–4
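
A minimal sketch of shadow-mode serving, assuming a FastAPI service (the framework and the dummy models are assumptions, not part of the checklist): the shadow model sees production traffic and its predictions are logged for offline comparison, but only the current model's output is returned to callers.

```python
# Minimal sketch: shadow-mode serving. Only the current model's prediction is returned;
# the shadow model's prediction is logged so the two can be compared offline.
import logging

from fastapi import FastAPI
from pydantic import BaseModel

logger = logging.getLogger("shadow")
app = FastAPI()

class _DummyModel:
    """Stand-in for a real, already-loaded model."""
    def __init__(self, offset: float) -> None:
        self.offset = offset

    def predict(self, values: list[float]) -> float:
        return sum(values) + self.offset

current_model = _DummyModel(0.0)
shadow_model = _DummyModel(1.0)

class Features(BaseModel):
    values: list[float]

@app.post("/predict")
def predict(features: Features) -> dict:
    primary = current_model.predict(features.values)  # served to the caller
    try:
        shadow = shadow_model.predict(features.values)  # never returned
        logger.info("primary=%s shadow=%s", primary, shadow)
    except Exception:
        logger.exception("shadow model failed")  # shadow errors must never affect users
    return {"prediction": primary}
```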
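
And one common way to implement the detect-and-fall-back item (a sketch assuming the model exposes predict_proba-style class probabilities; the threshold and the fallback rule are illustrative and should come from error analysis):

```python
# Minimal sketch: fall back to a default decision when the model's confidence is low.
# The threshold and fallback are illustrative; in practice they come from error analysis.
import numpy as np

CONFIDENCE_THRESHOLD = 0.7

def predict_with_fallback(model, x: np.ndarray, fallback_label: int = 0) -> tuple[int, str]:
    probabilities = model.predict_proba(x.reshape(1, -1))[0]
    confidence = float(probabilities.max())
    if confidence < CONFIDENCE_THRESHOLD:
        # Low confidence: use the fallback (a default label, a heuristic, a human review queue, ...).
        return fallback_label, "fallback"
    return int(probabilities.argmax()), "model"
```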

License

MIT License

