- Find a relatively simple dataset that you can work with and perform some sort of aggregation on, such as movie scores, movie reviews, sporting event stats, web server stats, etc.
- Pick a dataset with many distinct entities (different movies, candies, etc.).
- Pick a dataset with a numeric field on a fixed scale, such as a rating, that you can average or otherwise aggregate.
- Pick a dataset that is neither too small nor too large: between roughly 5,000 and 1,000,000 rows.
- Save the dataset to disk if you can, so you’re not requesting the data from an API or website each time you run your script.
- Write a script to ingest that data from a file and save it to a database (SQLite, PostgreSQL, or MySQL/MariaDB).
- Don’t worry about adding indexes at this point.
- Write a script that queries the database and outputs basic stats about the data (row counts, distinct entities, averages) to confirm that the data was loaded and is queryable.
- Push your code to your personal GitLab repo (call it “onboarding” or something).
- Set up linting and testing and get your build to pass (green). For reference, see https://gitlab.s.fpint.net/collections/bmt/blob/master/.gitlab-ci.yml and https://gitlab.s.fpint.net/collections/bmt/blob/master/prova.unit.yml.
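
The ingest and stats steps above can be sketched in a few lines of Python. This is a minimal illustration, not a required layout: it assumes SQLite as the database and a CSV file with hypothetical `title` and `score` columns, and the table name `ratings` and the function names are placeholders chosen for the example. Adapt the columns to whatever dataset you pick.

```python
import csv
import sqlite3


def ingest_csv(csv_path, db_path):
    """Load a CSV with (assumed) `title` and `score` columns into SQLite.

    Returns the number of rows inserted. The table is dropped and rebuilt
    on each run so the script can be re-run without duplicating rows.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [(r["title"], float(r["score"])) for r in csv.DictReader(f)]

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS ratings")
            # No indexes yet, per the checklist.
            conn.execute(
                "CREATE TABLE ratings (title TEXT NOT NULL, score REAL NOT NULL)"
            )
            conn.executemany(
                "INSERT INTO ratings (title, score) VALUES (?, ?)", rows
            )
    finally:
        conn.close()
    return len(rows)


def basic_stats(db_path):
    """Return simple aggregates that show the data is loaded and queryable."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT COUNT(*), COUNT(DISTINCT title), "
            "MIN(score), MAX(score), AVG(score) FROM ratings"
        ).fetchone()
    finally:
        conn.close()
    keys = ("rows", "distinct_titles", "min_score", "max_score", "avg_score")
    return dict(zip(keys, row))
```

With a file saved locally, calling `ingest_csv("movies.csv", "onboarding.db")` followed by `print(basic_stats("onboarding.db"))` would cover both scripts (the file names here are hypothetical). Keeping ingestion and reporting as separate functions also makes them straightforward to unit test in the CI build.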