Repository of various useful resources for training
Data sources:
- TidyTuesday - https://github.com/rfordatascience/tidytuesday
- Dataset repository from Information is Beautiful - https://informationisbeautiful.net/data/
- Our World in Data - https://ourworldindata.org/ (also has great inspiration for data viz!)
- Data is Plural - https://www.data-is-plural.com/ - more unusual datasets, with interesting articles related to them
- Dataset repos on GitHub, such as https://github.com/awesomedata/awesome-public-datasets
- Humanitarian Data Exchange - https://data.humdata.org/
- London datasets - https://data.london.gov.uk/dataset?page=1 - from TfL data to the London Fire Brigade (warning: the data can be messy, which makes it good practice for learners to clean!)
- Makeover Monday - https://www.makeovermonday.co.uk/data/ - like TidyTuesday, data posted each week
- FiveThirtyEight - https://data.fivethirtyeight.com/
- Office for National Statistics - https://www.ons.gov.uk/ - loads of datasets, such as the census
- Clio Infra dataset repository. Clio Infra has set up a number of interconnected databases containing worldwide data on social, economic, and institutional indicators for the past five centuries, with special attention to the past 200 years. Look at the datasets on height as an example; the academic data can be found here.
Other cool datasets:
- Gender Pay Gap data - https://gender-pay-gap.service.gov.uk/viewing/download
- United Nations High Commissioner for Refugees (UNHCR) Refugee Data Finder - https://populationstatistics.github.io/refugees/. Plus their guidance page - https://dataviz.unhcr.org/general_guidance/. Plus the data dictionary of the population data - https://github.com/rfordatascience/tidytuesday/blob/master/data/2023/2023-08-22/readme.md
- Repository for political datasets - https://github.com/erikgahner/PolData/tree/master - the README links to all the sources, and the CSV and Excel files contain the links needed to extract the datasets
- Sport data such as football data - https://github.com/JaseZiv/worldfootballR - or cricket data - https://github.com/robjhyndman/cricketdata
- Climate data - https://github.com/bczernecki/climate
- Deprivation data - https://data-communities.opendata.arcgis.com/datasets/indices-of-multiple-deprivation-imd-2019-1/explore
- Daily climate data (this is usually very hard to find!) - https://www.ecad.eu/dailydata/index.php
- Draw my data - http://robertgrantstats.co.uk/drawmydata.html - useful for making quick example datasets
- Datasaurus dozen - https://cran.r-project.org/web/packages/datasauRus/vignettes/Datasaurus.html - useful for showing how summary statistics can hide the shape of the data. Cool Tableau dashboard here: https://public.tableau.com/app/profile/datasaurus.rex/viz/DatasaurusDozen_15991883507530/Main
- Eurovision song contest - https://github.com/andrewmoles2/eurovision-contest
- Pokemon game data - https://github.com/andrewmoles2/pokemon-video-games
- Price quote data from ONS with automated scraping scripts (written in R; there may be a Python version too) - https://github.com/andrewmoles2/ONS-Price-Quotes
- European Social Survey (ESS) - has data on how people perceive politics and happiness in their countries.
Games and discussion starters:
- Guess the correlation game - https://www.guessthecorrelation.com/ - a fun way to learn what different correlation coefficients (r) look like when visualised. I like to make it a class-wide discussion game.
- What is going on in this graph - https://www.nytimes.com/column/whats-going-on-in-this-graph - a useful resource for starting discussions about data visualisation, such as whether a graph is still readable or interpretable when some labels or titles are missing
Open-source tools:
- RAWGraphs is a good open-source data visualisation tool, an alternative to BI tools like Tableau or Power BI.
- HackMD is a great collaborative Markdown editor (CodiMD, its community edition, is open source). It has a wide array of really nice features, such as Graphviz diagrams and callouts.
- Insomnia is an open-source API client that I have found helpful when using APIs for web scraping.
- Quarto is an open-source scientific and technical publishing system built around Markdown and the R and Python ecosystems. It is my go-to tool for reporting and building websites.
- LibreOffice is an open-source alternative to MS Office, including equivalents for Word, Excel, PowerPoint, and Access.
Teaching resources:
- Data science in a box - https://datasciencebox.org/ & https://github.com/tidyverse/datascience-box - covers pedagogy on teaching fundamentals of data science (using R)
- Data Garden - learning to code through creative data visualisation using JavaScript (p5.js). You can see what people have made on their project page.
Inspiration:
- Information is Beautiful Awards - https://www.informationisbeautifulawards.com/showcase?award=2023&mc_cid=9d8f228bb5&type=awards - many amazing visualisations made over the years. Some entrants even share their code if they have used R, Python, or JavaScript
- https://www.data-to-viz.com/
Blogs:
- R bloggers - https://www.r-bloggers.com/
- Python bloggers - https://python-bloggers.com/
- https://pycoders.com/
- https://blog.djnavarro.net/
- https://www.cedricscherer.com/
- https://karaman.is/
- https://www.yan-holtz.com/blog.html
- https://albert-rapp.de/blog.html
- https://posit.co/blog/
- https://github.com/tashapiro/TidyTuesday (more example than blog!)
Statistics and machine learning:
- Interactive explainer for various machine learning models - https://mlu-explain.github.io/
- Interactive explainer for various statistics principles - https://seeing-theory.brown.edu/index.html#firstPage
- Interactive explainer for central limit theorem - http://mfviz.com/central-limit/
- Fun interactive R course on statistics basics - https://tinystats.github.io/teacups-giraffes-and-statistics/index.html
- Amazing YouTube channel covering statistics and machine learning - https://www.youtube.com/@statquest - and his website - https://statquest.org/
- Helpful guides on how to run various statistical tests using R - https://www.r-tutor.com/
Books:
- R for Data Science (my data science bible) - https://r4ds.hadley.nz/. The first edition is also great - https://r4ds.had.co.nz/
- ggplot2 book - https://ggplot2-book.org/
- Fundamentals of Data Visualisation - https://clauswilke.com/dataviz/
- Python for data science - https://aeturrell.github.io/python4DS/welcome.html
- Python for data science handbook - https://jakevdp.github.io/PythonDataScienceHandbook/
- An Introduction to Statistical Learning (machine learning bible) - https://www.statlearning.com/ (now in Python too!)
- Python for Data Analysis - https://wesmckinney.com/book/
- Book of books - https://www.bigbookofr.com/index.html - has lots of resources for using and learning R
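
The Datasaurus dozen entry above makes the point that summary statistics can hide the data. Here is a minimal Python sketch of that idea that could be shown in class. The two datasets and the `standardise` helper are made up for illustration; they are not taken from the datasauRus package.

```python
import statistics


def standardise(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Two made-up datasets with very different shapes (illustrative only).
evenly_spread = standardise([1, 2, 3, 4, 5, 6, 7, 8])
two_clusters = standardise([1, 1, 1, 1, 8, 8, 8, 8])

for name, data in [("evenly spread", evenly_spread), ("two clusters", two_clusters)]:
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    # After standardising, both datasets share the same mean and sd ...
    print(f"{name}: mean={mean:.3f}, sd={sd:.3f}")
    # ... but the values themselves are clearly different.
    print("  values:", [round(v, 2) for v in data])
```

Both datasets report the same mean and standard deviation, yet one is evenly spread and the other sits in two clusters, which is why plotting the data first is worth the effort.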