vladtarko / tidy-tuesday

#TidyTuesday to practice data analysis in R

Home Page: https://github.com/rfordatascience/tidytuesday

#TidyTuesdays

Joshua Cook 4/7/2020

Tidy Tuesday

#TidyTuesday is a weekly tradition in the R community: every Tuesday, we practice our data analysis skills on a new “toy” data set.

Log

May 12, 2020 - Volcano Eruptions

data | analysis

I took this as a chance to play around with the suite of packages from ‘easystats’. Towards the end, I also experimented a bit more with mixed-effects modeling to better understand how to interpret these models.

May 5, 2020 - Animal Crossing - New Horizons

data | analysis

I used sentiment analysis results on user reviews to model their review grade using a multivariate Bayesian model fit with the quadratic approximation. The model was pretty awful, but I got some good practice with this statistical technique that I am still learning.

April 28, 2020 - Broadway Weekly Grosses

data | analysis

This data set was not very interesting to me because the numerical values were basically all derived from a single value, making it very difficult to avoid highly correlated covariates when modeling. Still, I got some practice at creating and interpreting mixed-effects models.

April 21, 2020 - GDPR Violations

data | analysis

I used the ‘tidytext’ and ‘topicmodels’ packages to group the GDPR fines based on summaries of the violations.

April 14, 2020 - Best Rap Artists

data | analysis

I built a graph of the songs, artists, and critics using the critics' rap song rankings.

April 7, 2020 - Tour de France

data | analysis

There was quite a lot of data and it took me too long to sort through it all. Next time, I will focus more on asking a single simple question rather than trying to understand every aspect of the data.

March 31, 2020 - Beer Production

data | analysis

I analyzed the number of breweries in various size categories and found a shift of very small microbreweries to higher-capacity categories in 2018 and 2019.

March 24, 2020 - Traumatic Brain Injury

data | analysis

The data was a bit more limiting because we only had summary statistics for categorical variables, but I was able to use PCA to identify some interesting properties of the TBI sustained by the different age groups.

March 10, 2020 - College Tuition, Diversity, and Pay

data | analysis

I tried to do some classic linear modeling and mixed-effects modeling, but the data didn’t really require it. Still, I got some practice with these methods and read plenty about them online during the process.

March 3, 2020 - Hockey Goals

data | analysis

I got some practice building regression models for count data by fitting Poisson, Negative Binomial, and Zero-Inflated Poisson regression models to estimate the effect of various game parameters on the goals scored by Alex Ovechkin.

January 21, 2020 - Spotify Songs

data | analysis

I used a random forest model to predict the genre of a playlist from the musical features of its songs. It was also a chance to play around with the ‘tidymodels’ framework.

October 15, 2019

data | analysis

I chose this old TidyTuesday data set because I wanted to build a simple linear model using Bayesian methods. I didn’t do too much (and probably got a few things wrong), but this was a useful exercise for getting hands-on practice with the modeling.
