Materials created by Judith Degen, drawing on materials by Florian Jaeger, Maureen Gillespie, Peter Graff, Dave Kleinschmidt, Roger Levy, and Victor Kuperman
Judith Degen -- jdegen@stanford.edu
The workshop is geared towards linguists interested in analyzing data from various types of experiments (truth-value judgments, Likert scale judgments, response times, reading times, etc.). We'll be focusing on regression methods, in particular mixed effects models, which have proven to be a powerful tool for analyzing linguistic data (see also Harald Baayen's book "Analyzing Linguistic Data", linked below). To make the most of the short time available, the workshop includes a large hands-on component in which participants can analyze existing datasets and bring their own.
We will be using R in this course. To get the most out of it, please bring your laptop and come with R and RStudio installed.
If you have never used R before, I recommend working through chapters 1, 2, 4, and 5 of the Introduction to R on https://www.datacamp.com/home -- it sounds like a lot, but each "chapter" is actually just a few short exercises, and it'll get you used to writing basic R code.
Apart from the very first session (and food sessions...), the workshop will consist of lectures interwoven with practical exercises, so everyone can get their hands dirty with data right after each new concept is introduced. On the first day we'll focus on the general concept of regression and its simplest instantiation, (mixed effects) linear regression for continuous data (e.g., response times, reading times, slider ratings). On the second day we'll turn to logistic regression for binary data (e.g., truth-value judgment data or any other binary choice) and ordinal regression for ordinal data (e.g., Likert scale ratings). I also want to spend a significant amount of time on data visualization with ggplot.
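To give a rough feel for what these model types look like in R, here is a minimal sketch on simulated data. It is not taken from the workshop code sheets: the data frame `d` and its columns (`subject`, `condition`, `rt`, `response`) are made up for illustration, and the mixed effects part assumes the lme4 package is installed (it is skipped otherwise).

```r
# Simulated data: 20 hypothetical subjects, 10 trials each.
# All names here are invented for illustration.
set.seed(1)
n_subj <- 20
n_obs <- 10
d <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_obs)),
  condition = rep(c(-0.5, 0.5), times = n_subj * n_obs / 2)  # centered two-level predictor
)
subj_shift <- rnorm(n_subj)                # one random shift per subject
s <- subj_shift[as.integer(d$subject)]     # expand to one value per row

# Continuous outcome (e.g., a response time in ms)
d$rt <- 500 + 40 * d$condition + 50 * s + rnorm(nrow(d), sd = 30)
# Binary outcome (e.g., a truth-value judgment coded 0/1)
d$response <- rbinom(nrow(d), size = 1, prob = plogis(0.2 + d$condition + s))

# Linear regression for continuous data
m_lin <- lm(rt ~ condition, data = d)
# Logistic regression for binary data
m_log <- glm(response ~ condition, data = d, family = binomial)

print(coef(summary(m_lin)))
print(coef(summary(m_log)))

# Mixed effects versions with by-subject random intercepts (requires lme4)
if (requireNamespace("lme4", quietly = TRUE)) {
  m_lmer <- lme4::lmer(rt ~ condition + (1 | subject), data = d)
  m_glmer <- lme4::glmer(response ~ condition + (1 | subject),
                         data = d, family = binomial)
  print(lme4::fixef(m_lmer))
  print(lme4::fixef(m_glmer))
}
```

The only difference between the plain and mixed effects calls is the `(1 | subject)` term, which lets each subject have their own intercept. Ordinal models are not shown here; the ordinal package (`clm()` and `clmm()`) covers that case.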
I'll be adding code sheets here for participants to follow along with as I finalize them.
When | What | Where | Slides / Readings / Resources |
---|---|---|---|
Fri 10 - 10:30 | Workshop overview | Room 108 | slides |
Fri 11 - 12 | R basics and linear regression | Room 108 | slides / code / solutions |
Fri 1 - 3 | Mixed effects linear regression | Language Lab next door to Linguistics Department | slides / code |
Fri 3 - 5:30 | Individual meetings / bring your own dataset! | Language Lab or Department Basement | |
Sat 10 - 11 | Data wrangling in R | Room 108 | code |
Sat 11 - 12 | Mixed effects logistic regression | Room 108 | slides / code |
Sat 1 - 2 | Common issues in MEMs & solutions | Room 108 | slides / code |
Sat 2 - 3 | Visualizing your data: mastering ggplot | Room 108 | slides / code |
- The Bible of mixed effects models: Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.
- Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge University Press.
- Florian Jaeger's excellent collection of resources for regression methods (code sheets, slides, pointers to further resources) on his HLPLab wiki. This includes Maureen Gillespie's tutorial on how to code your predictors to test different kinds of hypotheses.
- Shravan Vasishth's excellent statistics lecture notes on his statistics GitHub site.
- Andrew Ng's excellent Coursera course on Machine Learning for great video explanations of linear and logistic regression. You can also just watch the YouTube videos directly, e.g. this one, which explains the very basics of linear regression.
- In class we didn't get to ordinal or multinomial regression. Here is Rune Haubo Christensen's tutorial on ordinal regression (for ordinal data like Likert scale ratings). Here is a tutorial on multinomial regression (for unordered categorical data with more than 2 levels, like the choice between referring to a referent by a name, a pronoun, or a definite description).
- Subscribe to the ling-R-lang list -- language researchers with R(egression) problems and solutions.
- For help in R, try `?foo`, where `foo` stands for the name of a function.
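As a small illustration of the help lookup just mentioned, the sketch below uses base R functions only; `lm` and `glm` are simply example targets, and any function name works in their place.

```r
# Open the help page for a function (here: lm)
?lm

# Equivalent long form, handy when the name comes from a string
help("glm")

# Search installed help pages by keyword (the long form of ??regression)
help.search("regression")

# Quick look at a function's argument names without opening the help page
fn_args <- names(formals(lm))
print(fn_args)
```

If you do not know the exact function name, the keyword search is usually the better starting point.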
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255-278.
Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. arXiv preprint arXiv:1506.04967.
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434-446.