a-nap / DRTfL2024

Digital Research Toolkit for Linguists 2024

Digital Research Toolkit for Linguists

Author: anna.pryslopska[ AT ]ling.uni-stuttgart.de

These are the original materials from the course "Digital Research Toolkit for Linguists", which I taught in the Summer Semester 2024 at the University of Stuttgart. The materials will be updated weekly. Identifying information of the in-class participants will be removed, so some slides, data, or exercises may be missing.

You are more than welcome to follow along, but I will not be able to grade or evaluate your homework.

If you want to replicate this course, you can do so with proper attribution. To replicate the data, follow these links for Experiment 1 (full Moses illusion experiment) and Experiment 2 (demo of self-paced reading with acceptability judgment).

Course description

This seminar provides a gentle, hands-on introduction to the essential tools for quantitative research for students of linguistics and the humanities in general. Over the course of the seminar, students will familiarize themselves with software that is rarely taught but is invaluable in developing an efficient, transparent, reusable, and scalable research workflow (e.g. R basics, LaTeX, Git). From text files, through data visualization, to creating beautiful reports: this course will empower students to improve their skills and help them establish good practices.

The seminar is targeted at students with little to no experience with programming. It provides key skills that are useful for research and industry jobs.

We will cover topics such as:
✔ How can I make sense of my data?
✔ How can I communicate what I found?
✔ How do I share my data and collaborate with others?

We will NOT cover topics such as:
❌ Experiment design
❌ Inferential statistics
❌ Cognitive modelling
❌ Corpus research

Schedule and syllabus

This is a rough overview of the topics discussed every week. These are subject to change, depending on how the class goes.

| Week | Topic | Description | Assignments | Materials |
|---|---|---|---|---|
| 1 | Introduction & overview | Course overview and expectations, classroom management and assignments/grading etc. Data collection. | Complete Experiment 1 and Experiment 2 and recruit one more person. Install R and RStudio, install Texmaker or make an Overleaf account. | Slides |
| 2 | Data, R and RStudio | Intro recap, directories, R and RStudio, installing and loading packages, working with scripts | Read chapters 2, 6 and 7 of R for Data Science, complete assignment 1 | Slides, code |
| 3 | Reading data, data inspection and manipulation | Looking at your data, data types, importing, making sense of the data, intro to sorting, filtering, subsetting, removing missing data, data manipulation | Read chapters 3, 4 and 5 of R for Data Science, complete assignment 2 | Slides, code, data |
| 4 | Data manipulation | Basic operators, data manipulation (filtering, sorting, subsetting, arranging), pipelines, tidy code, practice | Complete assignment 3 | Slides, code, data |
| 5 | Data manipulation and error handling | Summary statistics, grouping, merging, if ... else, naming variables, tidy code, error handling and getting help | Complete assignment 4, read the slides from the QCBS R Workshop Series Workshop 3: Introduction to data visualisation with ggplot2 | Slides, code |
| 6 | Data visualization | Communicating with graphics, choice of visualization, plot types, best practices, visualizing in R (ggplot2, esquisse), exporting plots and data | Complete assignment 5. If you haven't yet, install the package esquisse | Slides, code |
| 7 | No class | Holiday | | |
| 8 | Data visualization | Data visualization recap, best practices, lying with plots, practical exercises, exporting/saving plots and data | Complete assignment 6. Install Quarto. Watch the introductory video | Slides large and compressed, code |
| 9 | Creating reports with Quarto and knitr | Pandoc, markdown, Quarto, basic syntax and elements, export, document, and chunk options, documentation | Complete assignment 7 | Slides, compressed Quarto files |
| 10 | Typesetting documents with LaTeX | What is LaTeX, basic document and file structure, advantages and disadvantages, from R to LaTeX | Complete assignment 8, read chapter 2 of The Not So Short Introduction to LaTeX | Slides, basic LaTeX file (zip) |
| 11 | Typesetting documents with LaTeX | Editing text (commands, whitespace, environments, font properties, figures, and tables), glosses, IPA symbols, semantic formulae, syntactic trees | Complete assignment 9, read Bibliography management with biblatex | Slides |
| 12 | Typesetting documents with LaTeX and bibliography management | Large projects, citations, references, bibliography styles, bib file structure | Complete assignment 10 | Slides, big project files |
| 13 | Literature and reference management, common command line commands | Reference managers, looking up literature, command line commands (grep, diff, ping, cd, etc.) | Complete assignment 11 | Slides, corpus1.txt, corpus2.txt, corpus3.txt, big project 1, big project 2 |
| 14 | Text editors, version control and Git | Text editors, Git, GitHub, version control | Complete assignment 12 | Slides, example readme file |
| 15 | Version control and Git | Git, GitHub, SSH, reverting to older versions | In-class assignment | Slides, SSH for GitHub video |

Recommended reading

Git

LaTeX

Quarto

R

  • QCBS R Workshop Series https://r.qcbs.ca/
  • Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund (2023). R for data science: import, tidy, transform, visualize, and model data. 2nd ed. O’Reilly Media, Inc. URL: https://r4ds.hadley.nz/.

Experiments

  • Free-response: Erickson, Thomas D and Mark E Mattson (1981). “From words to meaning: A semantic illusion”. In: Journal of Verbal Learning and Verbal Behavior 20.5, pp. 540–551. DOI: 10.1016/s0022-5371(81)90165-1.
  • Self-paced reading with acceptability judgments: Gibson, Edward, Leon Bergen, and Steven T Piantadosi (2013). “Rational integration of noisy evidence and prior semantic expectations in sentence interpretation”. In: Proceedings of the National Academy of Sciences 110.20, pp. 8051–8056. DOI: 10.1073/pnas.1216438110.
