A tiny appetizer for using Git/GitHub in social science research
What is Git?
- Git is a revision control system
- allows you to track changes to any text-based files (e.g., R code, Stata code, LaTeX documents, raw text files, CSV files, etc.)
- being a distributed system, it encourages collaboration on software projects
- typically used for software development and other version control tasks
- created by Linus Torvalds in 2005 (yes, THE Linus Torvalds!)
- typically used from the command line
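To make "tracking changes from the command line" concrete, here is a minimal sketch of a first Git session. The file name `analysis.R`, the name and e-mail address, and the commit messages are placeholders, not part of any real project; the sketch works in a temporary directory so it does not touch existing files.

```shell
# Work in a throwaway directory so nothing existing is touched
workdir="$(mktemp -d)"
cd "$workdir"

# Turn the directory into a Git repository
git init --quiet

# Identity for this repository only (placeholder values)
git config user.name "Example Researcher"
git config user.email "researcher@example.org"

# Create a small analysis script and record it as the first revision
echo 'summary(cars)' > analysis.R
git add analysis.R
git commit --quiet -m "Add first version of the analysis script"

# Change the file and record a second revision
echo 'plot(cars)' >> analysis.R
git commit --quiet -am "Add plot to the analysis script"

# List the recorded revisions, newest first
git --no-pager log --oneline
```

Each commit stores a snapshot of the tracked files together with a message, so the log doubles as a short history of the project.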
What is GitHub?
- web service "on top of Git" that allows you to host Git repositories
- offers the functionality of Git as well as additional features
- as of April 2016, it had more than 14 million users and more than 35 million repositories, making it the largest host of source code in the world
- browser functionality with social-networking-like features
- desktop client (for Windows and macOS), which makes using Git much easier
- if you host your repositories publicly, it's free
- if you want to host (parts of) them privately, you have to pay for it (unless you're a student); see https://github.com/pricing
- you can simply use the service as an online repository for your code
- you can make your own copy of other code projects ("fork" them), e.g., packages, to monitor them but also to suggest changes
- you can file issues with other projects ("Function x of your package does not work for me, is there a bug?", "I have a suggestion for how to improve your package")
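The idea of using a hosted repository as an online home for your code can be sketched without a GitHub account. In the example below, a local "bare" repository named `hosted.git` stands in for a repository on GitHub; that name, the `project` and `copy` directories, and the identity values are assumptions made for illustration. With a real GitHub repository, only the address passed to `git remote add` would differ.

```shell
# Work in a throwaway directory
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for a repository hosted on GitHub (a bare repository has no working files)
git init --quiet --bare hosted.git

# A local project with one commit
git init --quiet project
cd project
git config user.name "Example Researcher"
git config user.email "researcher@example.org"
echo 'Replication materials' > README.md
git add README.md
git commit --quiet -m "Add README"

# Register the hosted copy under the conventional name "origin"
# and publish the current branch to it
git remote add origin "$workdir/hosted.git"
git push --quiet origin HEAD

# A collaborator or reader obtains a full copy, including the history
cd "$workdir"
git clone --quiet hosted.git copy
git -C copy --no-pager log --oneline
```

The clone contains the complete history, not just the latest files, which is what makes the setup distributed.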
Why should I use Git/GitHub for my research?
- easy way to publish code/appendices/additional results, independent of a publisher
- present work in progress before the article is published
- make your research reproducible; this ideally means that you publish the entire project, including original data and all the code (both cleaning and analysis code)
- do better version control: no more files named analysis_survey_final_revised.R, and no need to keep old files in a designated folder
- retain full control over the availability of your data
- host your own website
- benefit indirectly by using the latest versions of statistical software: much of the interesting work in the R community is published on GitHub before the packages are finally released on CRAN
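The version-control point above can be illustrated with a short sketch: instead of saving renamed copies of a script, earlier versions are retrieved from the history. The file name `analysis_survey.R`, its contents, and the commit messages are made up for the example.

```shell
# Work in a throwaway directory
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet
git config user.name "Example Researcher"
git config user.email "researcher@example.org"

# First version of the script
echo 'model <- lm(y ~ x, data = survey)' > analysis_survey.R
git add analysis_survey.R
git commit --quiet -m "First draft of the survey analysis"

# Revised version, saved under the same file name
echo 'model <- lm(y ~ x + z, data = survey)' > analysis_survey.R
git commit --quiet -am "Revise model specification"

# Show what changed between the two revisions
git --no-pager diff HEAD~1 HEAD -- analysis_survey.R

# Print the file as it was one revision ago, without changing the working copy
git show HEAD~1:analysis_survey.R
```

The working directory only ever holds one `analysis_survey.R`; the older version stays available through the history.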
Some examples
- https://github.com/simonmunzert/hitler-speeches (materials for a current working paper)
- https://github.com/simonmunzert/rscraping-jsm-2016/ (materials for a course I once gave - check out "Graphs" and "Network")
- https://github.com/hadley/dplyr (slightly more activity here)
- https://github.com/zuphilip
Useful Resources to get started with Git/GitHub
- How I Use Git and GitHub for Political Science Research (Carlisle Rainey)
- Git/GitHub, Transparency, and Legitimacy in Quantitative Research (Zachary M. Jones; a preprint version appeared in The Political Methodologist)
- The Pro Git Book, a comprehensive Git guide (Scott Chacon and Ben Straub)
- Try Git (free hands-on course about Git)