This is the Git repository for Computational Thinking for Social Scientists. This book aims to help social scientists think computationally and develop proficiency with the computational tools and techniques needed to conduct research in computational social science. Mastering these tools and techniques not only enables social scientists to collect, wrangle, analyze, and interpret data with less pain and more fun, but also lets them work on research projects that previously would have seemed impossible.
The book is currently divided into two main parts (fundamentals and applications) and six main sessions:
- Best practices in data and code management using Git and Bash
- How to use functions to automate repetitive tasks and develop data tools (e.g., packages and apps)
- How to collect and parse semi-structured data at scale (e.g., using APIs and web scraping)
- How to analyze high-dimensional data (e.g., text) using machine learning
- How to access, query, and manage big data using SQL and Spark
If you find typos, errors, missing citations, etc., please feel free to open an issue in the GitHub repository associated with this book.
Content developer: Jae Yeon Kim: jaeyeonkim@berkeley.edu
This book is as much compiled as it is authored. It is a remix of PS239T, a graduate-level computational methods course at UC Berkeley, originally developed by Rochelle Terman and later revised by Rachel Bernhard. I taught PS239T as a TA in Spring 2018 and as lead instructor in Spring 2019, and I will co-teach it in Spring 2020. Other teaching materials draw on the workshops I have created for D-Lab and the Data Science Discovery Program at UC Berkeley. I have also cited related books, articles, slides, blog posts, and YouTube video clips wherever I am aware of them.
This work is licensed under a Creative Commons Attribution 4.0 International License.