stephbuon / faha-2023

Code for "Foundations and Applications of Humanities Analytics" (2023) at the Santa Fe Institute

faha-2023

The FAHA institute provides online and in-person education aimed at a broad range of humanities scholars. Participants will gain a theoretical and practical understanding of text analysis methods, and will learn how to extract content and derive meaning from digital sources, enabling new humanities scholarship.

This repository contains code and links to instructional videos for the summer workshop.

Instructional Videos

Code

| Notebook | Colab | Description |
| --- | --- | --- |
| quick_start.ipynb | View | An introduction to Google Colab. This notebook also demonstrates how to import our data. |
| counting_words.ipynb | View | This notebook covers some basics of processing text with Python. It invites readers to count words and visualize the results while thinking critically about how text-processing choices can affect analysis. |
| this_is_not_a_string.ipynb | View | This notebook provides a quick walkthrough of data structures, data types, and common errors. Its purpose is to cultivate an awareness of how a computer processes digital data compared with how we might perceive that same data. |
| word_emebeddings.ipynb | View | Word embeddings can provide insight into different dimensions of a corpus. Here we use word embeddings to view, at scale, which words are most associated with one another and how these associations changed over time. (See "Text Mining as Historical Method" for the original version.) |
| topic_modeling.ipynb | View | Code for modeling topics. |
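
To illustrate the point the counting_words.ipynb description makes about processing choices, here is a minimal sketch. It is not taken from the notebook. The function name `count_words`, its parameters, and the sample sentence are all made up for illustration. It counts the same text twice, once as-is and once after lowercasing and stripping punctuation.

```python
import re
from collections import Counter


def count_words(text, lowercase=True, strip_punctuation=True):
    """Count whitespace-separated tokens, with optional normalization.

    Hypothetical helper for illustration; not part of the workshop code.
    """
    if lowercase:
        text = text.lower()
    if strip_punctuation:
        # Replace anything that is not a word character or whitespace with a space.
        text = re.sub(r"[^\w\s]", " ", text)
    return Counter(text.split())


# Invented sample text, not drawn from any of the workshop datasets.
sample = "The House met. The house adjourned; the HOUSE will meet again."

raw = count_words(sample, lowercase=False, strip_punctuation=False)
normalized = count_words(sample)

# Without normalization, "House", "house", and "HOUSE" are separate tokens,
# and "met." keeps its trailing period.
print(raw.most_common(3))

# With normalization, the three spellings collapse into a single "house".
print(normalized.most_common(3))
```

In the raw count, "house" appears once. After normalization it appears three times, so the same text yields a different ranking of frequent words depending on how it was processed.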

Get a Copy

  • Click the green "Code" button (top right corner) and select "Download ZIP"

Or

  • Clone the repository via terminal: git clone https://github.com/stephbuon/faha.git

Data

Hansard:

Congress:

Reddit:

Buffy Fanfic

CauseNet

Loudoun County School Board Minutes

Gutenberg Poetry

Additional Resources

GitHub Repositories

Data

Data Citations:

Buongiorno, Steph, Robert Kalescky, Omar Alexander Cerpa, and Jo Guldi. "The Hansard 19th-Century British Parliamentary Debates with Improved Speaker Names: Parsed Debates, N-Gram Counts, Special Vocabulary, Collocates, and Topics", https://doi.org/10.7910/DVN/ZCYJH8, Harvard Dataverse, V1, 2022, UNF:6:wFlN6+URD9Q9BWYxgZgu1A== [fileUNF]

License: MIT License

