impresso / PH-passim-tutorial

Code and data accompanying the Programming Historian tutorial on text reuse with Passim by Romanello & Hengchen.

This repository contains the sample data for the Programming Historian's lesson Detecting Text Reuse with Passim, written by Matteo Romanello and Simon Hengchen (currently in preparation).

Data come from two different sources (see respective READMEs for license statements and further details):

  1. books from EEBO (Early English Books Online) → more info
  2. newspaper articles from impresso → more info

The Jupyter notebook explore-passim-output.ipynb contains an example of how to load passim's JSON output into a pandas DataFrame to compute some statistics.
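
As a rough illustration of that loading step (a minimal sketch, not the notebook's actual code), the snippet below assumes that passim wrote its output as a directory of JSON-lines files ending in .json, and that each record carries a "cluster" field identifying the text reuse cluster it belongs to. The function names load_passim_output and cluster_sizes, and the example path in the usage note, are hypothetical.

```python
import json
from pathlib import Path

import pandas as pd


def load_passim_output(output_dir):
    """Read every JSON-lines file in output_dir into a single DataFrame.

    Assumption: each non-empty line of each *.json file is one JSON record.
    """
    records = []
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    records.append(json.loads(line))
    return pd.DataFrame(records)


def cluster_sizes(df, cluster_col="cluster"):
    """Count the passages in each cluster, largest cluster first.

    Assumption: cluster_col names the column holding the cluster identifier.
    """
    return df.groupby(cluster_col).size().sort_values(ascending=False)
```

Calling load_passim_output("path/to/passim/output") (a placeholder path) and passing the result to cluster_sizes gives a per-cluster passage count, which is one simple statistic to start from.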

To run the notebook and the script eebo/code/main.py, first install the required dependencies into a new virtual environment (created with conda, pyenv, venv, etc.):

pip install -r requirements.txt
