sdam-au / PIA

This repository contains scripts associated with the PIA project.

PIA - article supplementary


Purpose

This repository serves as supplementary material for the article "Pain and the Body in Corpus Hippocraticum: A Distributional Semantic Analysis", currently under review (May 2021). It contains scripts, data and figures. The scripts are written in Python 3 and take the form of Jupyter notebooks. All our analyses aim to be fully reproducible, and we invite other scholars to reuse our code and data in their own analyses.


Authors

  • Vojtěch Kaše
  • Vojtěch Linka

License

CC-BY-SA 4.0, see attached License.md


How to use this repository

  • download or clone the repository
  • activate the virtual environment (open your command line, move to the repository folder and run bash ./create_pia_venv.sh)
  • in the Jupyter notebooks, always check that you are connected to the pia_venv kernel
  • (alternatively, if you do not wish to use the virtual environment, make sure that you have installed all required Python packages listed in the requirements.txt file: pip install -r requirements.txt)

Jupyter notebooks

The scripts are in the scripts subfolder and their numbers and titles should be self-explanatory:

  • 1_EXTRACTING-CORPORA.ipynb extracts relevant texts from a large corpus of ancient Greek texts (LAGT).
  • 2_EXPLORATIONS+REPLACEMENTS.ipynb makes some preliminary observations and uses regular expressions to extract all pain words, unified under four word roots: πόνο*, ὀδύν*, ἄλγ*, λύπ*.
  • 3_OVERVIEW+WORK-DISTANCES.ipynb (1) offers a detailed overview of the frequencies of the pain words across individual works and work categories; (2) generates plots visualizing distances between works based on their shared vocabulary.
  • 4_PAIN-SENTENCES.ipynb analyzes sentences containing the pain words, comparing them against the rest of the corpus.
  • 5_VECTORS.ipynb introduces a vector (or: distributional) semantic model. It constructs a weighted word-word co-occurrence matrix using the PPMI3 metric, transforms it by SVD and compares row vectors corresponding to individual words by means of cosine similarity. Finally, it plots the embeddings projected into a 2-dimensional space by t-SNE.
  • 5_VECTORS_without-de-diaeta.ipynb is the same script as the previous one, with one difference: the work De diaeta is excluded from the analysis.
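
For readers unfamiliar with the pipeline described for 5_VECTORS.ipynb, the following is a minimal sketch of its general shape, not the code from the notebook. It assumes NumPy, uses a made-up toy corpus of English tokens, counts co-occurrences within a sentence, and applies plain PPMI as a stand-in for the PPMI3 weighting (whose exact definition is given in the notebook, not here). The names sentences, ppmi, cosine and k are illustrative only. The t-SNE projection step is left out.

```python
import numpy as np

# Toy, made-up tokenized sentences (hypothetical; not taken from the PIA data).
sentences = [
    ["pain", "head", "fever"],
    ["pain", "belly", "fever"],
    ["head", "fever", "night"],
    ["wine", "bread", "water"],
    ["wine", "water", "night"],
]

vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}

# Symmetric word-word co-occurrence counts, with the sentence as the window.
counts = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    for a in s:
        for b in s:
            if a != b:
                counts[index[a], index[b]] += 1


def ppmi(counts):
    """Plain positive pointwise mutual information (assumed stand-in for PPMI3)."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total  # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)   # keep only positive associations


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


weighted = ppmi(counts)

# Truncated SVD: keep the k strongest dimensions as dense word vectors.
k = 2
U, S, Vt = np.linalg.svd(weighted)
vectors = U[:, :k] * S[:k]

# Compare two words by the cosine similarity of their row vectors.
similarity = cosine(vectors[index["pain"]], vectors[index["fever"]])
print(f"cosine(pain, fever) = {similarity:.3f}")
```

With a real corpus, the same three steps (weighting, dimensionality reduction, cosine comparison) apply unchanged; only the co-occurrence window, the weighting variant and the number of retained dimensions need to be chosen to match the notebook.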
