CDS Language Analytics

Exam portfolio

Table of Contents

About the project
Getting started
Repository structure
Assignments
Data
Contact
Acknowledgements

About the project

Example image from one of the assignments

See `report.pdf` for an overview of the entire portfolio.

This project contains the exam portfolio for the Spring 2021 module Language Analytics, part of the bachelor's elective (tilvalg) in Cultural Data Science at Aarhus University. This README provides an overview of the repository as well as the installation steps required for running the scripts in the assignments.

Getting started

To run the scripts, I recommend following the steps below in a bash terminal. They set up the virtual environment and run a bash script that downloads the data into the data directories of the respective assignments.
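Before starting, it may help to confirm that the tools the steps rely on are available. The sketch below is not part of the repository; `missing_tools` is a hypothetical helper, and while `git` and `bash` are used by the commands in this README, the inclusion of `python3` is an assumption about what the virtual environment scripts need.

```shell
# Print the name of every tool passed as an argument that is NOT on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# git and bash appear in the steps of this README; python3 is an assumption.
missing="$(missing_tools git bash python3)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

If anything is reported as missing, install it before cloning the repository.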

Cloning repository and creating virtual environment

The code below clones the repository and creates a virtual environment.

MAC/LINUX/WORKER02:

git clone https://github.com/emiltj/cds-language-exam.git
cd cds-language-exam
bash ./create_lang_venv.sh

WINDOWS:

git clone https://github.com/emiltj/cds-language-exam.git
cd cds-language-exam
bash ./create_lang_venv_win.sh

Retrieving the data

The data is not contained within this repository because of its size. Running the provided bash script data_download.sh downloads the data from a Google Drive folder and automatically places it within the respective assignment directories.

bash data_download.sh
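To confirm that the download placed files where expected, a check along the following lines could be run from the repository root. This is a minimal sketch rather than part of the repository: `check_data_dirs` is a hypothetical helper, and it assumes that each assignment keeps its data in a subdirectory named `data`, which this README does not state explicitly.

```shell
# Report each assignment directory whose data folder is missing or empty.
# Assumption: data lives in a "data" subdirectory of each assignment directory.
check_data_dirs() {
    local root="${1:-.}" dir status=0
    for dir in "$root"/assignment_*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        if [ -z "$(ls -A "${dir}data" 2>/dev/null)" ]; then
            echo "no data found in ${dir}data"
            status=1
        fi
    done
    return "$status"
}

check_data_dirs . || echo "Some data directories are empty; try re-running data_download.sh."
```

A reported directory is only a problem if that assignment actually uses a data folder with this name, so treat the output as a hint rather than a verdict.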

After cloning the repo, creating the virtual environment, and retrieving the data, you should be ready to go. Move to the assignment folders and read their READMEs for further instructions.

Repository structure

This repository has the following structure:

| File or directory | Description |
|---|---|
| `assignment_*/` | Directories containing the five assignments. |
| `utils/` | Utility functions written by our instructor Ross Deans Kristensen-McLachlan, used in several of the assignments. |
| `README_images/` | Directory containing the few images used in the READMEs. |
| `report.pdf` | Document that provides a full overview of the exam project. It collates the information from all READMEs. |
| `data_download.sh` | Bash script that downloads all the necessary data. |
| `create_lang_venv*.sh` | Bash scripts that automatically generate a new virtual environment and install all the packages listed in `requirements.txt`. |
| `kill_lang_venv.sh` | Bash script that uninstalls and deletes the virtual environment. |
| `requirements.txt` | A list of the required packages. |
| `.gitignore` | A list of the files that git should ignore when pushing and pulling (virtual environment and data). |
| `README.md` | This very README file. |

Assignments

Five assignments have been chosen for this portfolio and are included in the assignment directories. Information on script execution, preprocessing steps, results, and discussion can be found in the README located within each assignment directory.

The five assignments are:

Assignment 3 - Sentiment analysis
Assignment 4 - Network analysis
Assignment 5 - (Un)supervised machine learning - LDA and Topic modeling on philosophical texts
Assignment 6 - Text classification using Deep Learning
Assignment 7 - LSTM models for text generation (self-assigned)
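Since each assignment carries its own README, a quick way to locate them from the repository root might look like the sketch below. `list_assignment_readmes` is a hypothetical helper, and the file name `README.md` at the top of each assignment directory is an assumption.

```shell
# Print the path of the README in each assignment directory, if present.
# Assumption: each README is named README.md and sits at the top of its directory.
list_assignment_readmes() {
    local root="${1:-.}" readme
    for readme in "$root"/assignment_*/README.md; do
        [ -f "$readme" ] && echo "$readme"
    done
    return 0
}

list_assignment_readmes .
```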

Data

The datasets are provided courtesy of:

Rohit Kulkarni - Million headlines dataset, used for assignment 3
Kourosh Alizadeh - History of Philosophy dataset, used for assignment 5
Alben Tumanggor - Game of Thrones script dataset, used for assignment 6
Thorben Schomacker - Grimms fairytales dataset, used for assignment 7

Contact

Feel free to write to me, Emil Jessen, with any questions (including questions about the reviews). You can do so on Slack or on Facebook.

Acknowledgements

Ross Deans Kristensen-McLachlan and Kristoffer Laigaard Nielbo - our competent instructors for the module on Language Analytics
othneildrew (GitHub user) - for providing the template that I used to create the READMEs
