Exam portfolio
See here for an overview of the entire portfolio.
This project contains the exam portfolio for the Spring 2021 module Language Analytics, part of the bachelor's elective (tilvalg) in Cultural Data Science at Aarhus University. This README gives an overview of the repository and the installation steps required for running the scripts in the assignments.
To run the scripts, I recommend following the steps below in a bash terminal. They set up the virtual environment and run a bash script that downloads the data into the data directories of the respective assignments.
The commands below clone the repository and create a virtual environment.
MAC/LINUX/WORKER02:
git clone https://github.com/emiltj/cds-language-exam.git
cd cds-language-exam
bash ./create_lang_venv.sh
WINDOWS:
git clone https://github.com/emiltj/cds-language-exam.git
cd cds-language-exam
bash ./create_lang_venv_win.sh
The data is too large to be stored in this repository. The provided bash script data_download.sh downloads it from a Google Drive folder and places it in the respective assignment directories.
bash data_download.sh
After cloning the repo, creating the virtual environment and retrieving the data you should be ready to go. Move to the assignment folders and read the READMEs for further instructions.
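As a quick sanity check after the steps above, the following is a minimal sketch that reports, for each assignment directory, whether a README and downloaded data are present. The function name `check_assignments` is hypothetical, and the assumption that each assignment keeps its data in a subdirectory called `data` is not confirmed by this README; adjust it to the actual layout.

```shell
#!/usr/bin/env bash
# Sketch only: check each assignment_* directory for a README.md and for a
# non-empty data directory. The "data" subdirectory name is an assumption.

check_assignments() {
    local root="${1:-.}"
    local dir name readme data
    for dir in "$root"/assignment_*/; do
        # Skip the unexpanded pattern when no directory matches
        if [ ! -d "$dir" ]; then
            continue
        fi
        name="$(basename "$dir")"
        readme="missing"
        data="missing"
        if [ -f "${dir}README.md" ]; then
            readme="ok"
        fi
        # The data directory counts as present only if it contains something
        if [ -d "${dir}data" ] && [ -n "$(ls -A "${dir}data")" ]; then
            data="ok"
        fi
        echo "$name: README=$readme data=$data"
    done
}

# Check the directory given as the first argument, or the current one
check_assignments "${1:-.}"
```

Run from the repository root, this prints one line per assignment directory. A `data=missing` entry suggests that data_download.sh has not been run yet or that the data lives under a different directory name.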
This repository has the following structure:
File/Directory | Description |
---|---|
assignment_*/ | Directories containing the five assignments. |
utils/ | Utility functions written by our instructor Ross Deans Kristensen-McLachlan, used in several of the assignments. |
README_images/ | Directory containing the few images used in the READMEs. |
report.pdf | Document that provides a full overview of the exam project, collating the information from all READMEs. |
data_download.sh | Bash script that downloads all the necessary data. |
create_lang_venv*.sh | Bash scripts that automatically generate a new virtual environment and install all the packages listed in requirements.txt. |
kill_lang_venv.sh | Bash script that uninstalls and deletes the virtual environment. |
requirements.txt | A list of the required packages. |
.gitignore | A list of the files that git should ignore when pushing/pulling (virtual environment and data). |
README.md | This very README file. |
Five assignments have been chosen for this portfolio and are included in the assignment directories. Information on script execution, preprocessing steps, results and discussion can be found in the README located in each assignment directory.
The five assignments are:
- Assignment 3 - Sentiment analysis
- Assignment 4 - Network analysis
- Assignment 5 - (Un)supervised machine learning - LDA and Topic modeling on philosophical texts
- Assignment 6 - Text classification using Deep Learning
- Assignment 7 - LSTM models for text generation (self-assigned)
The datasets are provided courtesy of:
- Rohit Kulkarni - Million headlines dataset, used for assignment 3
- Kourosh Alizadeh - History of Philosophy dataset, used for assignment 5
- Alben Tumanggor - Game of Thrones script dataset, used for assignment 6
- Thorben Schomacker - Grimms fairytales dataset, used for assignment 7
Feel free to contact me, Emil Jessen, with any questions (including questions about the reviews). You can reach me on Slack or on Facebook.
- Ross Deans Kristensen-McLachlan and Kristoffer Laigaard Nielbo - Our competent instructors for the module on Language Analytics
- othneildrew (GitHub user) - For providing the template that I used to create the READMEs