emiltj / cds-language-exam

CDS Language analytics exam portfolio



CDS Language Analytics

Exam portfolio
Read about the entire portfolio here»

Table of Contents
  1. About the project
  2. Getting started
  3. Repository structure
  4. Assignments
  5. Data
  6. Contact
  7. Acknowledgements

About the project

Example image from one of the assignments

See here for an overview of the entire portfolio.

This project contains the exam portfolio for the Spring 2021 module Language Analytics, part of the bachelor's elective (tilvalg) in Cultural Data Science at Aarhus University. This README gives an overview of the repository and lists the installation steps required to run the scripts in the assignments.

Getting started

To run my scripts, I recommend following the steps below in a bash terminal. They set up the virtual environment and execute a bash script that downloads the data into the data directories of the respective assignments.
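Before starting, it may help to confirm that the tools used in the steps below are available. The following is a minimal sketch, not part of the repository: the function name `check_prereqs` is hypothetical, and the tool list (`git`, `bash`, `python3`) is an assumption based on the commands in this README and on the use of a virtual environment with requirements.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# Prints one "missing: <cmd>" line for every command not found on PATH.
# Returns 0 if all commands are found, 1 otherwise.
check_prereqs() {
    local missing=0
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed tool list; adjust it if your system names the interpreter differently.
if check_prereqs git bash python3; then
    echo "all prerequisites found"
else
    echo "install the tools listed above before running the setup script"
fi
```

The check only reports what is missing and does not install anything, so it is safe to run repeatedly.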

Cloning repository and creating virtual environment

The below code will clone the repository, as well as create a virtual environment.

MAC/LINUX/WORKER02:

git clone https://github.com/emiltj/cds-language-exam.git
cd cds-language-exam
bash ./create_lang_venv.sh

WINDOWS:

git clone https://github.com/emiltj/cds-language-exam.git
cd cds-language-exam
bash ./create_lang_venv_win.sh
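If you are unsure which of the two setup scripts applies to your terminal, the choice can be sketched from the output of `uname -s`. This is an illustrative helper, not part of the repository: the function name `pick_setup_script` is hypothetical, and the assumption is that Windows users run the commands in Git Bash, MSYS or Cygwin, which report names starting with `MINGW`, `MSYS` or `CYGWIN`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# Maps a kernel name, as printed by `uname -s`, to the matching setup script.
pick_setup_script() {
    case "$1" in
        MINGW*|MSYS*|CYGWIN*)
            # Bash environments on Windows
            echo "create_lang_venv_win.sh"
            ;;
        *)
            # macOS ("Darwin"), Linux and other Unix-like systems
            echo "create_lang_venv.sh"
            ;;
    esac
}

# Print the command to run, without executing it.
echo "bash ./$(pick_setup_script "$(uname -s)")"
```

The sketch only prints the suggested command, so you can inspect it before running the setup yourself.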

Retrieving the data

The data is not included in this repository because of its size. The provided bash script data_download.sh downloads the data from a Google Drive folder and places it in the respective assignment directories.

bash data_download.sh
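After the download finishes, a quick check can show whether each assignment received its data. The sketch below is not part of the repository: the function name `report_data_dirs` is hypothetical, and it assumes that each assignment directory keeps its files in a subdirectory named `data`, which this README does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# For every assignment_* directory under the given root (default: current
# directory), report whether its data/ subdirectory exists and is non-empty.
# The subdirectory name "data" is an assumption.
report_data_dirs() {
    local root="${1:-.}"
    local dir
    for dir in "$root"/assignment_*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        if [ -n "$(ls -A "${dir}data" 2>/dev/null)" ]; then
            echo "ok: ${dir}data"
        else
            echo "empty or missing: ${dir}data"
        fi
    done
}

# Run from the repository root.
report_data_dirs .
```

If a directory is reported as empty or missing, rerunning data_download.sh is a reasonable first step.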

After cloning the repo, creating the virtual environment and retrieving the data you should be ready to go. Move to the assignment folders and read the READMEs for further instructions.

Repository structure

This repository has the following structure:

| File or directory | Description |
|---|---|
| assignment_*/ | Directories containing the five assignments. |
| utils/ | Utility functions written by our instructor Ross Deans Kristensen-McLachlan, used in several of the assignments. |
| README_images/ | Directory containing the few images used in the READMEs. |
| report.pdf | Document that provides a full overview of the exam project. It collates the information from all READMEs. |
| data_download.sh | Bash script that downloads all the necessary data. |
| create_lang_venv*.sh | Bash scripts that automatically generate a new virtual environment and install all the packages listed in requirements.txt. |
| kill_lang_venv.sh | Bash script that uninstalls and deletes the virtual environment. |
| requirements.txt | A list of the required packages. |
| .gitignore | A list of the files that git should ignore when pushing and pulling (virtual environment and data). |
| README.md | This very README file. |

Assignments

Five assignments have been chosen for this portfolio and are included in the assignment directories. Information on script execution, preprocessing steps, results and discussion can be found in the README located in each assignment directory.

The five assignments are:

  • Assignment 3 - Sentiment analysis
  • Assignment 4 - Network analysis
  • Assignment 5 - (Un)supervised machine learning - LDA and Topic modeling on philosophical texts
  • Assignment 6 - Text classification using Deep Learning
  • Assignment 7 - LSTM models for text generation (self-assigned)

Data

The datasets are provided by courtesy of:

Contact

Feel free to write to me, Emil Jessen, with any questions (including questions about the reviews). You can do so on Slack or on Facebook.

Acknowledgements
