xsrust / ttds_group

Text Technologies for Data Science Group Project

[INFR11145] Text Technologies for Data Science Group Project 2017-18

This repo contains the code and some of the data used for our TTDS Movie Genre Classification using Subtitles group project. The repo also contains our final report.

There are three main folders:

  • jsons contains two JSON files, created using Scrapy, that link movie titles with their IMDb IDs and genre(s). A third JSON file, inside the classification folder, contains the two joined together.
  • subtitles-module contains the code used for the data collection and data processing parts of this project. For more information, see the readme in the subtitles-module folder.
  • classification contains the code used for the classification task. For more information, see the readme in the classification folder.
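
As an illustration of how two title-keyed JSON files could be joined into one, here is a minimal sketch. It is not the code used in this repo: the field names (`title`, `imdb_id`, `genres`), the function name `join_by_title`, and the sample records are all hypothetical, and it assumes one file maps titles to IMDb IDs while the other maps titles to genres.

```python
import json


def join_by_title(id_records, genre_records):
    """Merge two lists of movie records on a shared 'title' field.

    The field names are assumptions made for this sketch; the real JSON
    files in the jsons folder may be structured differently.
    """
    # Index the genre records by title for constant-time lookups.
    genres_by_title = {rec["title"]: rec["genres"] for rec in genre_records}

    joined = []
    for rec in id_records:
        title = rec["title"]
        # Keep only movies that appear in both inputs (an inner join).
        if title in genres_by_title:
            joined.append(
                {
                    "title": title,
                    "imdb_id": rec["imdb_id"],
                    "genres": genres_by_title[title],
                }
            )
    return joined


# Made-up sample data standing in for the two scraped files.
ids = [
    {"title": "Example Movie A", "imdb_id": "tt0000001"},
    {"title": "Example Movie B", "imdb_id": "tt0000002"},
]
genres = [
    {"title": "Example Movie A", "genres": ["Drama", "Comedy"]},
    {"title": "Example Movie C", "genres": ["Horror"]},
]

merged = join_by_title(ids, genres)
print(json.dumps(merged, indent=2))
```

With real files, the two input lists would come from `json.load` on each file instead of the inline samples. Joining on the title alone would conflate movies that share a title, so a more robust join would use a unique key if both files provide one.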
