AndreasMadsen / bachelor-code

The code for my B.Sc.Eng. thesis work

Bachelor Thesis (Code)

By: Andreas Madsen

Download

git clone https://github.com/AndreasMadsen/bachelor-code.git code

Run code

Check the run directory for executable scripts. Note that the articles aren't included, as I don't have the intellectual property rights to share them.

Dataset format

There are two datasets, dataset/data/news.full.json.gz and dataset/data/news.json.gz. They contain largely the same content, but some methods only used a subset of the article text, so they used the news.json.gz file.

The format is gzipped, newline-separated JSON strings, with one JSON object per line. The JSON format is:

{
  "title": /* title as unicode text, shouldn't contain newlines */,
  "text": /* main text as unicode text, may contain newlines */,
  "website": /* website index, arbitrary number */,
  "date": /* unix timestamp in ms */,
  "href": /* http url */
}
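
As a minimal sketch of how a file in this format could be read (this is not the loader used in the thesis code), the snippet below yields one dict per line. The helper names read_articles and article_datetime are hypothetical, and UTF-8 encoding is an assumption. Splitting on newlines is safe even though "text" may contain newlines, because JSON escapes them inside strings.

```python
import gzip
import json
from datetime import datetime, timezone


def read_articles(path):
    """Yield one article dict per non-empty line of a gzipped JSON-lines file.

    Hypothetical helper; assumes the file is UTF-8 encoded.
    """
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


def article_datetime(article):
    """Convert the "date" field (unix timestamp in ms) to a UTC datetime."""
    return datetime.fromtimestamp(article["date"] / 1000, tz=timezone.utc)


# Example usage, assuming the dataset file exists locally:
#
#   for article in read_articles("dataset/data/news.json.gz"):
#       print(article_datetime(article).isoformat(), article["title"])
```

Reading lazily with a generator avoids holding every article in memory at once, which may matter for the larger news.full.json.gz file.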

License: MIT License


Languages

Python 85.0%, JavaScript 10.6%, HTML 2.2%, CSS 1.7%, Shell 0.5%, Makefile 0.0%