tharangni / uva-dl-nlt


Document Classification

This repository contains code for the Document Classification task on the Reuters dataset, for the course Deep Learning for Natural Language Technology (2018) taught by Christof Monz.

Team:

  • Masoumeh Bakhtiariziabari
  • Samarth Bhargav
  • Gulfaraz Rahman
  • Tharangni Harsha Sivaji
  • Ece Takmaz

Models

TF-IDF

To run:

chmod +x run_tfidf.sh
./run_tfidf.sh
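
The script wraps a TF-IDF pipeline. As a rough sketch of the idea, assuming scikit-learn and a toy corpus in place of the repo's Reuters loader (illustrative only, not the script's actual code):

# Illustrative TF-IDF + one-vs-rest classifier; run_tfidf.sh drives the
# repo's own pipeline. Toy documents stand in for the Reuters data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

train_docs = ["oil prices rose sharply", "wheat exports fell", "opec cut oil output"]
train_labels = [["crude"], ["grain", "wheat"], ["crude"]]

vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_docs)               # sparse TF-IDF matrix
Y_train = MultiLabelBinarizer().fit_transform(train_labels)  # binary indicator matrix

# Reuters documents can carry several topics, so one binary classifier
# is trained per label (one-vs-rest).
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, Y_train)

print(clf.predict(vectorizer.transform(["oil output rose"])))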

LDA

NUM_TOPICS=10
python run.py train --data-root ./data/reuters/ --model lda --model-id lda_$NUM_TOPICS --num-topics $NUM_TOPICS
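
At its core, the LDA model fits per-document topic distributions that can serve as features for classification. A minimal sketch of that step with gensim, on a toy corpus (illustrative; the repo's loader and training loop live in run.py):

from gensim import corpora
from gensim.models import LdaModel

docs = [["oil", "prices", "rose"], ["wheat", "exports", "fell"], ["oil", "exports", "rose"]]
dictionary = corpora.Dictionary(docs)            # token -> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]   # bag-of-words vectors

lda = LdaModel(corpus, num_topics=10, id2word=dictionary, passes=5)

# The per-document topic distribution is the feature vector a downstream
# classifier can consume.
print(lda.get_document_topics(corpus[0]))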

GloVe

chmod +x run_embedding_glove.sh
./run_embedding_glove.sh
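
The script builds document representations from pretrained GloVe vectors. A minimal sketch of one common approach, averaging word vectors into a document embedding (an assumption about the method, which may differ from the script's model), given a local copy of glove.6B.100d.txt from https://nlp.stanford.edu/projects/glove/:

import numpy as np

def load_glove(path):
    # Each line of a GloVe file is: word v1 v2 ... vN
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def doc_embedding(tokens, vectors, dim=100):
    # Mean of in-vocabulary word vectors; zeros if nothing matches.
    hits = [vectors[t] for t in tokens if t in vectors]
    return np.mean(hits, axis=0) if hits else np.zeros(dim, dtype=np.float32)

glove = load_glove("glove.6B.100d.txt")
vec = doc_embedding("oil prices rose sharply".split(), glove)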

NER

# for the NER model only
python runPN.py train --data-root ./data/reuters/ --model ner-model

# for the NER model combined with a word LSTM
python runPN.py train --data-root ./data/reuters/ --model ner-comb-model
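
Both variants use named-entity information as input features. As an illustration of what such features look like (spaCy is an assumption here; runPN.py may use a different tagger):

# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Reuters reported that OPEC raised oil output in Vienna.")

# Entity spans with labels such as ORG and GPE; these can be counted or
# embedded and fed to the classifier alongside the word sequence.
print([(ent.text, ent.label_) for ent in doc.ents])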

Doc2Vec

python run.py train --data-root ./data/reuters/ --model doc2vec --model-id doc2vec
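
Doc2Vec learns a dense vector per document that can be fed to a classifier. A minimal sketch with gensim on a toy corpus (illustrative; run.py drives the actual training):

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [["oil", "prices", "rose"], ["wheat", "exports", "fell"]]
tagged = [TaggedDocument(words=d, tags=[i]) for i, d in enumerate(docs)]

model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=20)

# Unseen documents are embedded with infer_vector; the result is the
# classifier's input feature vector.
vec = model.infer_vector(["oil", "exports", "rose"])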

LSTM

chmod +x run_simple_deep.sh
./run_simple_deep.sh
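
The script trains an LSTM over the word sequence. A minimal sketch of such a classifier in PyTorch (sizes and architecture are placeholder assumptions, not the script's exact configuration):

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, num_labels=90):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_labels)

    def forward(self, token_ids):
        embedded = self.embed(token_ids)    # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(embedded)   # final hidden state
        return self.out(h_n[-1])            # one logit per label

model = LSTMClassifier()
logits = model(torch.randint(0, 10000, (4, 50)))  # batch of 4 dummy documents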

HAN

* For training:
- To change the hyperparameters, experiment number, and results path, open "models/HAN.py" and edit them in the "__init__" of "class hanTrainer".
- To run:
python run_han.py train --data-root ./data/reuters --model han --epochs 200

* For getting the statistics:
- To change the path of the saved model, open "eval_han.py" and edit "sent_model_path" and "word_model_path" to point to it.
- To run the evaluation:
python eval_han.py train --data-root ./data/reuters --model han
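
HAN (Yang et al., 2016) encodes words into sentence vectors and sentences into a document vector, applying attention at both levels. The pooling step at each level looks roughly like the sketch below (illustrative; the repo's implementation is in models/HAN.py):

import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Scores each timestep against a learned context vector and returns
    the attention-weighted sum of the hidden states."""
    def __init__(self, hidden_dim=128):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)
        self.context = nn.Parameter(torch.randn(hidden_dim))

    def forward(self, hidden):                            # (batch, steps, hidden_dim)
        u = torch.tanh(self.proj(hidden))
        alpha = torch.softmax(u @ self.context, dim=1)    # attention weights
        return (alpha.unsqueeze(-1) * hidden).sum(dim=1)  # pooled vector

pooled = AttentionPool()(torch.randn(4, 20, 128))  # e.g. 20 words per sentence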

Languages

Jupyter Notebook 54.4%, Python 44.6%, Shell 1.0%