andreas-gompos / bbc-topic-classification

Topic classification using GloVe embeddings and a bidirectional LSTM neural network trained on BBC articles

Home Page: https://datagusto.com/models/topic_classification


Topic Classifier

Description

This project builds a topic classification model. Specifically, an LSTM network was trained to distinguish between five categories of articles (business, entertainment, politics, sport, tech). The training data is the BBC articles dataset, which consists of 2225 documents from the BBC news website, corresponding to stories from 2004-2005. The model is deployed on Kubernetes on GKE and can be used at datagusto.com.

Train the Model

#!/bin/bash
# Install the Python dependencies
pip install -r requirements.txt
# Download the NLTK resources
python -m nltk.downloader stopwords punkt wordnet
# Train the model; replace each {{ ... }} placeholder with a value
python ./src/train.py --num_epochs={{ num_epochs }} \
                      --top_words={{ top_words }} \
                      --max_sequence_length={{ max_sequence_length }} \
                      --batch_size={{ batch_size }} \
                      --polyaxon_env={{ polyaxon_env }}

| Parameter | Description | Valid Values | Default |
|---|---|---|---|
| num_epochs | Number of epochs to be used for training | int | 40 |
| batch_size | Batch size to be used for training | int | 256 |
| top_words | Number of most common words to be used for training (rare words will be dropped) | int | 35000 |
| max_sequence_length | Fixed length of each input text. The text will be padded or trimmed down to that length | int | 500 |
| polyaxon_env | Indicate if running in Polyaxon | 0, 1 | 0 |
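
To make the effect of these parameters concrete, here is a minimal sketch of how the flags might be parsed and how top_words and max_sequence_length could shape the model input. It is not taken from src/train.py: the helper names (parse_args, build_vocab, encode), the use of id 0 for padding, and the choice to pad and trim at the end of the sequence are all assumptions made for illustration.

```python
import argparse
from collections import Counter

PAD_ID = 0  # assumption: id 0 is reserved for padding


def parse_args(argv=None):
    """Parse the training flags, using the defaults listed in the parameter table."""
    parser = argparse.ArgumentParser(description="Train the topic classifier")
    parser.add_argument("--num_epochs", type=int, default=40)
    parser.add_argument("--batch_size", type=int, default=256)
    parser.add_argument("--top_words", type=int, default=35000)
    parser.add_argument("--max_sequence_length", type=int, default=500)
    parser.add_argument("--polyaxon_env", type=int, choices=[0, 1], default=0)
    return parser.parse_args(argv)


def build_vocab(tokenized_docs, top_words):
    """Map the top_words most frequent tokens to ids 1..top_words."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common(top_words))}


def encode(tokens, vocab, max_sequence_length):
    """Drop out-of-vocabulary tokens, then trim or pad to a fixed length."""
    ids = [vocab[tok] for tok in tokens if tok in vocab]
    ids = ids[:max_sequence_length]  # assumption: trim the tail
    return ids + [PAD_ID] * (max_sequence_length - len(ids))  # assumption: pad at the end


if __name__ == "__main__":
    args = parse_args([])  # empty list: fall back to the defaults
    docs = [["shares", "rose", "shares"], ["shares", "rose", "film"]]
    vocab = build_vocab(docs, top_words=2)  # "film" is rare and gets dropped
    print(encode(["shares", "film", "rose"], vocab, max_sequence_length=5))
```

With a vocabulary limited to the two most frequent words, the rare word is dropped before the sequence is padded, so every encoded text has exactly max_sequence_length entries regardless of its original length.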

Any feedback is welcome! :) LinkedIn
