fcollova / fasttext

Unofficial implementation of the paper "Bag of Tricks for Efficient Text Classification" by Joulin et al.

FastText

Unofficial implementation of the paper Bag of Tricks for Efficient Text Classification by Joulin et al.

Prerequisites

FastText requires Python 3 with Keras installed.

Obtain the Yelp Dataset and place yelp_academic_dataset_review.json in the repository's base directory.
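To check that the file is in place and readable, a short script can iterate over a few reviews. This is a minimal sketch, not the loader used by train.py: it assumes the file stores one JSON object per line with "text" and "stars" fields, as in the public Yelp dataset releases, and the helper name iter_reviews is hypothetical.

```python
import json


def iter_reviews(path, limit=None):
    """Yield (text, stars) pairs from a JSON-lines review file.

    Assumption: each non-empty line is a JSON object with "text" and
    "stars" keys. `limit` caps the number of reviews yielded.
    """
    yielded = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if limit is not None and yielded >= limit:
                break
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Some releases store stars as floats (e.g. 5.0); normalise to int.
            yield record["text"], int(record["stars"])
            yielded += 1
```

For example, list(iter_reviews("yelp_academic_dataset_review.json", limit=3)) returns the first three reviews without reading the whole file into memory.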

Training

Train the model using the following command:

./train.py

Training generates data.csv, which contains the model's embedding of the validation set. The embedding is obtained by removing the last layer of the model and reducing the remaining output with t-SNE.
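What "removing the last layer" means can be illustrated with a dependency-free toy. In the paper's model, word representations are averaged into a text representation that is fed to a linear classifier; the averaged vector is what remains once that final layer is removed. The sketch below assumes this structure; the vocabulary, weights, and function names are made up for illustration and are not taken from train.py. The t-SNE step, which needs a third-party library, is not shown.

```python
import math

# Hypothetical toy parameters; a trained model would learn these.
EMBEDDINGS = {
    "good": [1.0, 0.0],
    "bad": [0.0, 1.0],
    "food": [0.5, 0.5],
}
# Final classification layer: one weight row per class.
OUTPUT_WEIGHTS = [
    [2.0, -2.0],  # class 0 (e.g. positive)
    [-2.0, 2.0],  # class 1 (e.g. negative)
]
EMBEDDING_DIM = 2


def hidden_vector(tokens):
    """Average the embeddings of known tokens.

    This is the representation left after the last layer is removed.
    """
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBEDDING_DIM
    return [sum(v[i] for v in vectors) / len(vectors)
            for i in range(EMBEDDING_DIM)]


def class_probabilities(hidden):
    """Apply the final linear layer followed by a softmax."""
    logits = [sum(w * h for w, h in zip(row, hidden))
              for row in OUTPUT_WEIGHTS]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Calling hidden_vector on every validation review yields one vector per review; a dimensionality reduction such as t-SNE would then map those vectors to two dimensions for plotting.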

index.html implements a D3 visualisation of the embedding space. Serve it from a local web server, because browsers block pages opened directly from the file system from loading other local files:

python -m http.server 8000

Then open http://localhost:8000 in your browser.
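The same server can also be started from Python, which is convenient when the directory or port needs to be chosen by a script. This is a minimal sketch using only the standard library; make_server is a hypothetical helper, not part of this repository.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Create an HTTP server that serves `directory` on the loopback interface."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to 127.0.0.1 only, so the files are not exposed to the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)

# Usage: make_server().serve_forever()  (stop with Ctrl+C)
```

Binding to 127.0.0.1 is a deliberate difference from the command above, which by default listens on all interfaces.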

License

FastText is licensed under the terms of the Apache v2.0 license.

Authors

  • Ihor Kroosh
  • Tim Nieradzik
