vyraun / text-classification-wv

Text Classification Using Word Vectors.

WordVectors

Text Classification with Word Vectors

Download Pre-trained Vectors

Install

  • pip install pandas scikit-learn gensim tensorflow keras bs4 nltk
  • In Python, after import nltk, run nltk.download('punkt') and nltk.download('stopwords')

Get Reduced Vectors from Pre-trained vectors

  • python reduction_algo.py [embedding_file] [reduced_dimensions]
  • e.g. python reduction_algo.py glove.300d.txt 150 --> the reduced embeddings will be saved in reduced_embeddings_150.txt
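
The README does not say which reduction method reduction_algo.py applies. As a rough sketch of the load, reduce, and save flow, the block below assumes plain PCA (computed with NumPy's SVD) as a stand-in. The helper names load_embeddings, pca_reduce, and save_embeddings are hypothetical and do not come from the repository; the file layout assumed is the GloVe text format, one word per line followed by its vector.

```python
import numpy as np


def load_embeddings(path):
    """Read a GloVe-style text file: each line is a word followed by its vector."""
    words, rows = [], []
    with open(path, encoding="utf-8") as fh:
        for index, line in enumerate(fh):
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # blank or malformed line
            # Assumption: a first line made of two integers is a word2vec-style
            # "count dim" header rather than an embedding, so it is skipped.
            if index == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
                continue
            words.append(parts[0])
            rows.append([float(x) for x in parts[1:]])
    return words, np.asarray(rows, dtype=np.float64)


def pca_reduce(matrix, n_components):
    """Project the rows of matrix onto their top principal components."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def save_embeddings(path, words, matrix):
    """Write the vectors back in the same one-word-per-line text format."""
    with open(path, "w", encoding="utf-8") as fh:
        for word, vec in zip(words, matrix):
            fh.write(word + " " + " ".join(f"{v:.6f}" for v in vec) + "\n")
```

Under these assumptions, calling load_embeddings("glove.300d.txt"), passing the matrix to pca_reduce(matrix, 150), and writing the result with save_embeddings("reduced_embeddings_150.txt", words, reduced) would mirror the command above. The actual script may post-process the vectors differently.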

20Newsgroup on Pretrained-Glove

  • svc.py creates the document vectors and reports the results
  • e.g. python svc.py glove.300d.txt 300
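
How svc.py turns word vectors into document vectors is not described here. One common approach, shown below purely as an illustrative assumption, is to average the vectors of the words in a document that have an embedding. The function document_vector and the toy embeddings are hypothetical, not taken from the repository.

```python
def document_vector(text, embeddings, dim):
    """Average the embeddings of the tokens in text that have a vector.

    embeddings maps a lowercase word to a list of dim floats. Tokens without
    an entry are ignored; a document with no known tokens gets the zero vector.
    """
    total = [0.0] * dim
    hits = 0
    # Assumption: lowercase whitespace tokenization stands in for whatever
    # tokenizer and stopword filtering the real script uses.
    for token in text.lower().split():
        vec = embeddings.get(token)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            total[i] += value
        hits += 1
    if hits == 0:
        return total
    return [value / hits for value in total]


toy_embeddings = {"good": [1.0, 0.0], "film": [0.0, 1.0]}
print(document_vector("Good film overall", toy_embeddings, 2))  # [0.5, 0.5]
```

The resulting fixed-length vectors could then be passed to a classifier such as a support vector machine, which is what the script name suggests.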

Reuters on Pretrained Glove

  • svc_reuters.py creates and evaluates the document vectors
  • e.g. python svc_reuters.py glove.300d.txt 300

Train Vectors Using Word2Vec Model

  • Run Word2VecModel_on_Newsgroup.py and Word2VecModel_on_Reuters.py
  • Embedding files will be created; use them for evaluation just like the pre-trained vectors
  • e.g. python Word2VecModel_on_Newsgroup.py 400 --> creates embedding_on_newsgroup_400.txt

Evaluation Table

Embedding                     20Newsgroup                Reuters
Glove-300D                    60
Glove-200D                    53
Glove-100D                    50
Glove-Reduced-150D            51
Glove-Reduced-100D            42
Glove-Reduced-50D             36
Fasttext-300D
Fasttext-Reduced-150D
Word2Vec-300D
Word2Vec-Reduced-150D
W2V-Newsgroup-300D            73 (0.7379182156133829)    x
W2V-Newsgroup-200D            0.6736590546999469         x
W2V-Newsgroup-400D            0.7124269782262347         x
W2V-Newsgroup-Reduced-150D    60 (0.6023632501327668)    x
W2V-Newsgroup-Reduced-100D                               x
W2V-Newsgroup-Reduced-200D    0.6427243759957515         x
W2V-Reuters-300D              x                          41 (0.4121083377588954)
W2V-Reuters-200D              x
W2V-Reuters-100D              x
W2V-Reuters-Reduced-150D      x                          32 (0.3252788104089219)
W2V-Reuters-Reduced-100D      x
W2V-Reuters-Reduced-50D       x
