scarletcho / runWord2vec

Wrapper of Gensim word2vec along with T-SNE visualization

runWord2vec

  • Scripts for training word2vec embeddings with the Gensim library and visualizing the trained model with t-SNE.

Requirements

  • Before running runWord2vec, make sure the Gensim Python library is installed.

  • Gensim can be installed with pip:

      $ pip install gensim
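A quick way to confirm the installation is to look up the installed version from Python. This is a minimal sketch that uses only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`, Python 3.8+); the helper name `gensim_version` is hypothetical. Knowing the version helps because some `Word2Vec` parameter names changed between Gensim 3.x and 4.x (for example, `size` became `vector_size`).

```python
from importlib import metadata, util


def gensim_version():
    """Return the installed Gensim version string, or None if Gensim is missing."""
    # find_spec locates the package without actually importing it
    if util.find_spec("gensim") is None:
        return None
    return metadata.version("gensim")


print(gensim_version() or "Gensim not found; install it with: pip install gensim")
```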
    

Data preparation

  • What to prepare:
    • A plain-text corpus file with one sentence per line
    • NB: To train good-quality word embeddings, the corpus needs to be sufficiently large.
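The expected corpus format can be produced and read back with a few lines of Python. The sketch below is illustrative only: it assumes UTF-8 text and simple whitespace tokenization, and `write_corpus` and `CorpusReader` are hypothetical names, not part of this repository. Gensim's own `gensim.models.word2vec.LineSentence` reads files in the same format (one sentence per line, tokens separated by whitespace).

```python
from typing import Iterable, Iterator, List


def write_corpus(sentences: Iterable[str], path: str) -> None:
    """Write one sentence per line, normalizing whitespace and skipping empty sentences."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            # split() + join() collapses runs of spaces and removes embedded
            # newlines, so each sentence occupies exactly one line
            line = " ".join(sentence.split())
            if line:
                f.write(line + "\n")


class CorpusReader:
    """Restartable iterable over a one-sentence-per-line corpus.

    word2vec makes several passes over the data (vocabulary building plus one
    pass per training epoch), so a class with __iter__ is used instead of a
    one-shot generator.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self) -> Iterator[List[str]]:
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                tokens = line.split()
                if tokens:
                    yield tokens


write_corpus(["The cat  sat on the mat .", "", "A dog\nbarked ."], "sample_corpus.txt")
print(list(CorpusReader("sample_corpus.txt")))
# → [['The', 'cat', 'sat', 'on', 'the', 'mat', '.'], ['A', 'dog', 'barked', '.']]
```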

Functionality

  • What can be done:
    • Given a text file (a corpus with one sentence per line) in the same directory as the scripts, you can train your own word embeddings with the Gensim library and visualize them using the scripts below.

Usage

1) runWord2vec.py

  • Train and save a word2vec model with the following command:

      $ python runWord2vec.py <corpus_name> <model_name>
    
  • For example:

      $ python runWord2vec.py wiki.txt mdl_wiki
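For reference, training and saving a model with Gensim can be sketched as follows. This is not the repository's runWord2vec.py: it assumes Gensim 4.x, the values in `DEFAULT_PARAMS` are simply Gensim 4's own defaults rather than whatever the script uses, and `parse_args` and `train_and_save` are hypothetical helper names. Gensim is imported inside `train_and_save` so the argument handling can be used on its own.

```python
from pathlib import Path

# Illustrative hyperparameters (these equal Gensim 4's defaults);
# the actual runWord2vec.py may use different values.
DEFAULT_PARAMS = {"vector_size": 100, "window": 5, "min_count": 5, "workers": 3}


def parse_args(argv):
    """Return (corpus_name, model_name) from a runWord2vec-style argv list."""
    if len(argv) != 3:
        prog = argv[0] if argv else "runWord2vec.py"
        raise SystemExit(f"usage: python {prog} <corpus_name> <model_name>")
    return argv[1], argv[2]


def train_and_save(corpus_name, model_name, **overrides):
    """Train word2vec on a one-sentence-per-line corpus and save it to model_name."""
    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    if not Path(corpus_name).is_file():
        raise FileNotFoundError(corpus_name)
    params = {**DEFAULT_PARAMS, **overrides}
    # LineSentence streams the file lazily: one line = one sentence,
    # tokens split on whitespace
    model = Word2Vec(sentences=LineSentence(corpus_name), **params)
    model.save(model_name)
    return model


# Command-line equivalent: train_and_save(*parse_args(sys.argv))
```

With these helpers, `train_and_save("wiki.txt", "mdl_wiki")` corresponds to the example command. For large models, Gensim may store big arrays as extra `.npy` files next to the saved model, so keep them together when moving the model.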
    

2) runTSNE.py

  • Visualize your trained model with the following command:

      $ python runTSNE.py <model_name>
    
  • For example:

      $ python runTSNE.py mdl_wiki
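A t-SNE plot of a saved model can be sketched as below. This is an illustration under stated assumptions, not the repository's runTSNE.py: it assumes Gensim 4.x plus scikit-learn (`sklearn.manifold.TSNE`) and matplotlib, and the names `top_n` and `plot_tsne`, the 300-word cutoff, and the `tsne.png` output file are hypothetical. Only the most frequent words are plotted, since running t-SNE over a full vocabulary is slow and produces unreadable labels.

```python
def top_n(words, vectors, n):
    """Return the first n words and their vectors.

    Gensim sorts its vocabulary by descending frequency by default, so the
    first n entries of wv.index_to_key are the n most frequent words.
    """
    n = min(n, len(words))
    return list(words[:n]), vectors[:n]


def plot_tsne(model_name, n_words=300, out_path="tsne.png"):
    # Third-party imports live here so top_n works without them installed.
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file; no display required
    import matplotlib.pyplot as plt
    from gensim.models import Word2Vec
    from sklearn.manifold import TSNE

    model = Word2Vec.load(model_name)
    words, vectors = top_n(model.wv.index_to_key, model.wv.vectors, n_words)

    # scikit-learn requires perplexity to be smaller than the number of points
    perplexity = min(30.0, len(words) - 1)
    coords = TSNE(n_components=2, perplexity=perplexity, init="pca",
                  random_state=0).fit_transform(vectors)

    # Korean (or other non-Latin) labels need a font that covers them:
    # set plt.rcParams["font.family"] to an installed font with Hangul glyphs.
    fig, ax = plt.subplots(figsize=(12, 12))
    ax.scatter(coords[:, 0], coords[:, 1], s=5)
    for word, (x, y) in zip(words, coords):
        ax.annotate(word, (x, y), fontsize=8)
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return out_path
```

Calling `plot_tsne("mdl_wiki")` would write the plot for the example model to `tsne.png`. Fixing `random_state` makes the layout reproducible between runs.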
    


Languages

Language: Python 100.0%