beduffy / word2vec-explorer

Tool for exploring Word Vector models

Word2Vec Explorer

This tool helps you visualize, query and explore Word2Vec models. Word2Vec is a technique that trains a shallow neural network on large amounts of text to produce word vectors, which can then be used to solve a variety of NLP and ML problems.

Word2Vec Explorer uses Gensim to list and compare vectors, and t-SNE to visualize a dimensionality reduction of the vector space. Scikit-Learn is used for K-Means clustering.
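To make "comparing vectors" concrete, here is a minimal sketch of cosine similarity, the measure Gensim uses when ranking similar words. It is not the explorer's own code: the function names and the toy 3-dimensional vectors are invented for illustration, and a real model would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word` (hypothetical helper)."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy vectors, made up for this example only.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}

neighbors = most_similar("king", toy_vectors, topn=2)
print(neighbors)
```

With these toy values, "queen" ranks above "apple" as a neighbor of "king" because its vector points in nearly the same direction.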

The UI is built using React, Babel, Browserify, StandardJS, D3 and Three.js.

[Screenshots: TSNE 10K, TSNE Labels, Vector Comparisons]

Setup

To install all Python dependencies:

pip install -r requirements.txt

Usage

Load the explorer with a Word2Vec model:

./explore GoogleNews-vectors-negative300.bin

Now point your browser at localhost:8080 to load the explorer!
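For orientation, a .bin file such as the one passed to ./explore follows the word2vec binary layout: an ASCII header with the vocabulary size and vector dimension, then each word followed by a space and its raw 32-bit floats. The sketch below reads that layout with only the standard library. It is an illustration, not how the explorer loads models (it relies on Gensim for that); the function name is hypothetical and little-endian floats are assumed.

```python
import os
import struct
import tempfile


def read_word2vec_binary(path):
    """Parse a word2vec binary file into a {word: [floats]} dict (hypothetical helper)."""
    vectors = {}
    with open(path, "rb") as handle:
        # Header line: "<vocab_size> <dimensions>\n" in ASCII.
        vocab_size, dims = (int(n) for n in handle.readline().split())
        for _ in range(vocab_size):
            # Each word is terminated by a single space.
            chars = bytearray()
            while True:
                byte = handle.read(1)
                if not byte:
                    raise ValueError("file ended in the middle of a word")
                if byte == b" ":
                    break
                if byte != b"\n":  # skip the newline some writers put between records
                    chars.extend(byte)
            raw = handle.read(4 * dims)
            if len(raw) != 4 * dims:
                raise ValueError("file ended in the middle of a vector")
            # Assumption: floats are stored little-endian, 4 bytes each.
            vectors[chars.decode("utf-8")] = list(struct.unpack("<%df" % dims, raw))
    return vectors


# Round-trip demo with two invented 3-dimensional vectors.
sample = {"king": (1.0, 2.0, 0.5), "queen": (0.25, -1.0, 4.0)}
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(b"%d %d\n" % (len(sample), 3))
    for word, values in sample.items():
        tmp.write(word.encode("utf-8") + b" " + struct.pack("<3f", *values) + b"\n")

loaded = read_word2vec_binary(tmp.name)
os.remove(tmp.name)
print(loaded)
```

Reading the whole vocabulary into Python lists like this would be slow and memory-hungry for a model the size of Google News; it is meant only to show what the file contains.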

Obtaining Pre-Trained Models

A classic example of Word2Vec is the Google News model trained on 600M sentences: GoogleNews-vectors-negative300.bin.gz
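The Google News model is distributed gzip-compressed, while the usage example passes an uncompressed .bin file to ./explore. Whether ./explore can read the .gz directly is not stated here, so a cautious approach is to decompress it first. A minimal standard-library sketch follows; the `gunzip` helper name is made up, and the demo runs on a throwaway file rather than the real download.

```python
import gzip
import os
import shutil
import tempfile


def gunzip(src_path, dest_path):
    """Decompress src_path to dest_path, streaming so large files fit in memory."""
    with gzip.open(src_path, "rb") as compressed, open(dest_path, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return dest_path


# Round-trip demo on a small temporary file.
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "toy-vectors.bin.gz")
bin_path = os.path.join(workdir, "toy-vectors.bin")
payload = b"1 2\nword " + bytes(8)  # stand-in content, not a real model

with gzip.open(gz_path, "wb") as handle:
    handle.write(payload)

gunzip(gz_path, bin_path)
with open(bin_path, "rb") as handle:
    restored = handle.read()

shutil.rmtree(workdir)
print(restored == payload)
```

For the real model, the call would look like `gunzip("GoogleNews-vectors-negative300.bin.gz", "GoogleNews-vectors-negative300.bin")`, assuming the archive has already been downloaded to the current directory.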

[More pre-trained models](https://github.com/3Top/word2vec-api#where-to-get-a-pretrained-models)

Development

In order to make changes to the user interface you will need some NPM dependencies:

npm install
npm start

The command npm start will automatically transpile and bundle any code changes in the ui/ folder. All backend code can be found in explorer.py and ./explore.

Before submitting code changes, make sure all code complies with StandardJS and PEP 8:

standard
pep8 --max-line-length=100 *.py explore

Todo

  • 3D GPU/WebGL view (on branch 3d)
  • Make sure axes stay when zooming/panning scatterplot
  • Autocomplete in query interface
  • Look into supporting other high dimensional data models (go beyond word vectors)
  • Drill-down of vector that shows real distance between neighbors
  • Improved sample rated view that takes into account term counts and connectedness

License: MIT License

