embeddings.js

Word embeddings for the web

Blog post: https://mb-14.github.io/tech/2019/02/19/word-embeddings-js.html

Word embeddings often require a large number of parameters, which results in a large memory and storage footprint. This makes it difficult to deploy pre-trained word embeddings like fastText and GloVe in mobile and browser environments. In this project, we compress pre-trained word vectors using simple post-processing techniques: PCA dimensionality reduction and product quantization. The resulting embeddings are significantly smaller than the originals, with no considerable drop in accuracy. The final vectors, along with helper methods to access them, are bundled into a JavaScript library. The library uses TensorFlow.js to decode the word embeddings and perform general-purpose operations on them. To speed up inference, we set the runtime backend to WebAssembly (wasm) for accelerated CPU calculations at near-native speed.
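To make the decoding step concrete, here is a minimal sketch of how a product-quantized vector can be reconstructed. It is plain JavaScript rather than TensorFlow.js, and the function name `decodeVector`, the `codebooks` layout, and the toy values are assumptions made for illustration; they are not this library's actual API or data format.

```javascript
// Illustrative sketch only: names and data layout are hypothetical.
// With product quantization, a vector is split into M sub-vectors, and each
// sub-vector is stored as the index of its nearest centroid in a per-sub-space
// codebook. Decoding looks up each centroid and concatenates them.
function decodeVector(codes, codebooks) {
  const out = [];
  for (let m = 0; m < codes.length; m++) {
    const centroid = codebooks[m][codes[m]]; // centroid chosen for sub-space m
    for (const x of centroid) {
      out.push(x);
    }
  }
  return out;
}

// Toy setup: 4-dimensional vectors, 2 sub-spaces of 2 dimensions each,
// 2 centroids per sub-space.
const codebooks = [
  [[0.0, 0.1], [0.9, 1.0]],
  [[-1.0, -0.5], [0.5, 0.25]],
];

// The word is stored as just two small integers instead of four floats.
const decoded = decodeVector([1, 0], codebooks);
console.log(decoded);
```

Only the codebooks and one small integer per sub-space per word need to be shipped, which is where the size reduction comes from.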

Demo

You can check out the demo of the js library on this page: https://mb-14.github.io/embeddings.js

Models

compressor - Module to compress pretrained word embeddings using PCA and product quantization
sentiment_classification - LSTM model for sentiment classification trained on the sentiment140 dataset
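As a rough illustration of the product quantization step that the compressor performs, the sketch below encodes a vector against codebooks that are assumed to be already trained (for example with k-means). It is not the compressor's actual code: `encodeVector`, `squaredDistance`, and the toy codebooks are hypothetical, and the PCA step is omitted.

```javascript
// Illustrative sketch only: names and values are hypothetical.
// Squared Euclidean distance between two equal-length arrays.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Split the vector into one sub-vector per codebook and replace each
// sub-vector with the index of its nearest centroid.
function encodeVector(vector, toyCodebooks) {
  const subDim = vector.length / toyCodebooks.length;
  return toyCodebooks.map((centroids, m) => {
    const sub = vector.slice(m * subDim, (m + 1) * subDim);
    let best = 0;
    let bestDist = Infinity;
    centroids.forEach((centroid, k) => {
      const dist = squaredDistance(sub, centroid);
      if (dist < bestDist) {
        bestDist = dist;
        best = k;
      }
    });
    return best;
  });
}

// Toy codebooks: 2 sub-spaces of 2 dimensions, 2 centroids each.
const toyCodebooks = [
  [[0.0, 0.1], [0.9, 1.0]],
  [[-1.0, -0.5], [0.5, 0.25]],
];

const codes = encodeVector([0.8, 0.9, -0.9, -0.6], toyCodebooks);
console.log(codes);
```

The encoding is lossy: the original sub-vectors are approximated by their nearest centroids, so accuracy depends on how well the codebooks fit the data.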

Instructions

This project uses yarn to manage dependencies.

Run locally

yarn 
yarn run demo

You can then check all the demos at http://localhost:8080

Re-build models

yarn build
