mb-14 / embeddings.js

Word embeddings for the web

Home page: https://mb-14.github.io/embeddings.js/

embeddings.js

Word embeddings for the web

Blog post: https://mb-14.github.io/tech/2019/02/19/word-embeddings-js.html

Word embeddings often have a large number of parameters, which results in a large memory and storage footprint. This makes deploying pre-trained word embeddings such as fastText and GloVe in mobile and browser environments very difficult. In this project, we compress pre-trained word vectors with simple post-processing techniques: PCA dimensionality reduction and product quantization. The resulting embeddings are significantly smaller than the originals, without a considerable drop in accuracy.

The final vectors, along with helper methods to access them, are bundled into a JavaScript library. The library uses TensorFlow.js to decode the word embeddings and perform general-purpose operations on them. To speed up inference, we set the TensorFlow.js backend to wasm (WebAssembly), which runs CPU computations at near-native speed.
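The decoding step can be illustrated with a small plain-JavaScript sketch. It does not use TensorFlow.js and does not reproduce the library's actual storage format or API: the codebook layout, the toy vocabulary, and the helper names `decode`, `cosine`, and `mostSimilar` are all hypothetical. It assumes each vector is split into M equal subvectors, each stored as a one-byte index into a per-subspace codebook of centroids, so decoding is a lookup-and-concatenate step followed by ordinary vector math.

```javascript
// Hypothetical layout (not the library's real format): a D-dimensional vector
// is split into M equal subvectors; codebooks[m] holds K centroids of length
// D / M for subspace m, and each word is stored as M one-byte codes.
const codebooks = [
  [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], // subspace 0: K = 3 toy centroids
  [[0.5, 0.5], [-1.0, 0.0], [0.0, -1.0]], // subspace 1
];

// Toy vocabulary: each word maps to one centroid index per subspace.
const codes = {
  king: Uint8Array.from([1, 0]),
  queen: Uint8Array.from([1, 2]),
  apple: Uint8Array.from([2, 1]),
};

// Rebuild an approximate vector by concatenating the selected centroids.
function decode(code, books) {
  const out = [];
  code.forEach((c, m) => out.push(...books[m][c]));
  return out;
}

// Cosine similarity between two dense vectors of equal length.
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank all other words by cosine similarity to `word` and keep the top k.
function mostSimilar(word, k) {
  const target = decode(codes[word], codebooks);
  return Object.keys(codes)
    .filter((w) => w !== word)
    .map((w) => ({ word: w, score: cosine(target, decode(codes[w], codebooks)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

On this toy data, `mostSimilar("king", 1)` returns `queen` with a cosine score of about 0.29. In a real deployment the same gather-and-concatenate step would be applied to the whole vocabulary at once, which is the kind of batched tensor operation a library such as TensorFlow.js is suited for.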

Demo

A demo of the JavaScript library is available at https://mb-14.github.io/embeddings.js

Models

  • compressor - Module to compress pre-trained word embeddings using PCA and product quantization
  • sentiment_classification - LSTM model for sentiment classification, trained on the Sentiment140 dataset
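The compressor's two stages can also be sketched in JavaScript, under simplifying assumptions: PCA is computed with power iteration and deflation on a small in-memory matrix, and product quantization uses fixed toy codebooks instead of codebooks learned with k-means (the training step a real product quantizer needs). The function names `pca` and `pqEncode`, the data, and the codebooks are hypothetical; this is not the repository's compressor module.

```javascript
// Hypothetical sketch of the two compression stages; not the repository's code.

// PCA via power iteration with deflation: returns the mean, the top-k
// principal directions (unit vectors), and a helper projecting onto them.
function pca(X, k, iters = 200) {
  const n = X.length;
  const d = X[0].length;
  const mean = new Array(d).fill(0);
  for (const row of X) row.forEach((v, j) => (mean[j] += v / n));

  // Sample covariance matrix C = Xc^T Xc / (n - 1) of the centered data Xc.
  const C = Array.from({ length: d }, () => new Array(d).fill(0));
  for (const row of X) {
    const xc = row.map((v, j) => v - mean[j]);
    for (let i = 0; i < d; i++)
      for (let j = 0; j < d; j++) C[i][j] += (xc[i] * xc[j]) / (n - 1);
  }

  const components = [];
  for (let c = 0; c < k; c++) {
    // Deterministic start vector with distinct entries; repeated C*v steps
    // converge to the eigenvector with the largest eigenvalue.
    let v = Array.from({ length: d }, (_, i) => i + 1);
    let lambda = 0;
    for (let t = 0; t < iters; t++) {
      const w = C.map((r) => r.reduce((s, cij, j) => s + cij * v[j], 0));
      const norm = Math.hypot(...w);
      if (norm === 0) break; // nothing left to extract
      v = w.map((x) => x / norm);
      lambda = norm; // ||C v|| approaches the eigenvalue as v converges
    }
    components.push(v);
    // Deflate: remove the found component so the next pass finds the next one.
    for (let i = 0; i < d; i++)
      for (let j = 0; j < d; j++) C[i][j] -= lambda * v[i] * v[j];
  }

  const project = (x) =>
    components.map((pc) => pc.reduce((s, p, j) => s + p * (x[j] - mean[j]), 0));
  return { mean, components, project };
}

// Product quantization encoding: split x into M = codebooks.length equal
// subvectors (x.length must be divisible by M) and keep, for each, the index
// of the nearest centroid by squared Euclidean distance. With at most 256
// centroids per codebook, each index fits in one byte.
function pqEncode(x, codebooks) {
  const sub = x.length / codebooks.length;
  return Uint8Array.from(codebooks, (cb, m) => {
    const part = x.slice(m * sub, (m + 1) * sub);
    let best = 0;
    let bestDist = Infinity;
    cb.forEach((centroid, idx) => {
      const dist = centroid.reduce((s, c, j) => s + (c - part[j]) ** 2, 0);
      if (dist < bestDist) {
        bestDist = dist;
        best = idx;
      }
    });
    return best;
  });
}

// Toy pipeline: 3-D points on a line, reduced to 1 dimension by PCA, then
// quantized with a single hypothetical 1-D codebook of 4 centroids.
const X = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]];
const { project } = pca(X, 1);
const reduced = X.map(project);
const codes = reduced.map((r) => pqEncode(r, [[[-6], [-2], [2], [6]]]));
```

The storage saving comes from replacing each d-dimensional float32 vector (4d bytes) with M one-byte codes plus a codebook shared by the whole vocabulary. As an illustrative calculation only (not the project's settings), a 300-dimensional vector takes 1,200 bytes, while 50 one-byte codes take 50 bytes per word.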

Instructions

This project uses Yarn to manage dependencies.

Run locally

yarn              # install dependencies
yarn run demo     # run the "demo" script from package.json

You can then view all the demos at http://localhost:8080.

Rebuild models

yarn build

Languages

JavaScript 39.2%, HTML 30.8%, Python 28.6%, Shell 1.4%