This is a Go implementation of word embedding (also known as word representation) models.
Word embedding maps the meaning, structure, and concepts of words into a low-dimensional vector space. A representative example:
Vector("King") - Vector("Man") + Vector("Woman") = Vector("Queen")
As this example shows, word meanings can be composed and compared through arithmetic operations on their vectors.
The following word embedding models are implemented:
- Word2Vec
  - Distributed Representations of Words and Phrases and their Compositionality [pdf]
- GloVe
  - GloVe: Global Vectors for Word Representation [pdf]
- SPPMI-SVD
  - Neural Word Embedding as Implicit Matrix Factorization [pdf]
```
$ go get -u github.com/roscopecoltran/word-embedding
$ bin/word-embedding -h
```
Download the text8 corpus and train on it with Skip-Gram and negative sampling:

```
$ sh demo.sh
```
```
The tools embedding words into vector space

Usage:
  word-embedding [flags]
  word-embedding [command]

Available Commands:
  sim         Estimate the similarity between words
  word2vec    Embed words using word2vec
```
- Input
  - Ideally, the input text should contain one sentence per line.
- Output
  - The output file uses a libsvm-like format:
```
<word> <index1>:<value1> <index2>:<value2> ...
```