
Word2Vec, GloVe in Golang


Word Embedding in Golang


This is an implementation of word embedding (also referred to as word representation) models in Golang.

Details

Word embedding maps the meaning, structure, and concepts of words into a low-dimensional vector space. A representative example:

Vector("King") - Vector("Man") + Vector("Woman") = Vector("Queen")

As this example shows, arithmetic operations on word vectors can capture relationships between word meanings.

Features

The following word embedding models are implemented:

Models

  • Word2Vec
    • Distributed Representations of Words and Phrases and their Compositionality [pdf]
  • GloVe
    • GloVe: Global Vectors for Word Representation [pdf]
  • SPPMI-SVD
    • Neural Word Embedding as Implicit Matrix Factorization [pdf]

Installation

$ go get -u github.com/roscopecoltran/word-embedding
$ bin/word-embedding -h

Demo

Download the text8 corpus and train a Skip-Gram model with negative sampling:

$ sh demo.sh

Usage

Tools for embedding words into vector space

Usage:
  word-embedding [flags]
  word-embedding [command]

Available Commands:
  sim         Estimate the similarity between words
  word2vec    Embed words using word2vec

File I/O

  • Input
    • Ideally, the input text should contain one sentence per line.
  • Output
    • The output file uses a libsvm-like format:
    <word> <index1>:<value1> <index2>:<value2> ...
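A reader for this output format can be sketched in Go. The `parseLine` helper below is illustrative, not part of this repository; it splits one libsvm-style line into the word and a map from dimension index to value:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLine parses one libsvm-style line, e.g. "king 1:0.9 2:0.8",
// returning the word and a map from vector index to value.
func parseLine(line string) (string, map[int]float64, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return "", nil, fmt.Errorf("empty line")
	}
	word := fields[0]
	vec := make(map[int]float64, len(fields)-1)
	for _, f := range fields[1:] {
		parts := strings.SplitN(f, ":", 2)
		if len(parts) != 2 {
			return "", nil, fmt.Errorf("malformed index:value pair %q", f)
		}
		idx, err := strconv.Atoi(parts[0])
		if err != nil {
			return "", nil, err
		}
		val, err := strconv.ParseFloat(parts[1], 64)
		if err != nil {
			return "", nil, err
		}
		vec[idx] = val
	}
	return word, vec, nil
}

func main() {
	word, vec, err := parseLine("king 1:0.9 2:0.8 3:0.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(word, vec[1], vec[2], vec[3])
}
```

A sparse map is used here because libsvm-style lines may omit zero-valued indices; a dense slice works equally well if every index is always present.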
    

References

  • Just see it for more deep comprehension:
    • Improving Distributional Similarity with Lessons Learned from Word Embeddings [pdf]
    • Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors [pdf]

License: Apache License 2.0