The word embedding vectors from Senna [1], accessible from Torch 7. Senna's Lua wrapper can be found at [2]. This project is inspired by [3]. Note: Senna's words are all in lower case, and any number is replaced with 0.
###Prerequisites
- Torch 7
- Senna (see [1]). It need not be compiled; only the data files that come with Senna are needed, see below.
###Install
- run command `git clone https://github.com/pengsun/senna-wordvec-torch`
- cd to the directory.
- open `init.lua`
  - modify the two variables `opt.pathSennaWord` and `opt.pathSennaVec` to point to the corresponding Senna files on your local machine.
  - modify the variable `opt.pathMyt7`, where the pre-saved t7 file will be put, to a path of your choice.
- run command `luarocks make`

Then the lib will be installed to your Torch 7 directory. Delete the git-cloned source directory `senna-wordvec-torch` if you like.
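As an illustration of the path edits in `init.lua`, the three variables might be set as in the sketch below. Every path is a placeholder, the file names `words.lst` and `embeddings.txt` are an assumption about the layout of the Senna distribution (check your own copy), and the way `opt` is declared here is also an assumption rather than the actual contents of `init.lua`.

```lua
-- Sketch of the edited section of init.lua; all paths are placeholders.
-- Assumption: `opt` is a plain Lua table of options (the real file may declare it differently).
local opt = {}

-- Assumed locations of the Senna data files; adjust to where Senna was unpacked.
opt.pathSennaWord = '/path/to/senna/hash/words.lst'            -- word list
opt.pathSennaVec  = '/path/to/senna/embeddings/embeddings.txt' -- word vectors

-- Where the pre-saved t7 file will be written; any writable path works.
opt.pathMyt7 = '/path/to/output/senna-wordvec.t7'
```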
###Usage
- `word2vec`: takes a word as a Lua string and returns the corresponding word vector. Returns `nil` if the word is out of vocabulary.
- `vec_size`: returns the embedded word vector size. Should be 50.
- Conversion to string.
- A Lua table, the word vocabulary.
- A `torch.FloatTensor`, the word vectors.
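Because the lookup returns `nil` for out-of-vocabulary words, code that looks up several words at once usually has to separate hits from misses. The following is a minimal sketch that relies only on the `word2vec` method; `embed_tokens` is a hypothetical helper written for illustration, not part of the library.

```lua
-- Hypothetical helper (not part of senna-wordvec): look up every token and
-- separate the vectors that were found from the out-of-vocabulary tokens.
-- `swe` is any object exposing swe:word2vec(word), which returns nil for unknown words.
local function embed_tokens(swe, tokens)
  local vecs, missing = {}, {}
  for _, tok in ipairs(tokens) do
    local v = swe:word2vec(tok)
    if v == nil then
      missing[#missing + 1] = tok   -- remember the word that had no vector
    else
      vecs[#vecs + 1] = v           -- keep the vector, preserving token order
    end
  end
  return vecs, missing
end
```

With the library installed, passing `require 'senna-wordvec'` as `swe` together with a table of lower-cased words should return the found vectors in order, plus the list of words that need separate handling.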
###Examples
local swe = require 'senna-wordvec'
print(swe)
v1 = swe:word2vec('hello')
v2 = swe:word2vec('world')
v3 = swe:word2vec('ps0') -- note: Senna replaces any number with 0, so `ps2` should be fed as `ps0`
v4 = swe:word2vec('0-year-old')
assert(nil == swe:word2vec('nosuchword'))
assert(swe:vec_size()==50)
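Since Senna's words are lower-cased and numbers are replaced with 0, raw tokens generally need normalizing before lookup. The sketch below is one possible way to do that in plain Lua. `normalize` is a hypothetical helper, and it assumes that each run of digits collapses to a single `0`, which is what the `ps0` and `0-year-old` examples suggest; Senna's own preprocessing may differ in edge cases.

```lua
-- Hypothetical helper (not part of senna-wordvec): map a raw token to the
-- form used in Senna's vocabulary.
-- Assumption: every run of digits becomes a single '0'.
local function normalize(word)
  -- lower() folds case; gsub replaces each digit run with '0'.
  -- The outer parentheses drop gsub's second return value (the match count).
  return (word:lower():gsub('%d+', '0'))
end
```

A normalized token can then be passed to the lookup, for example `swe:word2vec(normalize('PS2'))`.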
###Reference
[1] http://ml.nec-labs.com/senna/