xsthunder/emb_reduce

Don't use this; use gensim instead. The one possible advantage of emb_reduce is that it may offer caching where gensim may not.

Requirements

fire, tqdm

Retrieve the embeddings of the words in a word list from a large word-embedding file.

python ./reduce_emb/reduce_emb.py --fword wordlist --femb word_emb_file --fout reduced_emb_file
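The reduction can be done in a single streaming pass over the large file. The sketch below is a minimal illustration, not the repository's reduce_emb.py: the names filter_emb and reduce_emb are hypothetical, and it assumes a UTF-8, space-separated embedding file in which each line starts with the word.

```python
from typing import Iterable, Iterator, Set


def filter_emb(wanted: Set[str], emb_lines: Iterable[str]) -> Iterator[str]:
    """Yield embedding lines (newline stripped) whose word is in `wanted`.

    Hypothetical helper: lines are streamed one at a time, so only the
    small word set is held in memory, never the large embedding file.
    """
    for i, raw in enumerate(emb_lines):
        line = raw.rstrip("\r\n")
        if not line.strip():
            break  # stop at the first empty line
        parts = line.split(" ")
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # skip an optional "wordnum emb_dim" header line
        if parts[0] in wanted:
            yield line


def reduce_emb(fword: str, femb: str, fout: str) -> int:
    """Hypothetical file-level wrapper mirroring --fword / --femb / --fout.

    Returns the number of embedding lines written to `fout`.
    """
    with open(fword, encoding="utf-8") as f:
        wanted = {w.strip() for w in f if w.strip()}
    kept = 0
    with open(femb, encoding="utf-8") as src, open(fout, "w", encoding="utf-8") as dst:
        for line in filter_emb(wanted, src):
            dst.write(line + "\n")
            kept += 1
    return kept
```

Keeping the word list in a set makes each membership check O(1) on average, which matters when the embedding file has millions of lines.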

Word list (--fword): a relatively small text file.

One word per line.
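Reading the word list is straightforward. A minimal sketch, assuming UTF-8 input; load_wordlist is a hypothetical name, and this version ignores blank lines and surrounding whitespace:

```python
from typing import Iterable, Set


def load_wordlist(lines: Iterable[str]) -> Set[str]:
    """Collect one word per line into a set (hypothetical helper).

    Surrounding whitespace is stripped, blank lines are skipped, and
    duplicate words collapse automatically because the result is a set.
    """
    return {line.strip() for line in lines if line.strip()}


# Usage with a real file:
# with open("wordlist", encoding="utf-8") as f:
#     words = load_wordlist(f)
```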

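Embedding file (--femb): a very large text file.

Optional first line: wordnum emb_dim, two integers giving the number of words and the embedding dimension. See ./tests/Tencent_AILab_ChineseEmbedding_sample for an example. GloVe files don't have this line.

Every other line: word dim0 dim1 [dim2 ...], i.e. the word followed by its embedding values.

The file is read until the first empty line.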
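The format rules above can be captured by a small parser. This is a sketch under assumptions, not emb_reduce's own code: iter_emb is a hypothetical name, values are assumed to be separated by single spaces, and the dimension check (against the header if present, otherwise against the first vector) is an added safeguard the original does not mention.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_emb(lines: Iterable[str]) -> Iterator[Tuple[str, List[float]]]:
    """Parse embedding text into (word, vector) pairs (hypothetical helper).

    - An optional first line "wordnum emb_dim" (two integers) is a header.
    - Every other line is "word dim0 dim1 [dim2 ...]".
    - Parsing stops at the first empty line.
    """
    dim: Optional[int] = None
    for i, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        if not line.strip():
            break  # read until empty line
        parts = line.rstrip().split(" ")
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            dim = int(parts[1])  # header present (Tencent style); GloVe has none
            continue
        word, values = parts[0], [float(v) for v in parts[1:]]
        if dim is None:
            dim = len(values)  # no header: infer the dimension from the first vector
        if len(values) != dim:
            raise ValueError(f"line {i + 1}: expected {dim} values, got {len(values)}")
        yield word, values
```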

Output (--fout): the text file to write the result to. The format is like GloVe, i.e. without the optional first line.
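Writing the output in this headerless, GloVe-like form is a one-liner per entry. A minimal sketch; format_glove and write_glove are hypothetical names, and values are rendered with Python's default float formatting, which may not match the precision of the source file byte for byte.

```python
import io
from typing import Iterable, List, TextIO, Tuple


def format_glove(word: str, vec: List[float]) -> str:
    """Render one entry as a GloVe-style line: the word, then space-separated values."""
    return " ".join([word] + [str(v) for v in vec])


def write_glove(entries: Iterable[Tuple[str, List[float]]], out: TextIO) -> int:
    """Write entries with no "wordnum emb_dim" header line; return the count written."""
    n = 0
    for word, vec in entries:
        out.write(format_glove(word, vec) + "\n")
        n += 1
    return n


if __name__ == "__main__":
    buf = io.StringIO()
    write_glove([("a", [0.1, 0.2]), ("b", [0.3, 0.4])], buf)
    print(buf.getvalue(), end="")
```

Because no header is written, loaders that expect a wordnum emb_dim first line (such as word2vec text-format readers) may need one prepended or an option to skip it.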

MIT License
