lxc-xx / doc2vec

This repository contains Python scripts for training a doc2vec (paragraph vector) model and for inferring vectors for test documents with the trained model.

Requirements

  • Gensim: if you need to load pre-trained word embeddings when training doc2vec, check out my forked version of Gensim; otherwise, feel free to use the canonical one

Pre-Trained Doc2Vec Models

Pre-Trained Word2Vec Models

For reproducibility, we also released the pre-trained word2vec skip-gram models trained on Wikipedia and AP News.

Directory Structure and Files

  • train_model.py: example Python script that trains a doc2vec model on some toy data
  • infer_test.py: example Python script that infers test document vectors using a trained model
  • toy_data: directory containing some toy train/test documents and pre-trained word embeddings

Publications

About

License:Apache License 2.0

