JKHenry520/dsg

DSG

Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings

Datasets

You can also download our public datasets: https://ai.tencent.com/ailab/nlp/en/embedding.html

How to use

To build the code, simply run:

  make

The command to train word embeddings is the same as in the original word2vec tool, except that the -cbow argument has been removed and replaced with the -type argument:

  ./dsg -train input_file -output embedding_file -type 0 -size 50 -window 5 -negative 10 -hs 0 -sample 1e-4 -threads 1 -binary 1 -iter 5
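A quick way to sanity-check the trained vectors is to compare two words. The shell sketch below is illustrative and not part of this repository. It assumes training was run with -binary 0 instead of -binary 1, and that the resulting text file follows the word2vec text layout: a header line with the vocabulary size and vector dimension, then one line per word holding the word followed by its values. The function name cosine is hypothetical.

```shell
# Cosine similarity of two words in a text-format (-binary 0) embedding file.
# ASSUMPTION: word2vec text layout, i.e. a "<vocab_size> <dimension>" header,
# then one "word v1 v2 ... vN" line per word.
# Usage: cosine WORD_A WORD_B EMBEDDING_FILE
cosine() {
  awk -v a="$1" -v b="$2" '
    NR == 1 { next }                                  # skip the header line
    $1 == a { for (i = 2; i <= NF; i++) va[i] = $i; fa = 1 }
    $1 == b { for (i = 2; i <= NF; i++) vb[i] = $i; fb = 1 }
    END {
      if (!fa || !fb) { print "word not found" > "/dev/stderr"; exit 1 }
      dot = na = nb = 0
      for (i in va) { dot += va[i] * vb[i]; na += va[i] ^ 2; nb += vb[i] ^ 2 }
      printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))   # cosine, 4 decimals
    }' "$3"
}
```

For example, cosine king queen embedding_file prints the cosine similarity of the two words, and the function exits with a non-zero status if either word is missing from the vocabulary.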

The -type argument is an integer that selects the architecture. The possible values are:

0 - dsg: the model proposed in this paper;
1 - simple ssg: the comparison model, adapted from the original structured skip-gram (SSG) model (https://github.com/wlin12/wang2vec).
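To compare the two architectures, both can be trained on the same corpus with identical settings. The following is a minimal sketch, not a script shipped with this repository: the function name train_both, the DSG variable (defaulting to ./dsg), and the output file names are hypothetical, and the hyper-parameters are simply copied from the example command.

```shell
# Train dsg (-type 0) and simple ssg (-type 1) on the same corpus.
# HYPOTHETICAL: the DSG variable (path to the binary, default ./dsg) and the
# output file names; hyper-parameters are copied from the example command.
# Usage: train_both CORPUS_FILE
train_both() {
  corpus=$1
  for arch in 0 1; do
    case $arch in
      0) out=vectors_dsg.bin ;;   # the proposed dsg model
      1) out=vectors_ssg.bin ;;   # the simple ssg comparison model
    esac
    "${DSG:-./dsg}" -train "$corpus" -output "$out" -type "$arch" -size 50 \
      -window 5 -negative 10 -hs 0 -sample 1e-4 -threads 1 -binary 1 -iter 5 \
      || return 1                 # stop if either run fails
  done
}
```

For example, train_both input_file writes one embedding file per architecture. Only -type changes between the two runs, so differences in the resulting embeddings come from the model rather than from the training settings.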

Citation

If you use this code, please support us by citing our paper:

@InProceedings{Song:2018:naacl,
	author    = {Yan Song and Shuming Shi and Jing Li and Haisong Zhang},
	title     = "{Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings}",
	booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
	year      = {2018},
	publisher = {Association for Computational Linguistics},
	address   = {New Orleans, Louisiana, USA}
}

Many thanks!
