trungtrinh44 / vn_word_vector

Vietnamese Word Vector

  • The pretrained word vectors are available here (access requires an HCMUT email account).
  • There are two types of word vectors: unigram and segmented.
  • Each type is trained with three methods: CBOW, Skip-gram, and GloVe.
  • To use the unigram word vectors, prepare the texts with the clean_text function in preprocess.py before mapping them to word vectors (requires the regex Python package).
  • To use the segmented word vectors, segment the texts into words with VnCoreNLP before mapping them to word vectors.
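The two usage paths above can be sketched as follows. This is a minimal, hypothetical example, not code from this repository. It assumes the vector files use the word2vec text format (an optional "<vocab_size> <dim>" header, then one "<token> <v1> ... <vd>" line per token). It also assumes that unigram vectors are keyed by space-separated syllables and that segmented vectors are keyed by VnCoreNLP-style words, whose syllables are joined by underscores (e.g. "Hà_Nội"). The names load_vectors, lookup, unigram_lookup and segmented_lookup are illustrative only. clean_text is passed in as a callable and is assumed to map a string to a string. The demo uses str.lower as a stand-in because the README does not describe what clean_text actually does.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical type alias: token -> vector.
Vectors = Dict[str, List[float]]


def load_vectors(lines: Iterable[str]) -> Vectors:
    """Parse vectors in the word2vec text format (assumed, not confirmed by the README)."""
    vectors: Vectors = {}
    for i, line in enumerate(lines):
        parts = line.split()
        # Skip an optional "<vocab_size> <dim>" header on the first line.
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def lookup(tokens: Iterable[str], vectors: Vectors) -> List[Tuple[str, List[float]]]:
    """Map each token to its vector, skipping out-of-vocabulary tokens."""
    return [(t, vectors[t]) for t in tokens if t in vectors]


def unigram_lookup(text: str, vectors: Vectors,
                   clean: Callable[[str], str] = lambda s: s) -> List[Tuple[str, List[float]]]:
    """Unigram path: clean the raw text, then look up each space-separated syllable.
    In real use, pass preprocess.clean_text as `clean` (assumed to be str -> str)."""
    return lookup(clean(text).split(), vectors)


def segmented_lookup(segmented_text: str, vectors: Vectors) -> List[Tuple[str, List[float]]]:
    """Segmented path: the input is assumed to be VnCoreNLP output, where the
    syllables of a multi-syllable word are joined by underscores."""
    return lookup(segmented_text.split(), vectors)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a vector file (made-up values for illustration).
    sample = ["3 2", "hà 1.0 0.0", "nội 0.0 1.0", "hà_nội 0.5 0.5"]
    vecs = load_vectors(sample)
    # str.lower stands in for clean_text here.
    print(unigram_lookup("Hà Nội", vecs, clean=str.lower))
    # [('hà', [1.0, 0.0]), ('nội', [0.0, 1.0])]
    print(segmented_lookup("hà_nội", vecs))
    # [('hà_nội', [0.5, 0.5])]
```

Out-of-vocabulary tokens are dropped silently here. Whether to drop them, map them to a zero vector, or map them to a special unknown token is a design choice the README leaves open. For real files, gensim's KeyedVectors.load_word2vec_format can replace the hand-written parser, provided the files really are in word2vec format.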

Languages

Python: 100.0%