Shuang0420 / word2vec_tool

word2vec_example

Usage

Segment the text with jieba (Python), then train a word2vec model

python train_word2vec_model.py all.txt
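The training scripts are not reproduced here. As an illustration only, assuming the segmented corpus is stored with one sentence per line and tokens separated by spaces (the layout gensim's LineSentence reads), this stdlib sketch turns such lines into token lists and counts word frequencies with a cut-off like the -min_count option described below. The helpers read_segmented_lines and build_vocab are hypothetical names, not functions from this repository.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def read_segmented_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one token list per non-empty line, splitting on whitespace.

    Assumption: the segmented corpus has one sentence per line and
    space-separated tokens, which is what word2vec training consumes.
    """
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            yield tokens


def build_vocab(sentences: Iterable[List[str]], min_count: int = 1) -> Dict[str, int]:
    """Count token frequencies and drop words seen fewer than min_count times."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {word: n for word, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Hypothetical, already-segmented sample lines.
    corpus = ["我 喜欢 烹饪 中餐", "订单 已 下单", "", "烹饪 中餐 很 有趣"]
    sentences = list(read_segmented_lines(corpus))
    print(build_vocab(sentences, min_count=2))  # prints {'烹饪': 2, '中餐': 2}
```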

Train a word2vec model

python word2vec.py -input inputfile -output outputfile
usage: word2vec.py [-h] [-input INPUT] [-output OUTPUT] [-window WINDOW]
                   [-size SIZE] [-min_count MIN_COUNT] [-cbow_mean CBOW_MEAN]
                   [-sg SG] [-iters ITERS] [-model MODEL_NAME]
                   [-qqseg SEG_REQUIRED]

optional arguments:
  -h, --help            show this help message and exit
  -input INPUT          original input file name
  -output OUTPUT        output model name
  -window WINDOW        the maximum distance between the current and predicted
                        word within a sentence
  -size SIZE            the dimensionality of the feature vectors
  -min_count MIN_COUNT  ignore all words with total frequency lower than this
  -cbow_mean CBOW_MEAN  if 0, use the sum of the context word vectors. If 1
                        (default), use the mean. Only applies when cbow is
                        used.
  -sg SG                sg defines the training algorithm. By default (sg=0),
                        CBOW is used. Otherwise (sg=1), skip-gram is employed.
  -iters ITERS          number of iterations (epochs) over the corpus.
  -model MODEL_NAME     to retrain an existing model, pass its file name here.
  -qqseg SEG_REQUIRED   if 0 (default), assume the text is already segmented.
                        If 1, run the segmentation tool first.
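To make -window, -sg and -cbow_mean concrete, here is a minimal stdlib sketch (hypothetical helpers, not code from word2vec.py) of how they shape training inputs: skip-gram (sg=1) pairs each center word with every word at most window positions away, while CBOW (sg=0) combines the context vectors by sum (cbow_mean=0) or mean (cbow_mean=1). The original C word2vec and gensim also shrink the window by a random amount for each center word, which is why the help text says "maximum distance"; this sketch always uses the full window for simplicity.

```python
from typing import List, Sequence, Tuple


def skipgram_pairs(tokens: Sequence[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every context word within `window`
    positions of the center word (full window, no random shrinking)."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_input(context_vectors: Sequence[Sequence[float]], cbow_mean: int = 1) -> List[float]:
    """Combine context vectors element-wise: sum if cbow_mean == 0, mean if 1."""
    summed = [sum(col) for col in zip(*context_vectors)]
    if cbow_mean:
        return [x / len(context_vectors) for x in summed]
    return summed


if __name__ == "__main__":
    print(skipgram_pairs(["我", "喜欢", "烹饪"], window=1))
    print(cbow_input([[1.0, 2.0], [3.0, 4.0]], cbow_mean=0))  # prints [4.0, 6.0]
    print(cbow_input([[1.0, 2.0], [3.0, 4.0]], cbow_mean=1))  # prints [2.0, 3.0]
```

Averaging (the default) keeps the CBOW input on the same scale regardless of how many context words fall inside the window, whereas summing lets it grow with the window size.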

Compute the norm of each word vector, sort the words by norm from largest to smallest, and output word - norm of vector

python normVector.py
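normVector.py itself is not shown here. As a minimal sketch, assuming the word vectors have already been loaded from the trained model into a dict mapping each word to a list of floats (that loading step and the name sorted_norms are hypothetical), the norm computation and descending sort look like this:

```python
import math
from typing import Dict, List, Sequence, Tuple


def sorted_norms(vectors: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (word, L2 norm) pairs ordered from largest to smallest norm."""
    norms = [(word, math.sqrt(sum(x * x for x in vec))) for word, vec in vectors.items()]
    return sorted(norms, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors for illustration; in practice they come from the trained model.
    toy = {"订单": [1.0, 0.0], "烹饪": [3.0, 4.0], "中餐": [0.0, 2.0]}
    for word, norm in sorted_norms(toy):
        print(f"{word} - {norm}")  # prints 烹饪 - 5.0, then 中餐 - 2.0, then 订单 - 1.0
```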

Test the model

python word2vec_testTool.py
enter the model you want to check guomei.model
2016-06-27 16:51:36,856: INFO: loading Word2Vec object from guomei.model
2016-06-27 16:51:36,973: INFO: setting ignored attribute syn0norm to None
2016-06-27 16:51:36,973: INFO: setting ignored attribute cum_table to None
2016-06-27 16:51:46,343: INFO: precomputing L2-norms of word weight vectors
enter the word 烹饪
烹调	0.62442278862
中餐	0.534122824669
解冻	0.48883074522
加热	0.479849278927
食用	0.470198184252
菜肴	0.466752141714
做饭	0.462529540062
炖肉	0.457395881414
熬汤	0.446689903736
慢火	0.427767693996
enter the word 订单
定单	0.778802156448
运单	0.682202041149
账号	0.611731410027
账户	0.587948918343
帐号	0.585913598537
下单	0.5846991539
单号	0.58408164978
imei	0.583274126053
退款	0.578039348125
货物	0.576778292656
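The INFO line about precomputing L2-norms suggests the scores above are cosine similarities, which is how gensim's most_similar ranks words: vectors are scaled to unit length and compared by dot product. Below is a brute-force stdlib sketch of that ranking, using hypothetical toy vectors rather than guomei.model; it is not the code of word2vec_testTool.py, and the names unit and most_similar here are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def unit(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def most_similar(word: str, vectors: Dict[str, Sequence[float]], topn: int = 10) -> List[Tuple[str, float]]:
    """Rank all other words by cosine similarity to `word`, highest first."""
    normed = {w: unit(v) for w, v in vectors.items()}
    query = normed[word]
    scores = [
        (other, sum(a * b for a, b in zip(query, vec)))
        for other, vec in normed.items()
        if other != word  # exclude the query word itself, as in the output above
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


if __name__ == "__main__":
    # Toy 2-D vectors for illustration only.
    toy = {"烹饪": [1.0, 0.0], "烹调": [4.0, 3.0], "中餐": [3.0, 4.0], "订单": [0.0, 2.0]}
    for other, score in most_similar("烹饪", toy, topn=3):
        print(f"{other}\t{score:.4f}")
```

Normalizing once up front (what the L2-norm log line refers to) turns every cosine similarity into a plain dot product, so each query costs one pass over the vocabulary.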


Languages

Language: Python 100.0%