ksugawara61 / Word2VecSample

This repository contains a sample dictionary script for MeCab

Word2Vec sample script

Requirements

  • Python 3.7
  • gensim
$ pip install -U gensim

Scripts

  • Build the model
    • word2vec.Text8Corpus turns the text into a corpus
    • word2vec.Word2Vec trains the word vectors
$ /usr/local/opt/python/bin/python3.7 bin/make_model.py sentence/ichiro.txt
  • Extract a word vector
$ /usr/local/opt/python/bin/python3.7 bin/get_wordvector.py イチロー
イチロー
[-0.80617577  1.256171   -2.327676    1.6579907   1.065178   -1.1967746
  0.7361687  -0.17460112  4.0770187   0.16385438  2.5888689   0.12413041
 -0.56640697 -0.6104689   1.8544087  -1.6985633   1.0587243  -1.4851463
 -0.318925    0.6570676   2.6269774   4.4174986   0.01001746 -2.068054
  1.4395214   0.3536166  -0.55543566 -0.9476054   2.5045435  -1.4728755
  3.1830812  -0.48830637  1.4631358   2.2503517  -2.1562998  -0.51378345
  4.5562367  -1.9496444  -1.0046719  -1.2187924   3.0221648  -2.742949
 -1.2805495  -0.9590431  -0.6877336  -0.3465123   2.6909263  -0.37924847
 -0.20221949 -1.1235477  -1.290373    0.2820788  -2.2922645  -1.722206
  0.51458406 -2.7630925  -2.8609605  -1.7121651   2.1336074   1.3305161
  0.08751959  3.31309     2.6296997   1.0125283  -2.1882057  -0.48854506
 -2.2838588  -1.4194442  -3.3133607   2.0754921   0.867208    3.6263661
 -0.33565846 -0.06962128 -0.44305894 -0.7599756   2.792378   -0.65954375
  0.64455354  3.0353553   0.3648993   1.7940687  -0.4251959   0.5222359
 -1.5341059   0.4533545  -1.908346   -0.62840855 -0.9165895   1.6577936
 -0.41989294 -1.8090215   1.3254014  -3.0771458  -1.8452297  -3.0585485
  0.08236121  1.3783338  -1.5098305   0.38861656  1.4993546  -0.6297144
 -1.6514739   4.6196637  -0.86678445  0.48862758 -0.96667373 -1.4577236
 -1.6874688  -1.2069464   0.7432712   1.3374639   0.16684115 -0.04605089
 -1.241104   -0.8367395   0.1699877  -0.4882158   1.2423984  -1.4079136
 -1.9810827   0.58272225  1.8964049  -0.11146958  2.284768   -4.3190775
 -0.45907325 -4.0285616   2.2347713   1.1211501   0.96530694 -1.9227403
  0.3578294  -0.5690398  -0.6228453   1.2842722  -1.6695601  -3.3771067
  0.15552217  0.32800284  2.5918634   0.95849663  0.33007133 -0.3705334
 -0.7646333   1.6597317   1.7913157   1.0197343   0.26756915  1.0328467
 -0.610417    3.0729988  -2.166554    2.659004   -2.429619    0.6689142
 -2.0694318   1.7173177   1.6512605   0.28239796 -0.7016835   0.42853072
 -1.199906    1.5888785  -1.6298378  -0.02496901 -0.00948336 -0.94385445
  0.04049189  1.1296403   0.74378115  1.5067255   3.1072576   0.21440656
 -2.1364744  -1.5829406  -0.90442395  2.766479    0.90474373  1.3611627
 -1.6251477  -0.11540962 -0.18581623  1.6183581   2.020488    0.33838025
  3.9061773  -0.3172203  -2.6684396  -2.3360102   2.7565582  -0.17157906
  0.2816242   0.8497625   0.52398145 -0.44943038 -2.4007018   3.107737
 -0.6295846  -1.1016927 ]
  • Extract named entities
$ /usr/local/opt/python/bin/python3.7 bin/get_wordlist.py | tail -n 10
TANF
グレートエンクロージャー
エクストリーム・アイロニング
ロナルディア
アブラーム
オーバーヴェーザー
アウセンヴェーザー
エディルレイド
スパーロック
ワターソン
  • Extract related words
$ /usr/local/opt/python/bin/python3.7 bin/get_similar_wordlist.py イチロー
('ピッチャー', 0.6882702112197876)
('長嶋', 0.6860312819480896)
('グリフィー', 0.675804078578949)
('デレク・ジーター', 0.66254723072052)
('タフィ・ローズ', 0.6598186492919922)
('バッティング', 0.6514485478401184)
('稲尾', 0.6486073732376099)
('門田', 0.6470164656639099)
('強打者', 0.6457992196083069)
('大リーグ', 0.6383268237113953)
