This code first crawls Korean law data from here, then trains word2vec on it and visualizes the result. Each law is separated by the symbol `<END>`. If your text has no `<END>` symbol (that is, it is just a single document), that is no problem either.
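As a rough illustration of the `<END>` convention described above, the sketch below splits raw text into documents and tokenizes each one. The names `split_documents` and `tokenize` are hypothetical and are not taken from this repository, and the whitespace tokenizer is only a stand-in for whatever preprocessing the project actually does.

```python
END_MARKER = "<END>"  # the separator symbol named in this README


def split_documents(text, marker=END_MARKER):
    """Split raw text into documents on the marker, dropping empty pieces.

    Hypothetical helper: text without any marker comes back as one document.
    """
    return [part.strip() for part in text.split(marker) if part.strip()]


def tokenize(document):
    """Whitespace tokenization, a placeholder for the real preprocessing."""
    return document.split()


corpus = "first law text <END> second law text <END>"
documents = split_documents(corpus)
sentences = [tokenize(doc) for doc in documents]

print(documents)  # ['first law text', 'second law text']
print(split_documents("a single document with no marker"))
# ['a single document with no marker']
```

Token lists of this shape (one list of tokens per document) are the usual input format for word2vec training libraries.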
- To crawl the law data, use the command `python word2vec.py crawl`. `python word2vec.py word2vec` preprocesses the data and runs word2vec. You can also use `python script.py`, which I made for testing.
- Visualization using t-SNE can be performed with `python word2vec.py vis`.
For more information about the arguments, run `python word2vec.py [crawl|word2vec|vis] --help` or see the `parse_argument` function in `util.py`.
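For readers unfamiliar with subcommand-style command lines, here is a minimal sketch of how the three modes could be wired up with `argparse`. It is an assumption for illustration only: `build_parser` is a hypothetical name, and the actual `parse_argument` function in `util.py` may be structured differently and defines its own options.

```python
import argparse


def build_parser():
    """Hypothetical parser with one subcommand per mode named in this README."""
    parser = argparse.ArgumentParser(prog="word2vec.py")
    # dest stores the chosen subcommand name; required=True rejects an empty call
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("crawl", help="crawl the law data")
    subparsers.add_parser("word2vec", help="preprocess the data and run word2vec")
    subparsers.add_parser("vis", help="visualize the embeddings with t-SNE")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["vis"])
print(args.command)  # vis
```

With this layout, `--help` after a subcommand prints only that subcommand's options, which matches the usage shown above.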