SauronLee / GSR

Generate Sememe Representation via Building a Large Semantic Graph

Requirements

  • python==3.6.5
  • mecab-python3==1.0.4
  • scikit-learn==1.0.1
  • scipy==1.7.1
  • nltk==3.6.5
  • gensim==4.1.2
  • biterm==0.1.5
  • gsl==2.4.0 (GNU Scientific Library, C/C++; needed to build LINE)

Preprocessing

- cd ./preprocessing
- python ./src/download-wiki-extract.py
- bash ./src/wiki-preprocessing.sh
- python ./local_processing.py
- python ./entity_processing.py
- python ./tokenizer.py

Building a large semantic graph

- cd ./graph_building
- python ./LSA.py
- python ./building.py
- cat word.graph doc.graph topic.graph > sememe.graph
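
The `cat` step above concatenates the three edge files into one graph. Below is a minimal Python sketch of the same merge with a basic sanity check. It assumes, without confirmation from this repository, that each file is a plain-text edge list in the form LINE reads (one whitespace-separated `source target weight` triple per line). The function name `merge_graphs` is hypothetical and not part of this project.

```python
from pathlib import Path


def merge_graphs(parts, out_path):
    """Concatenate edge-list files into one file, skipping malformed lines.

    Assumption: every valid line is `source target weight`,
    separated by whitespace. Returns the number of edges written.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for part in parts:
            with open(part, encoding="utf-8") as src:
                for line in src:
                    fields = line.split()
                    if len(fields) != 3:
                        continue  # blank or malformed line
                    try:
                        float(fields[2])  # weight must be numeric
                    except ValueError:
                        continue
                    out.write(" ".join(fields) + "\n")
                    written += 1
    return written


if __name__ == "__main__":
    names = ["word.graph", "doc.graph", "topic.graph"]
    # Only run when the three graph files are present in the working directory.
    if all(Path(n).exists() for n in names):
        print(merge_graphs(names, "sememe.graph"), "edges written")
```

Unlike plain `cat`, this version drops lines that do not look like weighted edges, which can help catch a truncated or mis-encoded graph file before training.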

Graph embedding

  • LINE: Large-scale information network embedding
- cd ./grapg_embedding
- ./line -train sememe.graph -output ./sememe.embedding -binary 0 -size 200 -order 2 -negative 5 -samples 100 -rho 0.025 -threads 20
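
With `-binary 0`, the embedding is written as text. The following sketch shows one way to read it back and compare two vertices. It assumes the layout produced by the reference LINE implementation: a header line with the vertex count and dimension, then one line per vertex with its name followed by its values. The helper names `load_embedding` and `cosine` are hypothetical and not part of this repository.

```python
import math


def load_embedding(path):
    """Parse a text-format embedding file into ({name: [float, ...]}, dim).

    Assumption: the first line is `<vertex_count> <dim>` and each following
    line is `<name> <v1> ... <v_dim>`, separated by whitespace.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        header = f.readline().split()
        dim = int(header[1])
        for line in f:
            fields = line.split()
            if len(fields) != dim + 1:
                continue  # skip blank or malformed lines
            vectors[fields[0]] = [float(x) for x in fields[1:]]
    return vectors, dim


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For example, `vectors, dim = load_embedding("./sememe.embedding")` should report `dim == 200` for the command above, and `cosine(vectors[a], vectors[b])` gives a similarity score for any two vertex names `a` and `b` present in the graph.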
