seungwookim / sequence_tagging

Sequence Tagging with Tensorflow

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

한글 NER 테스트 (BI-LSTM CRF , Attention Seq2Seq)

  • glove vector lib download needed.
    . locate txt file under data/glove.6B/
  • glove vector Data file (download)
wget http://nlp.stanford.edu/data/glove.6B.zip
  • step1 : make it work on python3.5 and tf1.1 (done)
  • step2 : test korean with mecab tockenizer (done)
  • step3 : develop whole process on hoyai project with rest webservice
    . API Client jupyter code here
    . reset server github link
    . start service with docker

Blog explaining Theory

Check original projects blog post

Functions implemented for Korean NER

  • Simple WebCrawler : gather data from korean wiki pedia for w2v train
  • w2v, fasttext embedding model : provide custom train for model
  • korean char divide func : divide Korean char into smaller pieces for train
  • BI-LSTM CRF for NER : implement it with tensorflow
  • Attention seq2seq for NER : working on it (~ing)

Korean NER Test Result

  • Train Data Example: link
    . you have to preprocess data first before use it
  • Word Level Test
['한화증권']['OG']
['김승우']['PS']
['김수상']['PS']
['6시30분']['DT']
  • Sentence Level Test
[['6시30분']['한화건설']['김승우']['약속']]
[['DT']['OG']['PS']['OO']]

About

Sequence Tagging with Tensorflow


Languages

Language:Python 100.0%