

NER Task for Naver NLP Challenge 2018

3rd place on Naver NLP Challenge NER Task

  • The model is a BiLSTM + CRF tagger, augmented with multi-head attention and separable convolutions (a minimal sketch of the core architecture follows this list).
  • Pretrained fastText vectors are used for both word and character embeddings.
  • The baseline code and dataset were provided by the Naver NLP Challenge GitHub repository.
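The core of the tagger can be pictured as an embedding lookup feeding a bidirectional LSTM whose outputs are projected to per-token label scores and decoded with a CRF. The snippet below is only a rough sketch of that BiLSTM + CRF skeleton (the attention and separable-convolution layers are omitted), written against the TensorFlow 1.x API this repo targets; all names and sizes here are illustrative, not the actual repository code.

```python
import tensorflow as tf  # TF 1.x (e.g. 1.11.0)

# Illustrative sizes; the real values live in the repo's configuration.
vocab_size, emb_dim, hidden_dim, num_labels = 10000, 300, 128, 29

word_ids = tf.placeholder(tf.int32, [None, None])   # (batch, max_len)
seq_lens = tf.placeholder(tf.int32, [None])          # true sentence lengths
labels   = tf.placeholder(tf.int32, [None, None])    # gold NER tags

# Word embedding lookup (initialized from the fastText pickle in practice).
embeddings = tf.get_variable("emb", [vocab_size, emb_dim])
inputs = tf.nn.embedding_lookup(embeddings, word_ids)

# Bidirectional LSTM encoder.
cell_fw = tf.nn.rnn_cell.LSTMCell(hidden_dim)
cell_bw = tf.nn.rnn_cell.LSTMCell(hidden_dim)
(out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, inputs, sequence_length=seq_lens, dtype=tf.float32)
encoded = tf.concat([out_fw, out_bw], axis=-1)        # (batch, max_len, 2*hidden)

# Project to label scores and train with the CRF log-likelihood.
logits = tf.layers.dense(encoded, num_labels)
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    logits, labels, seq_lens)
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# At inference time, Viterbi-decode the best tag sequence.
viterbi_tags, _ = tf.contrib.crf.crf_decode(logits, transition_params, seq_lens)
```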

Model

1. Model Overview

2. Input Layer

Data

Pretrained Embedding

  • We use the 300-dimensional Korean fastText vectors. fastText is trained on words (어절), but it also covers most characters (음절), so the same vectors are used for character embeddings.
  • Only the words and characters that actually appear in the training sentences are extracted and serialized to binary files with the pickle library (see the sketch below).
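As a rough illustration, filtering the full fastText vectors down to the vocabulary seen in training and pickling the result could look like the following. The `load_fasttext_vectors` helper, the input file name, and the toy vocabulary are hypothetical and stand in for the repo's actual preprocessing.

```python
import pickle
import numpy as np

def load_fasttext_vectors(path):
    """Read a fastText .vec text file into a {token: np.ndarray} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the "count dim" header line
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# Hypothetical inputs: full Korean fastText vectors and the training vocabulary.
fasttext = load_fasttext_vectors("cc.ko.300.vec")
train_vocab = {"네이버", "챌린지"}  # in practice, collected from the train sentences

# Keep only tokens that occur in the training data, then serialize with pickle.
filtered = {tok: vec for tok, vec in fasttext.items() if tok in train_vocab}
with open("word2vec/word_emb_dim_300.pkl", "wb") as f:
    pickle.dump(filtered, f)
```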

Requirements

1. Download pretrained embedding

  • To set up the pretrained word embedding (400MB) and character embedding (5MB):
  1. Download both files from this Google Drive Link.
  2. Create a 'word2vec' directory under the project root.
  3. Move the two files into the 'word2vec' directory.
$ mkdir word2vec
$ mv word_emb_dim_300.pkl word2vec
$ mv char_emb_dim_300.pkl word2vec

2. Install packages with pip

  • tensorflow (tested on 1.4.1 and 1.11.0)
  • numpy
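For example, assuming a Python environment compatible with TensorFlow 1.x, the two requirements can be installed with something like:

$ pip3 install tensorflow==1.11.0 numpy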

Run

$ python3 main.py

Other

Contributors
