HarshGrandeur / Bert


Bert

This repository is a simple and fast implementation of BERT for NER. The pretrained BERT model is fine-tuned using Microsoft's implementation of BERT.

BERT is a language model developed by Google that can be fine-tuned for downstream tasks such as NER, sentiment analysis, and question answering. It is pretrained in an unsupervised way on a large corpus and can then be used in a semi-supervised fashion by fine-tuning on a small labeled dataset, where it performs extremely well. Architecturally, it is a bidirectional Transformer encoder.

To summarize BERT most simply:

It is pretrained on 2 tasks:

  1. Masked language modeling: 15% of the tokens in the original text are chosen at random to be masked. Masking happens in three ways:
  • 80% of the chosen tokens are replaced with [MASK].
  • 10% are left unchanged.
  • 10% are replaced with random tokens.
  2. Next sentence prediction: a binary classifier predicts whether the second sentence in a pair is indeed the sentence that follows the first in the original text.
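The 80/10/10 masking rule above can be sketched in plain Python. This is a simplified illustration, not BERT's actual implementation: real BERT operates on WordPiece token IDs and samples from its full vocabulary, and the toy vocabulary here is an assumption for the example.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Apply BERT-style masking: pick ~15% of positions, then
    replace 80% of those with [MASK], 10% with a random token,
    and leave 10% unchanged. Returns (masked_tokens, labels),
    where labels holds the original token at masked positions
    and None elsewhere (only masked positions contribute to the loss)."""
    rng = random.Random(seed)
    vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]  # toy vocabulary
    masked = list(tokens)
    labels = [None] * len(tokens)
    n_pick = max(1, round(len(tokens) * mask_rate))
    for i in rng.sample(range(len(tokens)), n_pick):
        labels[i] = tokens[i]              # model must predict the original token
        r = rng.random()
        if r < 0.8:
            masked[i] = "[MASK]"           # 80%: replace with the mask token
        elif r < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # else (remaining 10%): keep the original token unchanged
    return masked, labels
```

The model then has to reconstruct the original token at every position where `labels` is not None.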

To run the code, please follow the steps below.

Clone the microsoft repo first:

git clone https://github.com/microsoft/nlp

Then place the Jupyter notebook in the examples/named_entity_recognition folder (you can change the location).

To run on your own dataset, just change the dataset location. If you want to reuse the code as-is, please follow the format of the provided data.
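If your dataset follows the common CoNLL-style layout for NER (one "token TAB tag" pair per line, blank lines separating sentences — an assumption here; check it against the repo's provided data files), a minimal loader looks like:

```python
def read_conll(text):
    """Parse CoNLL-style NER data: each line is "token<TAB>tag",
    with blank lines separating sentences. Returns a list of
    (tokens, tags) pairs, one per sentence."""
    sentences, tokens, tags = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # blank line ends the current sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.split("\t")
        tokens.append(token)
        tags.append(tag)
    if tokens:                            # handle a missing trailing blank line
        sentences.append((tokens, tags))
    return sentences
```

This returns parallel token/tag lists per sentence, which is the shape most BERT NER fine-tuning code expects before tokenization and label alignment.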

Languages

Jupyter Notebook 100.0%