Auto-Tagger

An Artificial Intelligence tool that uses Transformer models and NER (Named Entity Recognition) techniques to detect proper names in a text.

This repo contains:

The Auto-Tagger Web App
The Auto-Tagger Discord bot

A video demo can be found here: https://www.youtube.com/watch?v=3XF4hOLtU1o

Key Features • Installation • Calling the API • Using Flask • Docker image • Data • Training a new model • Contributing

Our Auto-Tagger Web Application

Our Auto-Tagger Discord Bot

Key Features

Uses Transformer models (BERT in this case) and NER (Named Entity Recognition) techniques.
Builds a training pipeline.
Implements and trains the model (using Google Colab).
Builds an inference pipeline.
Serves the model using BentoML.
Provides a Web Application to visualize the Auto-Tagger features.
Provides a Discord bot that implements the Auto-Tagger features.

Installation

This section covers everything required to get started.

Clone

Clone this repo to your local machine using https://github.com/MLH-Fellowship/Auto-Tagger.git

Setup

To set everything up, follow the steps below:

1. Download the model from this drive: https://drive.google.com/file/d/1TyuIoMO42CHHvQVlOpw6Ynco39rQbc6t/view?usp=sharing
2. Rename the file to model.bin and place it in /results/ (so that its path is /results/model.bin).
3. Download the BERT uncased model from here: https://www.kaggle.com/abhishek/bert-base-uncased
4. Unzip the files into /model/.
5. Run python serving.py inside /src/.
6. Execute the command bentoml serve PyTorchModel:latest

The model will be served on http://127.0.0.1:5000/
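Before serving, it can help to confirm that the downloaded files ended up where the setup expects them. The following is a minimal sketch, not part of the repository: the function name `missing_artifacts` is hypothetical, and it assumes that /results/model.bin and /model/ are paths relative to the repository root.

```python
from pathlib import Path


def missing_artifacts(repo_root):
    """Return the setup artifacts that are not yet in place under repo_root."""
    root = Path(repo_root)
    problems = []
    # The fine-tuned weights are expected at results/model.bin.
    if not (root / "results" / "model.bin").is_file():
        problems.append("results/model.bin")
    # The unzipped BERT files are expected in model/ (any entry counts here;
    # this sketch does not check individual file names).
    model_dir = root / "model"
    if not model_dir.is_dir() or not any(model_dir.iterdir()):
        problems.append("model/ (empty or missing)")
    return problems
```

Calling `missing_artifacts(".")` from the repository root returns an empty list when both artifacts are present, and otherwise lists what still needs to be downloaded.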

Calling the API

To get predictions, send a POST request with a JSON body to the /predict endpoint:

curl -i --header "Content-Type: application/json" \
        --request POST \
        --data '{"sentence": "John used to play for The Beatles"}' \
        http://127.0.0.1:5000/predict

Example:

#request
{ 
  "sentence": "Jack and James went to the university and they met Emily"
}

The response is a single string containing all detected names, separated by commas. In this example it will be:

#response
"jack,james,emily"
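The same request can be sent from Python. This is a minimal sketch using only the standard library; it mirrors the curl call above and assumes the server is running at the address given in Setup. The helper names (`build_request`, `parse_names`, `predict`) are hypothetical, and whether the response body arrives wrapped in JSON quotes is an assumption, so the parser accepts both forms.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/predict"  # address from the Setup section


def build_request(sentence, url=API_URL):
    """Build the POST request with the same header and body as the curl call."""
    body = json.dumps({"sentence": sentence}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_names(raw_body):
    """Turn the comma-separated response body into a list of names."""
    text = raw_body.strip()
    # Assumption: the body may or may not be wrapped in JSON quotes.
    if text.startswith('"'):
        text = json.loads(text)
    return [name for name in text.split(",") if name]


def predict(sentence, url=API_URL):
    """Send the sentence to a running server and return the detected names."""
    with urllib.request.urlopen(build_request(sentence, url)) as response:
        return parse_names(response.read().decode("utf-8"))
```

With the server running, `predict("Jack and James went to the university and they met Emily")` should return the names from the example response as a Python list.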

Using Flask

After completing step 5 of Setup, run the following from the /src/ directory:

export FLASK_APP=front.py
export FLASK_DEBUG=1 # For debugging
flask run

Note: Make sure to set the LOAD_PATH variable in front.py to the location of your latest BentoML saved model.
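One way to locate the latest saved model is to pick the most recently modified version directory. The sketch below is an illustration only: the function name `latest_bundle` is hypothetical, the default directory is an assumption about where BentoML stores saved bundles on your machine, and whether front.py expects LOAD_PATH to be a directory of this kind should be checked against the file itself.

```python
from pathlib import Path

# Assumed default location for saved bundles; adjust to your installation.
DEFAULT_SERVICE_DIR = "~/bentoml/repository/PyTorchModel"


def latest_bundle(service_dir=DEFAULT_SERVICE_DIR):
    """Return the most recently modified version directory, or None if absent."""
    base = Path(service_dir).expanduser()
    if not base.is_dir():
        return None
    versions = [p for p in base.iterdir() if p.is_dir()]
    if not versions:
        return None
    # The newest modification time is treated as the latest version.
    return max(versions, key=lambda p: p.stat().st_mtime)
```

Printing `latest_bundle()` gives a path that can be copied into LOAD_PATH if it matches what front.py expects.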

Creating and running a Docker image and deploying it on Heroku

This is explained in detail on the wiki page of this repository.

Creating and running the discord bot

Documentation is available on the wiki page of this repository.

Data

We used the Annotated Corpus for Named Entity Recognition dataset, which we found on Kaggle: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus

It is an extract from the GMB (Groningen Meaning Bank) corpus that is tagged, annotated, and built specifically to train a classifier to predict named entities such as names, locations, etc.

This dataset contains 47958 sentences with 948241 words.
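To give an idea of how word-level annotations map to sentences, here is a minimal sketch that groups rows into sentences. The column names ("Sentence #", "Word", "POS", "Tag"), the convention that the sentence column is filled only on the first row of each sentence, and the tag labels are assumptions about the CSV layout; the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io


def read_sentences(handle):
    """Group (word, tag) pairs by sentence from a CSV file handle."""
    sentences = []
    for row in csv.DictReader(handle):
        # Assumed layout: "Sentence #" is filled only on a sentence's first row.
        if row["Sentence #"].strip() or not sentences:
            sentences.append([])
        sentences[-1].append((row["Word"], row["Tag"]))
    return sentences


# Illustrative rows only, not real dataset content.
SAMPLE = """Sentence #,Word,POS,Tag
Sentence: 1,John,NNP,B-per
,lives,VBZ,O
,in,IN,O
,London,NNP,B-geo
Sentence: 2,Emily,NNP,B-per
,left,VBD,O
"""

sentences = read_sentences(io.StringIO(SAMPLE))
# Number of sentences, then total number of words across them.
print(len(sentences), sum(len(s) for s in sentences))  # prints: 2 6
```

The same function can be pointed at the downloaded CSV by passing an open file handle instead of the in-memory sample.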

Training a new model

You can train your own model with the train.py script. Edit config.py to set the parameters you want, then execute the following command:

python train.py

This will generate your model file in config.MODEL_PATH as model.bin.

Contributing

To get started...

Step 1

Option 1
- 🍴 Fork this repo!
Option 2
- 👯 Clone this repo to your local machine using https://github.com/MLH-Fellowship/Auto-Tagger.git

Step 2

HACK AWAY! 🔨🔨🔨

Step 3

🔃 Create a new pull request using https://github.com/MLH-Fellowship/Auto-Tagger/compare/.

License

This project is licensed under the Apache License, Version 2.0.
