Tech-with-Vidhya / NLP_Multi-Class_Text_Classification_using_BERT_Model

NLP_Multi-Class_Text_Classification_using_BERT_Model

Introduction:

This project covers the end-to-end implementation of a multi-class text classification NLP solution using the Bidirectional Encoder Representations from Transformers (BERT) model on the AG's News Corpus data.

The project aims at building, training, and fine-tuning the BERT model for classification on the AG News dataset.

We will see how the state-of-the-art Transformer-based BERT model can achieve very high performance metrics on a large corpus comprising more than 100k labelled training examples.

Dataset:

We will use the datasets available through the Hugging Face library.

The BERT model will be built on the AG News dataset.

  a. AG News (AG's News Corpus) is a sub-dataset of AG's corpus of news articles, constructed by assembling the title and description fields of articles from the 4 largest classes.
  b. The four classes are: World, Sports, Business, Sci/Tech.
  c. AG News contains 30,000 training and 1,900 test samples per class.
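The per-class figures above imply the overall split sizes. The following is a minimal sketch of that arithmetic, together with a hedged loader. The dataset id `ag_news`, the label order, and the helper name `load_ag_news` are assumptions for illustration and are not taken from the project notebook.

```python
# Label order assumed for AG News on the Hugging Face Hub
# (0=World, 1=Sports, 2=Business, 3=Sci/Tech); verify against the dataset card.
CLASS_NAMES = ["World", "Sports", "Business", "Sci/Tech"]
TRAIN_PER_CLASS = 30_000  # from the dataset description above
TEST_PER_CLASS = 1_900    # from the dataset description above


def split_sizes(num_classes=len(CLASS_NAMES)):
    """Return total (train, test) sizes for a class-balanced split."""
    return num_classes * TRAIN_PER_CLASS, num_classes * TEST_PER_CLASS


def load_ag_news():
    """Hypothetical helper: download AG News with the `datasets` package."""
    # Imported lazily so the arithmetic above runs without the package installed.
    from datasets import load_dataset
    return load_dataset("ag_news")  # dataset id is an assumption


train_total, test_total = split_sizes()
print(train_total, test_total)  # 120000 7600
```

The 120,000 training examples are consistent with the "more than 100k" figure mentioned in the introduction.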

Project Implementation Steps:

  1. Data Exploration and Analysis
  2. Data Pre-processing
  3. Creation of the BERT Model
  4. Compiling the BERT Model
  5. Model Training with Defined Hyperparameters
  6. Model Evaluation and Validation
  7. Model Performance Metrics
  8. Saving the Finalized Model
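The steps above can be sketched with ktrain, which is listed among the project's tools. This is only an illustration under stated assumptions: the checkpoint name `bert-base-uncased`, every hyperparameter value, the function name `train_bert_classifier`, and the save directory are hypothetical and may differ from what the notebook actually uses.

```python
CLASS_NAMES = ["World", "Sports", "Business", "Sci/Tech"]

# Illustrative hyperparameters (assumptions, not the project's actual values).
HYPERPARAMS = {
    "model_name": "bert-base-uncased",
    "maxlen": 128,
    "batch_size": 16,
    "learning_rate": 2e-5,
    "epochs": 1,
}


def train_bert_classifier(x_train, y_train, x_test, y_test,
                          save_dir="ag_news_bert_predictor", hp=HYPERPARAMS):
    """Sketch of steps 2-8: x_* are lists of strings, y_* are integer labels."""
    # Imported lazily because ktrain pulls in TensorFlow, which is heavy.
    import ktrain
    from ktrain import text

    # Step 2: tokenize and encode the texts for the chosen checkpoint.
    t = text.Transformer(hp["model_name"], maxlen=hp["maxlen"],
                         class_names=CLASS_NAMES)
    trn = t.preprocess_train(x_train, y_train)
    val = t.preprocess_test(x_test, y_test)

    # Steps 3-4: build the classifier; ktrain returns it already compiled.
    model = t.get_classifier()

    # Step 5: train with the one-cycle learning-rate policy.
    learner = ktrain.get_learner(model, train_data=trn, val_data=val,
                                 batch_size=hp["batch_size"])
    learner.fit_onecycle(hp["learning_rate"], hp["epochs"])

    # Steps 6-7: print a per-class classification report on the test split.
    learner.validate(class_names=CLASS_NAMES)

    # Step 8: bundle model and preprocessor, then save to disk.
    predictor = ktrain.get_predictor(learner.model, preproc=t)
    predictor.save(save_dir)
    return predictor
```

Step 1 (data exploration) is omitted here because it depends on how the data is loaded into pandas. Training on the full dataset is slow without a GPU, so a small subset is a reasonable first run.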

Tools & Technologies:

Python, numpy, pandas, timeit, ktrain, transformers, tensorflow
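The role of `timeit` in the project is not described. One plausible use is measuring how long training takes, and a minimal sketch of that idea follows. The helper name `timed` is hypothetical.

```python
from timeit import default_timer as timer


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = timer()
    result = fn(*args, **kwargs)
    return result, timer() - start


# Stand-in workload; in the project this could wrap the training call instead.
total, seconds = timed(sum, range(1_000_000))
print(f"sum={total}, took {seconds:.4f}s")
```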

Languages

Jupyter Notebook 72.0%, Python 28.0%