NLP Multi-Class Text Classification using BERT Model
Introduction:
This project covers the end-to-end implementation of a multi-class text classification NLP solution using the Bidirectional Encoder Representations from Transformers (BERT) model on the AG's News Corpus.
The project aims at building, training, and fine-tuning the BERT model for classification on the AG News dataset.
We will see how the state-of-the-art Transformer-based BERT model can achieve very high performance metrics on a large corpus comprising more than 100,000 labelled training examples.
Dataset:
We will be using the dataset from the Hugging Face `datasets` library.
The BERT model will be built on the AG News dataset.
- AG News (AG's News Corpus) is a subset of AG's corpus of news articles, constructed by assembling the title and description fields of articles from the 4 largest classes.
- The four classes are: World, Sports, Business, Sci/Tech.
- AG News contains 30,000 training and 1,900 test samples per class.
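As a quick sanity check, the class layout above can be expressed in code. This is a minimal sketch using only the standard library; the integer-id-to-name mapping shown follows the Hugging Face `ag_news` convention (an assumption worth verifying against `dataset.features["label"].names` after loading).

```python
# Class layout of AG News: 4 classes, 30,000 train / 1,900 test samples each.
# The id -> name mapping below is assumed to match the Hugging Face "ag_news"
# dataset; verify with dataset.features["label"].names after loading.
id2label = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}

TRAIN_PER_CLASS = 30_000
TEST_PER_CLASS = 1_900

n_train = TRAIN_PER_CLASS * len(id2label)  # 120,000 labelled training examples
n_test = TEST_PER_CLASS * len(id2label)    # 7,600 test examples

# In practice the data would be fetched with:
#   from datasets import load_dataset
#   dataset = load_dataset("ag_news")
print(f"train={n_train}, test={n_test}")
```

The per-class counts multiply out to 120,000 training and 7,600 test examples, which is where the "more than 100k labelled training examples" figure in the introduction comes from.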
Project Implementation Steps:
- Data Exploration and Analysis
- Data Pre-processing
- Creation of the BERT Model
- Compiling the BERT Model
- Model Training with Defined Hyperparameters
- Model Evaluation and Validation
- Measuring Model Performance Metrics
- Saving the Finalized Model
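The steps above can be sketched end to end with ktrain's text-classification API. This is a hedged outline, not the project's exact code: the function names (`texts_from_array`, `text_classifier`, `get_learner`, `fit_onecycle`, `get_predictor`) are real ktrain APIs, but the hyperparameters (sequence length, batch size, learning rate, epochs) and the save path are illustrative placeholders. The heavy work is wrapped in a function so that nothing downloads or trains at import time.

```python
def train_bert_ag_news(train_texts, train_labels, test_texts, test_labels,
                       save_dir="bert_ag_news_predictor"):
    """Sketch of the full BERT workflow with ktrain.

    Requires `pip install ktrain` plus TensorFlow; calling this downloads
    pretrained BERT weights and trains, so it is deliberately not executed
    at import time. Hyperparameters below are illustrative, not tuned values.
    """
    import ktrain
    from ktrain import text

    class_names = ["World", "Sports", "Business", "Sci/Tech"]

    # Pre-processing: tokenize for BERT and truncate/pad to a fixed length.
    (x_train, y_train), (x_test, y_test), preproc = text.texts_from_array(
        x_train=train_texts, y_train=train_labels,
        x_test=test_texts, y_test=test_labels,
        class_names=class_names,
        preprocess_mode="bert",
        maxlen=128,            # illustrative sequence length
    )

    # Creation and compilation of the BERT model.
    model = text.text_classifier("bert", train_data=(x_train, y_train),
                                 preproc=preproc)

    # Training with defined hyperparameters (1-cycle learning-rate policy).
    learner = ktrain.get_learner(model, train_data=(x_train, y_train),
                                 val_data=(x_test, y_test), batch_size=32)
    learner.fit_onecycle(lr=2e-5, epochs=1)

    # Evaluation: per-class precision/recall/F1 on the validation data.
    learner.validate(class_names=class_names)

    # Saving the finalized model for later inference.
    predictor = ktrain.get_predictor(learner.model, preproc)
    predictor.save(save_dir)
    return predictor
```

A saved predictor can later be reloaded with `ktrain.load_predictor(save_dir)` and used via `predictor.predict(["some headline ..."])`.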
Tools & Technologies:
Python, numpy, pandas, timeit, ktrain, transformers, tensorflow