artificial-intelligence beautifulsoup4 college-project github joblib kaggle kaggle-dataset linux machine-learning multinomial-naive-bayes naive-bayes naive-bayes-classifier natural-language-processing popos pycharm python3 scikit-learn simple-project

Multinomial Naive Bayes Language Classification Model

This repository is a tutorial on language classification with the Multinomial Naive Bayes algorithm. It includes a Python implementation that detects the language of a given text, split across two files: main.py, which handles user interaction, and detector.py, which contains the LanguageClassifier class.

Overview

The Multinomial Naive Bayes algorithm is widely used for text classification tasks, including language identification. This tutorial demonstrates how to train a language classifier using a provided dataset and then use the trained model to predict the language of input text.
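To make the algorithm concrete, here is a minimal from-scratch sketch of Multinomial Naive Bayes with Laplace smoothing. It is only an illustration of the idea and is not the code in detector.py, which relies on scikit-learn. The function names `train` and `predict`, the whitespace tokenizer, and the toy sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train(samples, alpha=1.0):
    """Fit a toy model from (text, label) pairs. Hypothetical helper, not from the repo."""
    class_docs = Counter()               # number of training texts per language
    token_counts = defaultdict(Counter)  # per-language word frequencies
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()    # naive whitespace tokenizer (assumption)
        class_docs[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_docs": class_docs, "token_counts": token_counts,
            "vocab": vocab, "alpha": alpha, "n_docs": len(samples)}

def predict(model, text):
    """Return the label with the highest log prior plus summed log likelihoods."""
    # Words never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_docs"].items():
        counts = model["token_counts"][label]
        # Laplace smoothing: add alpha to every word count in the vocabulary.
        denom = sum(counts.values()) + model["alpha"] * len(model["vocab"])
        score = math.log(n_docs / model["n_docs"])
        for t in tokens:
            score += math.log((counts[t] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

samples = [
    ("the cat sat on the mat", "English"),
    ("the dog is here", "English"),
    ("le chat est sur le tapis", "French"),
    ("le chien est ici", "French"),
]
model = train(samples)
print(predict(model, "the dog sat"))
print(predict(model, "le chat est ici"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a single unseen word from zeroing out a language's score.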

Prerequisites

Before running the code, ensure you have the following dependencies installed:

Python 3
Required libraries: requests, beautifulsoup4 (imported as bs4), pandas, scikit-learn, joblib

Install the necessary dependencies using the following command:

pip install requests beautifulsoup4 pandas scikit-learn joblib

Usage

Clone the Repository:

git clone https://github.com/vivekkdagar/NaiveBayesClassifier.git
cd NaiveBayesClassifier

Run the Main Script:
```
python3 main.py
```
Select a data source and enter the input:
- Choose the input mode ('raw', 'file', or 'website') and provide the corresponding text.
Results:
- The predicted language of the provided text is displayed.
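The three input modes could be dispatched roughly as follows. This is a sketch under assumptions, not the actual contents of main.py: the function name `read_input`, its parameters, and the 10-second timeout are hypothetical.

```python
from pathlib import Path

def read_input(mode, value):
    """Return the text to classify. Hypothetical helper, not taken from main.py.

    mode  -- 'raw', 'file', or 'website'
    value -- the text itself, a file path, or a URL, depending on the mode
    """
    if mode == "raw":
        return value
    if mode == "file":
        return Path(value).read_text(encoding="utf-8")
    if mode == "website":
        # Imported lazily so the other two modes work without these packages.
        import requests
        from bs4 import BeautifulSoup
        response = requests.get(value, timeout=10)  # timeout value is an assumption
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        return soup.get_text(" ", strip=True)
    raise ValueError(f"Unknown mode: {mode!r}")

print(read_input("raw", "Guten Morgen"))
```

Rejecting unknown modes with an explicit error keeps a typo such as 'websit' from silently producing an empty prediction.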

Code Structure

main.py: Handles user interaction and data input.
detector.py: Contains the LanguageClassifier class responsible for training and predicting languages.

Data Preprocessing

The LanguageClassifier class preprocesses the training data by removing special characters and transforming the text into a bag-of-words representation using the CountVectorizer from scikit-learn.
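As an illustration of this step, the sketch below strips punctuation and digits and then builds a bag-of-words matrix. The helper name `clean_text` and the exact regular expression are assumptions; the class in detector.py may clean text differently.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer

def clean_text(text):
    """Hypothetical cleaning helper: drop punctuation, digits, and underscores."""
    # \w is Unicode-aware, so letters from non-Latin scripts are preserved.
    text = re.sub(r"[^\w\s]|\d|_", " ", text)
    # Collapse repeated whitespace and lowercase the result.
    return re.sub(r"\s+", " ", text).strip().lower()

texts = ["Hello, world!!", "Bonjour le monde #1"]
cleaned = [clean_text(t) for t in texts]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(cleaned)   # sparse matrix: one row per text, one column per word

print(cleaned)
print(sorted(vectorizer.vocabulary_))   # the learned vocabulary
print(X.toarray())                      # word counts per text
```

Note that CountVectorizer's default tokenizer ignores single-character tokens, which can matter for languages with many one-letter words.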

Training the Model

The tutorial uses a provided dataset, "Language Detection.csv," to train the Multinomial Naive Bayes model. The model is then serialized using the joblib library for future use.
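A compact sketch of training and serialization is shown below. It uses a tiny in-memory corpus so that it runs on its own; with the real dataset, the texts and labels would come from the CSV file (for example via pandas.read_csv), whose column names are not stated here. Bundling the vectorizer and classifier into one pipeline, and the file name language_model.joblib, are choices made for this example and may differ from detector.py.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the training data (assumption; the repo trains on the CSV).
texts = [
    "the cat sat on the mat",
    "the dog is here",
    "le chat est sur le tapis",
    "le chien est ici",
]
labels = ["English", "English", "French", "French"]

# One pipeline object holds both the vocabulary and the fitted classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Serialize to a temporary directory, then load it back as a later run would.
path = os.path.join(tempfile.mkdtemp(), "language_model.joblib")  # hypothetical file name
joblib.dump(model, path)
restored = joblib.load(path)

print(restored.predict(["the dog is on the mat"])[0])
```

Saving the pipeline as a single object means the vocabulary learned at training time is always loaded together with the classifier, so prediction cannot run against a mismatched vectorizer.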

Additional Notes

To modify or extend the training dataset, edit the "Language Detection.csv" file.
Adjust the HTML tag in the scrape_website function within main.py based on your specific use case.
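The effect of changing the HTML tag can be seen in a small sketch that works on an HTML string. The function name `extract_text` and its `tag` parameter are hypothetical; the real scrape_website function in main.py may be structured differently.

```python
from bs4 import BeautifulSoup

def extract_text(html, tag="p"):
    """Join the text of every element with the given tag (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return " ".join(el.get_text(" ", strip=True) for el in soup.find_all(tag))

html = "<h1>Title</h1><p>Hola mundo</p><div>menu</div><p>Buenos días</p>"

print(extract_text(html))        # paragraphs only
print(extract_text(html, "h1"))  # headings only
```

Restricting extraction to a content-bearing tag keeps navigation menus and other page chrome out of the text passed to the classifier.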

MIT License
