Vivekkaspa / OIBSIP

This repository compiles the data science tasks I completed during my internship at Oasis InfoByte in July-August 2023.

OIBSIP

Iris Flower Classification: I worked with the well-known Iris dataset, using machine learning techniques to classify iris flowers by species. Using Logistic Regression along with Decision Trees, Support Vector Machine (SVM), and Multi-layer Perceptron (MLP) classifiers, I built a classifier model that reached 100% accuracy.
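
As a rough illustration of the comparison described above, the sketch below trains the four named classifiers on scikit-learn's bundled copy of the Iris dataset. The 80/20 stratified split, the `random_state` values, and the hyperparameters are assumptions made for this example, not the settings used in the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Load the 150-sample Iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Assumed split: 80% train, 20% test, stratified by species.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hyperparameters below are illustrative assumptions.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(),
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=42),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.2%}")
```

Accuracy on a 30-sample test set depends on which samples land in it, so scores can change with the seed; `cross_val_score` from `sklearn.model_selection` would give a steadier estimate.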

Unemployment Analysis with Python: I analyzed unemployment rates, employment numbers, and labor participation rates in India to better understand regional variations and the factors affecting unemployment. The task focuses on analyzing unemployment rates across different Indian states. A correlation matrix is plotted to show the relationships between the features in the dataset. Other visualizations, including bar plots, pie charts, and graphs, are used to understand current unemployment trends, how unemployment is distributed across demographic segments, and the possible driving factors behind it.
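
The correlation-matrix step can be sketched roughly as follows. The data here is synthetic and the column names are assumptions chosen to mirror the quantities mentioned above; the real notebook would load the actual unemployment dataset instead.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data; values and column names are assumed for illustration.
rng = np.random.default_rng(0)
n = 120
participation = rng.uniform(35, 50, n)
unemployment = 25 - 0.3 * participation + rng.normal(0, 2, n)
employed = 1e6 * (participation / 40) * (1 - unemployment / 100) + rng.normal(0, 5e4, n)

df = pd.DataFrame(
    {
        "unemployment_rate": unemployment,
        "employed": employed,
        "labour_participation_rate": participation,
    }
)

# Pairwise Pearson correlation between the numeric features.
corr = df.corr()

# Draw the matrix as an annotated heatmap on a fixed -1..1 colour scale.
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix (synthetic data)")
fig.tight_layout()
# In a script, call plt.show() to display the figure; Jupyter renders it inline.
```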

Car Price Prediction: I implemented a car price prediction model in Python. Linear Regression achieved an R² score of 81.81%, and Decision Trees achieved the highest R² score of 89.31%, indicating a good fit to the data. The model is trained with classic machine learning algorithms, namely Linear Regression and Decision Trees, to predict the price of a car.
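
A minimal sketch of comparing the two regressors by R² score is shown below. The car data is generated synthetically, and the feature names (`age`, `mileage`, `power`), the pricing formula, and the tree depth are all assumptions for this example; the scores it prints are unrelated to the figures reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic cars: hypothetical features and a made-up pricing rule.
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(0, 15, n)                       # years
mileage = age * 12_000 + rng.normal(0, 10_000, n) # km, loosely tied to age
power = rng.uniform(60, 200, n)                   # horsepower
price = 30_000 * np.exp(-0.12 * age) + 80 * power - 0.02 * mileage + rng.normal(0, 1_000, n)

X = np.column_stack([age, mileage, power])
X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.2, random_state=0)

regressors = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=6, random_state=0),
}

# R² measures the share of price variance explained on the held-out set.
r2_scores = {}
for name, reg in regressors.items():
    reg.fit(X_train, y_train)
    r2_scores[name] = r2_score(y_test, reg.predict(X_test))
    print(f"{name}: R² = {r2_scores[name]:.4f}")
```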

Email Spam Detection: I implemented an email spam detector and classifier using Logistic Regression, with the trained model reaching an accuracy of 99.73%. Spam emails are a common nuisance nowadays. The model is trained with the Logistic Regression algorithm to recognize and classify emails as spam or non-spam (ham).
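
One possible shape for such a classifier is sketched below. The source does not say how the email text is turned into features, so the TF-IDF vectorizer is an assumption, and the tiny corpus is made up purely to keep the example self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus for illustration; a real run would load a labelled email dataset.
messages = [
    "Win a free prize now, click here to claim your cash",
    "Congratulations, you have won a free lottery ticket",
    "Urgent offer: claim your free cash reward today",
    "Cheap loans approved instantly, click to apply now",
    "Are we still meeting for lunch tomorrow?",
    "Please find the project report attached",
    "Can you review my notebook before the meeting?",
    "Thanks for your help with the report yesterday",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# TF-IDF features (an assumed choice) feeding a Logistic Regression classifier.
spam_clf = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
spam_clf.fit(messages, labels)

# Classify unseen messages.
new_messages = [
    "Claim your free cash prize now",
    "Is the meeting report ready?",
]
predictions = spam_clf.predict(new_messages)
for text, label in zip(new_messages, predictions):
    print(f"{label}: {text}")
```

Wrapping the vectorizer and the classifier in one pipeline keeps the vocabulary fitted on training text only, which avoids leaking test data into the features.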

Sales Prediction with Linear Regression: Using linear regression, I predicted sales based on advertising expenses across different media channels. Exploring the ‘Advertising.csv’ dataset, I discovered valuable insights into the most effective advertising medium.
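
A sketch of how the fitted coefficients can be used to compare media channels follows. It assumes the dataset has columns named `TV`, `Radio`, `Newspaper`, and `Sales`, and it uses synthetic stand-in data so that it runs without the CSV file; `rank_channels` is a hypothetical helper, not a function from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

CHANNELS = ["TV", "Radio", "Newspaper"]  # assumed column names


def rank_channels(df: pd.DataFrame) -> pd.Series:
    """Fit Sales ~ channels and return coefficients sorted from largest to smallest."""
    model = LinearRegression().fit(df[CHANNELS], df["Sales"])
    return pd.Series(model.coef_, index=CHANNELS).sort_values(ascending=False)


# Synthetic stand-in; with the real file this would be pd.read_csv("Advertising.csv").
rng = np.random.default_rng(0)
n = 200
ads = pd.DataFrame(
    {
        "TV": rng.uniform(0, 300, n),
        "Radio": rng.uniform(0, 50, n),
        "Newspaper": rng.uniform(0, 100, n),
    }
)
# Made-up relationship: Radio has the largest effect per unit spend, Newspaper none.
ads["Sales"] = 3 + 0.05 * ads["TV"] + 0.2 * ads["Radio"] + rng.normal(0, 0.5, n)

ranking = rank_channels(ads)
print(ranking)
```

Each coefficient estimates the change in sales per additional unit of spend on that channel, holding the others fixed, so the ranking is only meaningful when all channels are measured in the same units.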

Business Logic & key methodology involved:

📝 Importing the libraries/ dependencies

📝 Importing the dataset, data exploration & exploratory data analysis

📝 Data preparation & Pre-processing

📝 Data Visualization & Plotting

📝 Building the classifier model: splitting the dataset into training & testing sets

📝 Model Selection & Training

📝 Model Evaluation
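
The steps above can be sketched end to end as follows. The example uses scikit-learn's bundled wine dataset as a neutral stand-in; the scaler, the model, and the split ratio are assumptions for illustration and not necessarily what each notebook uses. The visualization step is omitted to keep the sketch short.

```python
# 1. Import the libraries / dependencies.
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 2. Import the dataset and explore it.
df = load_wine(as_frame=True).frame
print(df.shape)
print(df.isna().sum().sum(), "missing values")

# 3. Prepare the data: separate the features from the target column.
X = df.drop(columns="target")
y = df["target"]

# 4. Split the dataset into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 5. Select and train a model; scaling is fitted on the training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 6. Evaluate on the held-out test set.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.2%}")
print(cm)
```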

Technology Stack:

♦️ Programming language: Python

♦️ Data manipulation & analysis: • NumPy • Pandas

♦️ Machine learning libraries: • Scikit-learn (sklearn)

♦️ Data visualization libraries: • Matplotlib • Seaborn

♦️ Development Environment: • Jupyter Notebook

