nishantbundela / ML_Projects_and_Docs

Machine Learning with Python

===========================================================================
This repository contains Machine Learning projects. Each project is explained in detail, with a description of how everything works.

===========================================================================

Libraries Used

The following libraries were used to implement the projects.

  • NumPy (for linear algebra)
  • Pandas (for data preprocessing)
  • Scikit-learn (for machine-learning)
  • Matplotlib (for data visualization)
  • Seaborn (for statistical data visualization)
  • SciPy (for scientific computing)
  • Statsmodels (for statistical computation)

===========================================================================

Project descriptions are given in the README. The projects are divided into the categories listed below:

Contents

  • Supervised Learning: Regression Projects

    • Simple Linear Regression Project: In this project, I build a Simple Linear Regression model to describe the linear relationship between Sales and Advertising for a dietary weight-control product.

    • Multiple Linear Regression Project: In this project, I build a Multiple Linear Regression model to estimate the relative CPU performance of computer hardware, using a computer hardware dataset. I discuss the assumptions of linear regression and various tools for evaluating model performance.
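
Linear regression is typically fitted by ordinary least squares. For a single predictor, the slope is S_xy / S_xx and the fitted line passes through the point of means, and R² is one common measure of how well the line fits. The sketch below is a from-scratch illustration using only the standard library, not the projects' notebook code; the advertising and sales figures are hypothetical.

```python
# Ordinary least squares for one predictor, standard library only.
# The advertising/sales numbers below are made up for illustration.


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least 2 points")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = S_xy / S_xx (co-variation of x and y over variation of x).
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    slope = s_xy / s_xx
    # The least-squares line always passes through (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


advertising = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical advertising spend
sales = [3.1, 4.9, 7.2, 8.8, 11.0]       # hypothetical sales
b0, b1 = fit_simple_linear_regression(advertising, sales)
print(f"sales = {b0:.2f} + {b1:.2f} * advertising, "
      f"R^2 = {r_squared(advertising, sales, b0, b1):.3f}")
```

With several predictors the same least-squares idea applies, but the coefficients come from solving the normal equations rather than a single ratio.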

===========================================================================

  • Supervised Learning: Classification Projects

    • Logistic Regression Project: In this project, I train a binary Logistic Regression classifier to predict whether or not it will rain tomorrow in Australia, using the Rain in Australia dataset from Kaggle. I have demonstrated feature engineering techniques along with Recursive Feature Elimination with Cross-Validation, k-fold Cross-Validation and GridSearchCV.

    • Support Vector Machines Project: In this project, I build a Support Vector Machines classifier to classify pulsar stars, using the Predicting a Pulsar Star dataset from Kaggle. I discuss the kernel trick, and I use Stratified Cross-Validation along with GridSearchCV.

    • k Nearest Neighbours Project: k Nearest Neighbours is one of the simplest machine learning algorithms. In this project, I build a kNN classifier to classify breast tumours as benign or malignant. I have used the Breast Cancer Wisconsin (Original) Data Set from the UCI Machine Learning Repository.

    • Naive Bayes Classification Project: In this project, I build a Naive Bayes classifier with Python and Scikit-Learn to predict whether a person's salary exceeds 50K a year. I have used the Adult Data Set from the UCI Machine Learning Repository.

    • Decision Tree Classification Project: Classification and Regression Trees (CART) are popular machine learning algorithms. In this project, I build two Decision Tree Classifier models, one with the gini criterion and one with the entropy criterion, to predict the safety of a car. I have used the Car Evaluation Data Set from the UCI Machine Learning Repository.

    • Random Forest Classification Project: In this project, I build two Random Forest Classifier models (with 10 and 100 decision trees) to predict the safety of a car. In this project, accuracy increases with the number of decision trees. I have also demonstrated feature selection using the Random Forest model. I have used the Car Evaluation Data Set from the UCI Machine Learning Repository.

    • XGBoost Classification Project: XGBoost is an acronym for Extreme Gradient Boosting. In this project, I implement XGBoost with Python and Scikit-Learn to classify wholesale customers by channel, as either Horeca (Hotel/Restaurant/Café) customers or Retail customers. I have used the Wholesale customers Data Set from the UCI Machine Learning Repository.
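
Two ideas behind the kNN and Decision Tree projects can be sketched from scratch with the standard library: a kNN prediction as a majority vote among the k closest training points (Euclidean distance), and the Gini and entropy impurity measures that the two Decision Tree criteria use to score candidate splits. This is an illustration, not the projects' Scikit-Learn code; the training points and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label, nearest first.
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_i ** 2) over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum(-p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


# Hypothetical 2-D feature vectors forming two well-separated classes.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["benign"] * 3 + ["malignant"] * 3

print(knn_predict(train_X, train_y, (1.5, 1.5)))
print(gini_impurity(["a", "a", "b", "b"]), entropy(["a", "a", "b", "b"]))
```

Both impurity measures are 0 for a pure node and largest for an even class mix; a tree grows by choosing the split whose child nodes have the lowest weighted impurity.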

===========================================================================

  • Unsupervised Learning Projects

    • K-Means Clustering Project: K-Means clustering is used to find intrinsic groups within an unlabelled dataset and to draw inferences from them. In this project, I implement K-Means clustering with Python and Scikit-Learn, using the Facebook Live Sellers in Thailand dataset from the UCI Machine Learning Repository.
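
K-Means is usually computed with Lloyd's algorithm, which alternates an assignment step (each point joins its nearest centroid) and an update step (each centroid moves to the mean of its points) until the centroids stop changing. The standard-library sketch below starts from k randomly chosen data points; Scikit-Learn's `KMeans` uses k-means++ initialisation by default, so this is a simplified illustration rather than the project's code, and the points are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k randomly chosen data points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # hypothetical 2-D features
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Because the result depends on the starting centroids, practical implementations run several initialisations and keep the one with the lowest within-cluster sum of squares.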

===========================================================================

  • Recommender Systems Project

    • Recommender Systems with Python: Recommender Systems are one of the most popular and widely used applications of data science. In this project, I build a Recommender System with Python. I discuss various types of recommender systems, including content-based and collaborative filtering recommender systems. I also discuss matrix factorization and how to evaluate recommender systems.
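
User-based collaborative filtering can be sketched in a few lines: measure how similar two users are over the items both have rated (cosine similarity here), then predict a missing rating as the similarity-weighted average of other users' ratings for that item. This is a minimal standard-library illustration, not the project's implementation; the users, items and ratings are hypothetical.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users over the items both have rated."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item` (None if nobody rated it)."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None


# Hypothetical user -> {item: rating} data on a 1-5 scale.
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob":   {"m1": 5, "m2": 5, "m3": 1, "m4": 5},
    "carol": {"m1": 1, "m2": 1, "m3": 5, "m4": 1},
}
print(predict_rating(ratings, "alice", "m4"))
```

Alice's tastes match Bob's more closely than Carol's, so her predicted rating for `m4` leans towards Bob's 5. Note that cosine similarity on raw, all-positive ratings is never negative, so even a user with opposite tastes still pulls the prediction; mean-centring each user's ratings first is a common refinement.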

===========================================================================

  • Statistical Analysis Projects

    • Descriptive Statistics Project: This project covers Descriptive Statistics, which provide basic summary measures of a dataset. These include measures of central tendency (mean, median and mode), measures of variability (variance, standard deviation, minimum/maximum values and the IQR, or Interquartile Range) and measures of shape (skewness and kurtosis).

    • Inferential Statistics Project: Inferential Statistics is the process of drawing inferences about a population from sample data. In this project, I discuss various inferential statistical concepts and their practical applications, including the Central Limit Theorem, the t-test, ANOVA, the Chi-square goodness-of-fit test and Correlation analysis.

    • Hypothesis Testing Project: Hypothesis testing is a statistical tool for testing an assumption about a population parameter. This project is dedicated to hypothesis testing. In it, I discuss the p-value, the significance level, the types of errors in hypothesis testing, and one-tailed and two-tailed tests.
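
The summary measures and the p-value logic above can be sketched with Python's `statistics` module. `describe` uses population (moment-based) formulas, so its skewness and kurtosis differ slightly from pandas' bias-corrected `skew()` and `kurt()`. For the test, the sketch uses a one-sample z-test with an assumed known σ instead of the t-test the projects discuss, because the standard library has a normal distribution but no Student's t distribution; with unknown σ, `scipy.stats.ttest_1samp` is the usual choice. The sample and the null value are hypothetical.

```python
import math
import statistics


def describe(data):
    """Central tendency, variability and shape measures (population formulas)."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    n = len(data)
    # Moment-based skewness and excess kurtosis, without small-sample correction.
    skewness = sum((x - mean) ** 3 for x in data) / n / sd ** 3
    excess_kurtosis = sum((x - mean) ** 4 for x in data) / n / sd ** 4 - 3
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": statistics.pvariance(data),
        "std": sd,
        "min": min(data),
        "max": max(data),
        "iqr": q3 - q1,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }


def one_sample_z_test(sample, mu0, sigma, alternative="two-sided"):
    """Return (z, p_value) for H0: population mean == mu0, assuming sigma is known."""
    z = (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    phi = statistics.NormalDist().cdf  # standard normal CDF
    if alternative == "two-sided":
        p = 2 * (1 - phi(abs(z)))
    elif alternative == "greater":  # one-tailed, upper
        p = 1 - phi(z)
    elif alternative == "less":     # one-tailed, lower
        p = phi(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
summary = describe(data)
z, p = one_sample_z_test(data, mu0=4, sigma=2)
alpha = 0.05  # significance level
print(summary)
print(f"z = {z:.3f}, p = {p:.4f}, reject H0 at alpha={alpha}: {p < alpha}")
```

Rejecting H0 when it is true is a Type I error, whose probability is controlled by alpha; here the two-sided p-value is well above 0.05, so H0 is not rejected.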

===========================================================================

  • Data Analysis Projects

    • Exploratory Data Analysis with Python: This project is about Exploratory Data Analysis. In this project, I explore the Absenteeism at work dataset and discuss useful univariate and multivariate techniques for exploring it.

    • Data Analysis with Pandas: Pandas is an open-source library for data analysis in Python. In this project, I explore Pandas and its important data analysis tools. I have used the BlackFriday dataset downloaded from Kaggle.

    • Data Analysis with NumPy: NumPy is the fundamental package for scientific computing in Python. In this project, I explore NumPy and its various data analysis tools.

    • Time Series Analysis with Python: A time series is a sequence of data points recorded at successive points in time. Time series analysis studies such data to identify patterns such as trend and seasonality and to forecast future values. In this project, I implement a Seasonal ARIMA time series model in Python to predict the occupancy rates of car parks in the Parking Birmingham Data Set.
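
The "I" (integrated) part of a Seasonal ARIMA model comes from differencing: seasonal differencing y[t] - y[t - s] removes a repeating pattern of period s, and ordinary differencing y[t] - y[t - 1] removes a linear trend. The standard-library sketch below shows only this preprocessing idea on a hypothetical quarterly series, not the project's model. In pandas, `Series.diff(periods=...)` performs the same operation, and in statsmodels' `SARIMAX` the number of differences is set by d in `order=(p, d, q)` and D in `seasonal_order=(P, D, Q, s)`.

```python
def difference(series, lag=1):
    """Return [y[t] - y[t - lag]]; the result is `lag` values shorter than the input."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical quarterly series: a linear trend plus a repeating seasonal pattern (period 4).
seasonal_pattern = [5, -2, 3, -6]
series = [t + seasonal_pattern[t % 4] for t in range(12)]

# Seasonal differencing (D = 1, s = 4) cancels the seasonal pattern,
# leaving only the trend's change over one full season.
seasonally_differenced = difference(series, lag=4)
# A further first difference (d = 1) removes that remaining constant.
fully_differenced = difference(seasonally_differenced, lag=1)

print(seasonally_differenced)
print(fully_differenced)
```

After both differences the toy series is flat, which is the kind of stationary series the ARMA part of the model is then fitted to.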

===========================================================================

  • Data Visualization Projects

    • Data Visualization with Matplotlib: Matplotlib is the foundational data visualization library for Python. In this project, I describe Matplotlib, its object hierarchy, its interfaces, the different plot types it offers and various customization techniques.

    • Data Visualization with Seaborn: Seaborn is a Python data visualization library based on Matplotlib. In this project, I explore Seaborn: I give an overview of the Seaborn API and its functionality, and discuss setting Seaborn's aesthetic parameters and colour palettes. I also discuss visualizing distributions, various plot types and multi-plot grids with Seaborn.
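
Matplotlib's object hierarchy puts a Figure at the top, one or more Axes inside it, and artists such as lines and bar rectangles inside each Axes. The object-oriented interface works by calling methods on those objects directly. The sketch below assumes Matplotlib is installed and selects the non-interactive Agg backend so it runs without a display; the data and titles are made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt


def build_figure():
    """Create one Figure holding two Axes, each populated through its own methods."""
    # Top of the hierarchy: a Figure; below it, one Axes object per subplot.
    fig, (ax_line, ax_bar) = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

    x = [0, 1, 2, 3, 4]
    ax_line.plot(x, [v ** 2 for v in x], marker="o", label="x squared")  # adds a Line2D artist
    ax_line.set(title="Line plot", xlabel="x", ylabel="y")
    ax_line.legend()

    ax_bar.bar(["A", "B", "C"], [3, 5, 2], color="tab:orange")  # adds one Rectangle per bar
    ax_bar.set(title="Bar plot", ylabel="count")

    fig.suptitle("One Figure, two Axes")
    fig.tight_layout()
    return fig


fig = build_figure()
# fig.savefig("two_axes.png") would write the figure to disk (file name is hypothetical).
```

Because Seaborn builds on Matplotlib, its axes-level functions such as `sns.histplot` accept an `ax=` argument and can draw into one of these Axes, so the two libraries can be mixed in one Figure.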


Languages

Jupyter Notebook: 100.0%