Ansu-John / ML-Classification

Build and evaluate classification models using the PySpark 3.0.1 library.

Machine Learning using Classification algorithms

OBJECTIVE

The shared code demonstrates several classification algorithms using Python and Spark MLlib.

DATASET USED

The dataset used is uploaded to GitHub along with the code.

TOOLS

Python, Spark MLlib

TECHNIQUES

Classification

Logistic regression

Logistic regression is a popular method for predicting a categorical response. It is a special case of generalized linear models that predicts the probability of each outcome.

In spark.ml, logistic regression can predict a binary outcome using binomial logistic regression, or a multiclass outcome using multinomial logistic regression.
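
To make the binomial versus multinomial distinction concrete, here is a minimal pure-Python sketch of how each variant turns linear scores into probabilities. It is not the Spark implementation (in PySpark the estimator is `pyspark.ml.classification.LogisticRegression`); the function names and the weights passed in are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn one score per class into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_binomial(weights, bias, features):
    """Binomial case: P(y = 1 | x) = sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict_multinomial(weight_rows, biases, features):
    """Multinomial case: one weight row per class, normalized by softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weight_rows, biases)]
    return softmax(scores)
```

The binomial model needs a single weight vector, while the multinomial model keeps one weight vector per class and picks the class with the highest probability.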

Decision tree classifier

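Decision trees are a popular family of classification and regression methods. A tree predicts by applying a sequence of feature tests from the root down to a leaf, which holds the predicted label.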
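
As a rough illustration of how a tree chooses a split, the sketch below scores candidate thresholds on a single numeric feature by Gini impurity. It is a simplified stand-in, not Spark's algorithm (the PySpark estimator is `pyspark.ml.classification.DecisionTreeClassifier`, whose `impurity` parameter accepts `"gini"` or `"entropy"`); `gini` and `best_threshold` are hypothetical helper names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for mixed labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best split `value <= threshold`.

    Returns (None, impurity_of_unsplit_node) when no split improves on it.
    """
    best = (None, gini(labels))
    # Splitting at the largest value would leave the right side empty, so skip it.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best
```

A full tree repeats this search over every feature and then recurses into the two resulting subsets.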

Random forest classifier

Random forests are a popular family of classification and regression methods. They combine many decision trees to reduce the risk of overfitting.
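
The two ingredients of a random forest, training each tree on a resampled dataset and combining the trees' predictions, can be sketched as follows. This shows the general idea only; it is not Spark's exact aggregation logic (the PySpark estimator is `pyspark.ml.classification.RandomForestClassifier`), and both helper names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree gets its own sample."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Combine the per-tree class predictions into a single label."""
    return Counter(predictions).most_common(1)[0][0]
```

For example, `majority_vote([1, 0, 1])` returns `1` because two of the three trees voted for class 1.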

Gradient-boosted tree classifier

Gradient-boosted trees (GBTs) are a popular classification and regression method using ensembles of decision trees.
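
The ensemble idea behind boosting is that each new tree is fit to the errors left by the trees before it. The sketch below shows this with one-split stumps on a single feature and squared error, which is a simplification chosen for brevity and not the loss or tree-building procedure Spark uses (the PySpark estimator is `pyspark.ml.classification.GBTClassifier`). The names `fit_stump` and `boost` and the default `rounds` and `learning_rate` values are assumptions for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (needs >= 2 distinct xs)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Add stumps one at a time, each fit to the current residuals."""
    stumps = []

    def predict(x):
        # The ensemble output is the shrunken sum of all stumps added so far.
        return sum(learning_rate * stump(x) for stump in stumps)

    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict
```

With labels encoded as -1 and +1, the sign of the returned score can be read as the predicted class.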

Multilayer perceptron classifier

The multilayer perceptron classifier (MLPC) is a classifier based on a feedforward artificial neural network. An MLPC consists of multiple layers of nodes, with each layer fully connected to the next.
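
A forward pass through such a network can be sketched in a few lines. This assumes sigmoid activations on the hidden layers and softmax on the output layer, and it covers prediction only, not training; it is not the Spark implementation (the PySpark estimator is `pyspark.ml.classification.MultilayerPerceptronClassifier`). The function names and the `layers` structure are hypothetical.

```python
import math

def dense(weights, biases, inputs):
    """Fully connected layer: one weighted sum plus bias per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Normalize output scores into class probabilities."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mlp_forward(layers, inputs):
    """layers is a list of (weights, biases) pairs, ordered input to output."""
    activations = inputs
    for weights, biases in layers[:-1]:
        # Hidden layers apply a sigmoid to each node's weighted sum.
        activations = [1.0 / (1.0 + math.exp(-z))
                       for z in dense(weights, biases, activations)]
    weights, biases = layers[-1]
    return softmax(dense(weights, biases, activations))
```

The predicted class is the index of the largest probability in the returned list.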

Linear Support Vector Machine

A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. The linear variant in Spark ML (LinearSVC) supports binary classification.
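
For a linear SVM the hyperplane is defined by a weight vector and a bias, and a point is classified by which side of the hyperplane it falls on. The sketch below shows that decision rule together with the hinge loss commonly used to train such models. It assumes labels encoded as 0 and 1 and is not the Spark implementation (the PySpark estimator is `pyspark.ml.classification.LinearSVC`); the function names and example weights are hypothetical.

```python
def decision_value(weights, bias, features):
    """Signed score w . x + b; its sign says which side of the hyperplane x is on."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def svm_predict(weights, bias, features):
    """Predict 1 for points on the positive side of the hyperplane, else 0."""
    return 1 if decision_value(weights, bias, features) > 0 else 0

def hinge_loss(weights, bias, features, label):
    """Hinge loss for a 0/1 label, mapped internally to -1/+1."""
    y = 2 * label - 1
    return max(0.0, 1.0 - y * decision_value(weights, bias, features))
```

The loss is zero only for points that are correctly classified with a margin of at least 1, which is what pushes the hyperplane away from the training points.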

Naive Bayes

Naive Bayes classifiers are a family of simple probabilistic, multiclass classifiers based on applying Bayes’ theorem with strong (naive) independence assumptions between every pair of features.
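
The independence assumption means the score for a class is just its prior multiplied by one likelihood term per feature. Below is a minimal sketch of a multinomial Naive Bayes model over feature count vectors with additive smoothing. It is an illustration under those assumptions, not the Spark implementation (the PySpark estimator is `pyspark.ml.classification.NaiveBayes`); `nb_train`, `nb_predict`, and the toy data in the usage note are hypothetical.

```python
import math
from collections import defaultdict

def nb_train(rows, labels, smoothing=1.0):
    """Estimate log priors and per-feature log likelihoods for each class."""
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for j, value in enumerate(row):
            feature_sums[label][j] += value
    log_priors, log_likelihoods = {}, {}
    for label, count in class_counts.items():
        log_priors[label] = math.log(count / len(rows))
        total = sum(feature_sums[label]) + smoothing * n_features
        log_likelihoods[label] = [math.log((s + smoothing) / total)
                                  for s in feature_sums[label]]
    return log_priors, log_likelihoods

def nb_predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    log_priors, log_likelihoods = model

    def score(label):
        # Independence assumption: per-feature log likelihoods simply add up.
        return log_priors[label] + sum(
            v * ll for v, ll in zip(row, log_likelihoods[label]))

    return max(log_priors, key=score)
```

For instance, after `model = nb_train([[3, 0], [2, 1], [0, 3], [1, 2]], [0, 0, 1, 1])`, a row dominated by the first feature such as `[4, 0]` is assigned to class 0.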