Problem Statement

The Drug Classification Analysis studies the effect of a particular drug based on certain parameters (Age, Sex, BP, Cholesterol, Na_to_K). The goal is to find an effective model that captures a strong relationship between these parameters and the target, so that it can predict which specific drug a person consumes.

Dataset

The dataset used is the Drug Classification With Different Algorithms from Kaggle.

The dataset has 6 columns: 5 input features and 1 target variable. The features are:

Age: Age of the person (int64).
Sex: Gender of the person (object or categorical: Male or Female).
Cholesterol: Cholesterol level of the person (object or categorical: High, Low or Normal).
Na_to_K: Ratio of sodium to potassium in the body (float64).
BP: Blood pressure of the person (object or categorical: High or Normal).

Target Variable:

Drug (object or categorical)

Drug refers to the type of drug consumed (through medication or direct injection).

Type:

A, B, C, X, Y
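Because Sex, BP, Cholesterol and Drug are stored as object (string) columns, they need numeric codes before KNN, Random Forest or SVM can use them. The following is one possible encoding, not necessarily the one used in this repository. It assumes the category spellings listed above and an ordering Low < Normal < High; the record is a hypothetical example rather than a row from the Kaggle file, and `encode_row` / `encode_label` are illustrative names.

```python
# Assumed ordering for the ordered categorical features: Low < Normal < High.
LEVELS = {"Low": 0, "Normal": 1, "High": 2}
SEX = {"Female": 0, "Male": 1}
DRUG_CLASSES = ["A", "B", "C", "X", "Y"]  # the five target classes


def encode_row(row):
    """Turn one record into a numeric vector [Age, Sex, BP, Cholesterol, Na_to_K]."""
    return [
        float(row["Age"]),
        float(SEX[row["Sex"]]),
        float(LEVELS[row["BP"]]),
        float(LEVELS[row["Cholesterol"]]),
        float(row["Na_to_K"]),
    ]


def encode_label(drug):
    """Map a drug class to an integer index (A -> 0, ..., Y -> 4)."""
    return DRUG_CLASSES.index(drug)


# Hypothetical record with illustrative values only.
row = {"Age": 47, "Sex": "Male", "BP": "High", "Cholesterol": "Normal",
       "Na_to_K": 10.5, "Drug": "A"}
x = encode_row(row)
y = encode_label(row["Drug"])
print(x, y)  # [47.0, 1.0, 2.0, 1.0, 10.5] 0
```

Ordinal codes keep one column per feature but make High two steps away from Low, which directly affects distance-based models such as KNN; one-hot encoding avoids assuming that ordering at the cost of extra columns.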

Model(s) Used

KNN Classifier

In this kernel, the parameters of the KNN algorithm are described and their effects on the result are observed. A first prediction is made with the default parameters, and this result serves as the baseline for comparison. After that, the best value of each parameter is found and its effect on the result is discussed. Finally, the GridSearch algorithm is used to find the best combination of parameter values, so that all results can be compared with each other in the conclusion.
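A grid search simply scores every combination of candidate parameter values and keeps the best one. The sketch below does this by hand for two KNN parameters (the number of neighbours `k` and the distance metric), scoring each pair with leave-one-out accuracy on a tiny hypothetical dataset. It is an illustration under those assumptions, not the notebook's code; in practice scikit-learn's `GridSearchCV` performs the same search with k-fold cross-validation.

```python
import math
from collections import Counter
from itertools import product


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


METRICS = {"euclidean": euclidean, "manhattan": manhattan}


def knn_predict(train_X, train_y, x, k, metric):
    """Majority label among the k training points closest to x."""
    dist = METRICS[metric]
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, k, metric):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k, metric) == y[i]
    return hits / len(X)


def grid_search(X, y, ks, metrics):
    """Score every (k, metric) pair; return the best pair and its score."""
    scores = {(k, m): loo_accuracy(X, y, k, m) for k, m in product(ks, metrics)}
    best = max(scores, key=scores.get)  # first pair with the highest score wins ties
    return best, scores[best]


# Hypothetical two-cluster toy data (already numeric).
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y = ["A", "A", "A", "B", "B", "B"]
best, score = grid_search(X, y, ks=[1, 3, 5], metrics=["euclidean", "manhattan"])
print(best, score)  # (1, 'euclidean') 1.0
```

With `k=5` every held-out point is outvoted by the other cluster, so that setting scores 0.0, which shows why tuning `k` against a default can change the result sharply.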

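i) Calculate the distance from the new point to every training point

ii) Find the k closest neighbors

iii) Vote for labels (the majority class among the neighbors is the prediction)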
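The three steps above can be written out directly. This is a minimal from-scratch sketch assuming Euclidean distance and a plain majority vote; the function names and toy points are hypothetical, and `k=5` mirrors scikit-learn's default `n_neighbors`.

```python
import math
from collections import Counter


def distance(a, b):
    """Step i: Euclidean distance between two feature vectors."""
    return math.dist(a, b)


def nearest_neighbors(train_X, x, k):
    """Step ii: indices of the k training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], x))
    return order[:k]


def vote(labels):
    """Step iii: majority vote; ties go to the label met first (i.e. the closer one)."""
    return Counter(labels).most_common(1)[0][0]


def knn_classify(train_X, train_y, x, k=5):
    idx = nearest_neighbors(train_X, x, k)
    return vote([train_y[i] for i in idx])


# Hypothetical toy data with two well-separated classes.
train_X = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
train_y = ["X", "X", "X", "Y", "Y", "Y"]
print(knn_classify(train_X, train_y, [1.5, 1.5], k=3))  # X
print(knn_classify(train_X, train_y, [6.5, 6.2], k=3))  # Y
```

Because the neighbours are sorted by distance before voting, a tie between labels is resolved in favour of the label whose neighbour is closest.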

Refer

Random Forest

Random Forest (or Random Decision Forest) is a supervised machine learning algorithm built from decision trees and used for classification, regression and other tasks. The classifier trains a set of decision trees (DTs), each on a randomly selected subset of the training set, and then collects the votes of the individual trees to decide the final prediction.
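The bagging-and-voting idea can be sketched in a few lines. To keep it short, this hypothetical version uses one-split trees (decision stumps) instead of full decision trees, draws each tree's training set as a bootstrap sample (with replacement), gives each tree one randomly chosen feature, and predicts by majority vote. Split quality here is plain misclassification count, a simplification of the impurity criteria real implementations use; the names and toy data are illustrative only.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Fit a one-split tree using only the given feature indices."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no split possible (one distinct value): always predict the majority
        label = majority(y)
        return lambda row: label
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # bootstrap sample
        features = rng.sample(range(len(X[0])), n_features)   # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Each tree votes; the most common label is the forest's prediction."""
    return majority([tree(row) for tree in forest])


# Hypothetical toy data.
X = [[1.0, 1.2], [1.5, 0.8], [0.9, 1.1], [1.3, 1.6],
     [5.0, 5.5], [5.4, 4.9], [6.1, 5.2], [4.8, 6.0]]
y = ["A"] * 4 + ["B"] * 4
forest = fit_forest(X, y)
print(predict_forest(forest, [1.0, 1.0]), predict_forest(forest, [6.0, 6.0]))
```

Averaging many trees trained on different resamples is what reduces the variance of a single tree; with full-depth trees the feature subset would normally be redrawn at every split rather than once per tree.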

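Each tree chooses its splits so as to reduce impurity, measured by Gini impurity or entropy for classification (mean squared error is the criterion used for regression trees), until it reaches the best classification it can.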
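As a worked illustration of the split criteria, the sketch below computes Gini impurity (1 minus the sum of squared class proportions) and entropy (minus the sum of p log2 p) for a hypothetical node, and the impurity decrease of a candidate split (parent impurity minus the size-weighted impurity of the children). The labels are made up for the example.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_c^2) over classes c."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: -sum(p_c * log2(p_c)) over classes c."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, measure):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * measure(left) + len(right) / n * measure(right)
    return measure(parent) - weighted


# Hypothetical node with two drug classes, and one candidate split of it.
parent = ["X"] * 4 + ["Y"] * 4
left, right = ["X", "X", "X", "Y"], ["X", "Y", "Y", "Y"]
print(gini(parent), entropy(parent))                # 0.5 1.0
print(impurity_decrease(parent, left, right, gini))  # 0.125
```

A perfect split (all X on one side, all Y on the other) would remove the whole parent impurity: 0.5 under Gini and 1.0 bit under entropy.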

Refer

SVM Classifier

Support Vector Machines

Support Vector Machines are generally considered a classification approach, but they can be employed for both classification and regression problems. They handle multiple continuous variables, and categorical variables once these are encoded numerically. An SVM constructs a hyperplane in multidimensional space to separate the classes, and finds the optimal hyperplane iteratively by minimizing an error (loss) function. The core idea of SVM is to find the maximum marginal hyperplane (MMH) that best divides the dataset into classes.
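The iterative error minimisation can be sketched for a linear, two-class SVM. This hypothetical version runs full-batch sub-gradient descent on the soft-margin objective lam/2 * ||w||^2 + mean(max(0, 1 - y(w.x + b))) with labels +1/-1; the learning rate, `lam` and epoch count are illustrative choices, and there are no kernels. For the five drug classes, binary SVMs are combined (scikit-learn's `SVC` uses a one-vs-one scheme).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on lam/2*||w||^2 + mean(hinge loss); y must be +1/-1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularisation term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge active: inside the margin or misclassified
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Which side of the hyperplane w.x + b = 0 the point falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, xi) for xi in X])  # [1, 1, 1, -1, -1, -1]
```

Only points with an active hinge (inside the margin or on the wrong side) push the hyperplane, which is why the final solution is determined by the points nearest to it.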

i) Generate hyperplanes that segregate the classes in the best way. The left-hand figure shows three hyperplanes: black, blue and orange. The blue and orange hyperplanes have a higher classification error, while the black one separates the two classes correctly.

ii) Select the hyperplane with the maximum margin from the nearest data points of either class, as shown in the right-hand figure.
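The two steps can be mimicked numerically: score each candidate hyperplane (w, b) by its geometric margin, the smallest value of y(w.x + b) / ||w|| over the data, which turns negative as soon as any point is misclassified, and keep the largest. The three candidates and the points below are hypothetical and do not reproduce the figures.

```python
import math


def geometric_margin(w, b, X, y):
    """Smallest signed distance y*(w.x + b)/||w||; negative if any point is misclassified."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi, yi in zip(X, y)) / norm


# Hypothetical two-class data (labels +1/-1).
X = [[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]]
y = [-1, -1, -1, 1, 1, 1]

# Hypothetical candidate hyperplanes w.x + b = 0.
candidates = {
    "wide": ([1.0, 1.0], -5.5),    # x1 + x2 = 5.5, well clear of both classes
    "narrow": ([1.0, 0.0], -3.0),  # x1 = 3, correct but closer to some points
    "wrong": ([0.0, 1.0], -1.5),   # x2 = 1.5, misclassifies (1, 2)
}
margins = {name: geometric_margin(w, b, X, y) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(best)  # wide
```

Dividing by ||w|| matters: without it, rescaling w and b would inflate the score without moving the hyperplane at all.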

Refer

Future Work

Improve the data cleaning methods, for example by standardising (scaling) the numeric, non-object variables.
Incorporate more parameters into the analysis (e.g. medication consumption rate, other minerals consumed, etc.).
Check the parameters for multicollinearity and assess their significance.
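Two of these items can be prototyped with the standard library alone: z-score standardisation of a numeric column, and a pairwise Pearson-correlation screen that flags strongly correlated parameters as multicollinearity candidates. The column values and the 0.8 threshold are hypothetical; a fuller check would use variance inflation factors.

```python
from statistics import mean, pstdev


def standardize(column):
    """z-scores: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


def pearson(a, b):
    """Pearson correlation as the mean product of z-scores."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)


def collinear_pairs(columns, threshold=0.8):
    """Pairs of columns whose |correlation| reaches the (assumed) threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical numeric columns (BP already encoded as numbers).
columns = {
    "Age": [20, 30, 40, 50],
    "Na_to_K": [10.0, 15.0, 20.0, 25.0],
    "BP": [2, 0, 2, 0],
}
print(collinear_pairs(columns))  # [('Age', 'Na_to_K', 1.0)]
```

Scaling matters most for KNN and SVM, where a feature with a large range (such as Age) would otherwise dominate the distance or margin calculation.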

Tashmoy966 / Task1_Classification_Model_Drug_Classification
