tapasya234 / InsurancePolicyPredictor

The aim of the project is to build models on the provided dataset using the various classification techniques available in R, analyze each technique and its results, and present the model that is the most accurate.

InsurancePolicyPredictor

The final project for the course "R programming for Data Scientists" involved applying various machine learning classifiers to the "Insurance Company Benchmark (COIL 2000) Data Set" from the UCI Machine Learning Repository. The training dataset contains 5000 entries with 86 attributes, and the test dataset contains 4000 entries with 85 attributes. The whole project was done in the R programming language using the RStudio IDE.

The aim of the project is to build models on the provided dataset using the various classification techniques available in R, analyze each technique and its results, and present the model that is the most accurate. The following classification techniques are used in the project: Logistic Regression, Naive Bayes, Decision Tree, SVM, Neural Network, LDA, and Random Forest. Some of the packages we used for the project are e1071, rpart, MASS, neuralnet, randomForest, and Boruta. After checking that the models were not overfitted, we concluded that the logistic regression model and the decision tree model gave the best accuracy on this dataset, though we wish we had spent more time on SVM.
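As a rough illustration of the comparison described above, the sketch below fits a logistic regression (`glm`) and a decision tree (`rpart`) and compares their training and held-out accuracy as a simple overfitting check. It is not the project's actual code: the data is synthetic, and the column names `x1`, `x2`, `x3`, `y`, the 70/30 split, and the 0.5 probability cutoff are all assumptions made for the example.

```r
# Sketch only: synthetic data stands in for the COIL 2000 dataset.
# Column names (x1, x2, x3, y) and the 70/30 split are hypothetical.
library(rpart)

set.seed(42)
n <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y <- factor(ifelse(df$x1 + 0.5 * df$x2 + rnorm(n, sd = 0.5) > 0, "yes", "no"))

# Hold out 30% of the rows for evaluation.
train_idx <- sample(n, size = 0.7 * n)
train <- df[train_idx, ]
test <- df[-train_idx, ]

# Fraction of predictions that match the true labels.
accuracy <- function(pred, truth) mean(pred == truth)

# Logistic regression: predict() returns P(y == "yes"), the second factor level.
logit_fit <- glm(y ~ ., data = train, family = binomial)
logit_class <- function(newdata) {
  prob <- predict(logit_fit, newdata = newdata, type = "response")
  factor(ifelse(prob > 0.5, "yes", "no"), levels = levels(df$y))
}

# Decision tree in classification mode.
tree_fit <- rpart(y ~ ., data = train, method = "class")
tree_class <- function(newdata) predict(tree_fit, newdata = newdata, type = "class")

# A large gap between training and test accuracy would suggest overfitting.
results <- data.frame(
  model = c("logistic", "tree"),
  train_acc = c(accuracy(logit_class(train), train$y),
                accuracy(tree_class(train), train$y)),
  test_acc = c(accuracy(logit_class(test), test$y),
               accuracy(tree_class(test), test$y))
)
print(results)
```

The same pattern extends to the other classifiers named above by swapping in the corresponding fit and predict calls from their packages.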

Technologies: R programming, Machine Learning Algorithms

Date: April - May 2017
