girishp92 / Supervised-learning-with-heterogenous-data-using-Random-Forest-algorithm

This was a group project comparing the effectiveness of supervised learning on several multivariate data sets; my part was the Random Forest model. I computed the feature importance of the predictor variables and examined how it affects the error rate (RMSE), using the Student Performance dataset to illustrate the importance of the individual predictors. The implementation is in Python and uses NumPy, SciPy, scikit-learn, and pandas, with matplotlib and seaborn for plotting the figures.
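As a rough illustration of the feature-importance and RMSE analysis described above, here is a minimal sketch using scikit-learn's `RandomForestRegressor`. It is not the project's actual code: the data is synthetic, and the column names (`prior_grade`, `absences`, `study_time`, `noise`), the helper functions, and the hyperparameters are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def make_toy_data(n_rows=400, seed=0):
    """Build a synthetic stand-in for a student table (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "prior_grade": rng.uniform(0, 20, n_rows),
        "absences": rng.integers(0, 30, n_rows).astype(float),
        "study_time": rng.integers(1, 5, n_rows).astype(float),
        "noise": rng.normal(size=n_rows),
    })
    # Assumed relationship: the target depends mostly on prior_grade.
    y = 0.9 * X["prior_grade"] - 0.05 * X["absences"] + rng.normal(size=n_rows)
    return X, y


def fit_and_score(X, y, seed=0):
    """Fit a random forest; return test RMSE and sorted feature importances."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
    importances = pd.Series(
        model.feature_importances_, index=X.columns
    ).sort_values(ascending=False)
    return rmse, importances


X, y = make_toy_data()
rmse_all, importances = fit_and_score(X, y)

# Refit without the most important predictor to see the effect on RMSE.
top_feature = importances.index[0]
rmse_without_top, _ = fit_and_score(X.drop(columns=[top_feature]), y)

print(importances)
print(f"RMSE with all features: {rmse_all:.3f}")
print(f"RMSE without '{top_feature}': {rmse_without_top:.3f}")
```

Dropping the top-ranked predictor and refitting is only one simple way to relate importance to error; the impurity-based `feature_importances_` used here is scikit-learn's default measure and may differ from whatever measure the project reported.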



Datasets used:
1. Wine Quality: http://archive.ics.uci.edu/ml/datasets/Wine+Quality
2. Student Performance: http://archive.ics.uci.edu/ml/datasets/Student+Performance
3. Adult: https://archive.ics.uci.edu/ml/datasets/Adult
4. Forest Fires: http://archive.ics.uci.edu/ml/datasets/forest+fires

We also used a Gaussian mixture model (GMM) sampling algorithm to generate sampled data from the datasets listed above, ran the implemented model on that sampled data, and tested its results.
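The GMM sampling step could look roughly like the sketch below, which fits scikit-learn's `GaussianMixture` to a numeric table and draws new rows from it. The two-cluster toy data, the choice of `n_components=2`, and the sample size are assumptions for illustration; the project's actual settings are not stated here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy numeric data standing in for one of the datasets (assumption):
# two well-separated clusters in two dimensions.
real = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(150, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(150, 2)),
])

# Fit a mixture of Gaussians to the observed rows.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(real)

# Draw synthetic rows; sample() also returns the component each row came from.
synthetic, component_labels = gmm.sample(n_samples=200)

print(synthetic.shape)
print("real mean:     ", real.mean(axis=0))
print("synthetic mean:", synthetic.mean(axis=0))
```

The synthetic rows can then be passed to a fitted model in place of, or alongside, the original rows. A Gaussian mixture only models continuous values, so categorical columns would need to be encoded first.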



Languages

Language:Python 100.0%