ermiasgelaye / machine-learning-challenge

Machine Learning - Exoplanet Exploration

Background

Over a period of nine years in deep space, the NASA Kepler space telescope has been on a planet-hunting mission to discover hidden planets outside of our solar system. To help process this data, machine learning models were created to classify candidate exoplanets from the raw dataset. The work has three steps:

  1. Preprocess the raw data
  2. Tune the models
  3. Reporting

Preprocess the Data

  • The dataset was preprocessed before fitting the models.
  • Feature selection was performed for all models, and unnecessary features were removed.
  • I used MinMaxScaler to scale the numerical data.
  • I separated the data into training and testing sets.
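
The steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the array shapes, the random seed, and the 75/25 split are assumptions, and the real Kepler column names are not given in this README. The sketch splits first and fits the scaler on the training set only, which is one common way to keep test data out of the scaling step; the original notebook may have ordered these steps differently.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the selected features (assumption: 200 rows,
# 4 numeric features, 3 classes). Replace with the real feature table.
rng = np.random.default_rng(42)
X = rng.normal(loc=50.0, scale=20.0, size=(200, 4))
y = rng.integers(0, 3, size=200)

# Separate the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```

After scaling, every training feature lies in the range 0 to 1. Test values can fall slightly outside that range because the scaler never saw them during fitting.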

Tune Model Parameters

  • I used GridSearch to tune the model parameters.
  • I tuned the reported classifiers and compared them.
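
A grid search of this kind might look like the sketch below, which uses scikit-learn's GridSearchCV with a logistic regression model. The parameter grid, the 5-fold setting, and the synthetic data are all assumptions chosen for illustration; the README does not list the grids that were actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic 3-class data standing in for the scaled exoplanet features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Hypothetical grid: only the regularization strength C is searched here.
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best cross-validated score:", grid.best_score_)
print("Testing score of the tuned model:", grid.score(X_test, y_test))
```

The same pattern applies to the other classifiers by swapping the estimator and the grid, for example the number of neighbors for K-Nearest Neighbors.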

Reporting

In this project I used five machine learning models to train, test, and classify candidate exoplanets from the raw dataset. This section summarizes the findings and assumptions and compares the models.
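
As a rough sketch of how the training and testing scores for the scikit-learn classifiers can be collected before any tuning, the loop below fits four of the five models with mostly default settings. The synthetic data, the seeds, and the specific hyperparameters (such as the linear SVM kernel) are assumptions, so the printed numbers will not match the tables in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the preprocessed exoplanet features.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6, n_classes=3, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "SVM": SVC(kernel="linear"),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and record (training score, testing score).
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train={scores[name][0]:.3f}, test={scores[name][1]:.3f}")
```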

Logistic Regression

                 BeforeCV   AfterCV
Training Score   0.749      0.872
Testing Score    0.757      0.864

Logistic Regression Classification Report

Random Forest

                 BeforeCV   AfterCV
Training Score   1.0        1.0
Testing Score    0.897      0.899

Random_Forest Classification Report

Support Vector Machine (SVM)

                 BeforeCV   AfterCV
Training Score   0.845      0.886
Testing Score    0.841      0.879

Support Vector Machine Classification Report

K-Nearest Neighbors

                 BeforeCV   AfterCV
Training Score   0.675      1.0
Testing Score    0.636      0.842

K-Nearest Neighbors Classification Report

Neural Networks and Deep Learning

  • Normal Neural Network - Loss: 0.2826294135174435, Accuracy: 0.8787185549736023
  • Deep Neural Network - Loss: 0.2919023224500006, Accuracy: 0.8655606508255005

Comparison Summary

The logistic regression training and testing scores increased substantially after tuning (from 0.749 to 0.872 and from 0.757 to 0.864, BeforeCV to AfterCV), but they remained below those of the random forest and SVM models. The logistic regression model's F1-score for the FALSE POSITIVE class is 0.89, which means it identifies that class reliably, though less well than random forest (0.98) and K-Nearest Neighbors (0.98). The random forest model's best score (0.89) is higher than the SVM model's (0.87). The normal neural network's accuracy (0.879) is higher than the deep neural network's (0.866).
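
For reference, the per-class F1-score quoted above is the harmonic mean of precision and recall for that class. The sketch below computes it by hand on a tiny made-up set of labels; the helper name `f1_for_class` and the example labels are hypothetical, and the label "CONFIRMED" is an assumption about the dataset's third class.

```python
def f1_for_class(y_true, y_pred, label):
    """Return the F1-score for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration only; not taken from the project's results.
y_true = ["FALSE POSITIVE", "FALSE POSITIVE", "FALSE POSITIVE",
          "CANDIDATE", "CANDIDATE", "CONFIRMED"]
y_pred = ["FALSE POSITIVE", "FALSE POSITIVE", "CANDIDATE",
          "CANDIDATE", "FALSE POSITIVE", "CONFIRMED"]

for label in ("FALSE POSITIVE", "CANDIDATE", "CONFIRMED"):
    print(label, round(f1_for_class(y_true, y_pred, label), 3))
```

In practice, scikit-learn's `classification_report` produces the same per-class precision, recall, and F1 values in one call.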

Overall, of the machine learning models run on this exoplanet dataset, the random forest model gave the best predictions. The project was a useful exercise in learning what each model does and in comparing the models' training and testing scores, accuracy, recall, and precision.


Resources


© 2019 Trilogy Education Services, a 2U, Inc. brand. All Rights Reserved.
