Fennecnightingale / Exoplanet-Prediction

Using Logistic Regression to predict whether or not a given star will have an Exoplanet in orbit, using data from HYG3 and the open exoplanet archive.

Predicting Exoplanets With Logistic Regression


[Figure: Milky Way map]
Course: Part-Time Data Science

Instructor: Amber Yandow

Author: Fennec C. Nightingale

Overview:

Predict & map whether or not a star is likely to have an exoplanet, and make recommendations for astronomers to speed up the rate of exoplanet discovery. We focus on recall so we're unlikely to miss any planets.

The Data:

CSV & XML tree Data on stars, their planets, their parent systems & their physical characteristics.

  • Our exoplanet data is from the Open Exoplanet Archive/Catalog & includes: star names, magnitudes, radii, distance, right ascension/declination, spectral class

  • Our additional star data is from the HYG dataset & includes: ID numbers, names, magnitudes, luminosity, x, y, z coordinates for the stars, spectral class, and some details about each star's orbit

The Process:

I used Python in Jupyter Notebook to perform OSEMN & logistic regression to create our model and predict which stars are likely to host exoplanets.

O - Obtain

We obtained our star data from the sources linked above, over at Kaggle. If you want to get started on your own classification project like this, fork this repo.

S - Scrub

After importing all of our data, we checked it for null values, outliers, duplicates, and any other errors in the dataset. We went through each column and decided what to keep or discard, what needed filling, and what else we could fix before modeling. Scrubbing the exoplanet data alone removed too much of our initial dataset, so I also matched ID numbers against the HYG dataset, which let me randomly sample stars that have no known planets.
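As a rough illustration of the scrub-and-sample idea, here is a minimal pandas sketch. The column names (`hip`, `absmag`, `has_planet`), the helper functions, and the toy frames are all assumptions made for the example; the notebooks may do this differently.

```python
import numpy as np
import pandas as pd

def scrub(df, required):
    """Drop exact duplicate rows, then rows missing any required column."""
    return df.drop_duplicates().dropna(subset=required).reset_index(drop=True)

def add_negative_sample(hosts, catalog, key, n, seed=0):
    """Label known hosts 1, then sample up to n catalog stars not among the hosts as 0."""
    candidates = catalog[~catalog[key].isin(hosts[key])]
    negatives = candidates.sample(n=min(n, len(candidates)), random_state=seed)
    return pd.concat(
        [hosts.assign(has_planet=1), negatives.assign(has_planet=0)],
        ignore_index=True,
    )

# Toy frames standing in for the exoplanet host list and the HYG catalog (hypothetical values)
hosts = pd.DataFrame({"hip": [1, 2, 2, 3], "absmag": [4.8, 5.1, 5.1, np.nan]})
catalog = pd.DataFrame({"hip": range(1, 11), "absmag": np.linspace(1.0, 10.0, 10)})

clean_hosts = scrub(hosts, required=["absmag"])
labelled = add_negative_sample(clean_hosts, catalog, key="hip", n=4)
print(labelled)
```

Sampling the negatives from stars that are not already in the host list keeps the two classes from overlapping, though "no known planet" is of course not the same as "no planet".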

E - Explore

[Figure: histogram matrix]
We explore our data to see how the values are distributed, whether any features are strongly correlated, and whether we missed anything in our scrubbing. Some of the categories we wanted to include were highly correlated with each other. Our cutoff was 0.6, and strategies like multiplying the features together could not fix the multicollinearity, so those categories were dropped.
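One way to apply a 0.6 correlation cutoff is sketched below. The `drop_correlated` helper and the toy columns are assumptions for illustration, and the rule of dropping the later column of each correlated pair is a simplification, not necessarily how the features were chosen in the notebook.

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.6):
    """Drop the later column of every pair whose absolute Pearson r exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is considered once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

rng = np.random.default_rng(0)
dist = rng.uniform(1, 100, 200)
toy = pd.DataFrame({
    "dist": dist,
    "dist_copy": dist * 2 + rng.normal(0, 0.1, 200),  # nearly collinear with dist
    "ci": rng.normal(0.6, 0.3, 200),                  # generated independently of dist
})

reduced, dropped = drop_correlated(toy)
print("dropped:", dropped)
```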

M - Model

We use scikit-learn's LogisticRegression to get our best fit in this project.
To work with some of our data in this model, we also have to create dummy variables for our categorical variables. After fitting an initial model with all of our variables, we used GridSearchCV to go back through and refine the model, trying to make our predictions stronger. After modeling, we check all available evaluation metrics & compare.
[Figure: ROC curve]
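The modeling step can be sketched roughly as follows. Everything here is illustrative: the data is synthetic, the column names (`absmag`, `dist`, `spect`) and the parameter grid are assumptions, and the scaling step is an addition for the sketch rather than something the project is known to use.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scrubbed star table (hypothetical columns)
rng = np.random.default_rng(42)
n = 400
stars = pd.DataFrame({
    "absmag": rng.normal(5, 2, n),
    "dist": rng.uniform(5, 300, n),
    "spect": rng.choice(["F", "G", "K", "M"], n),
})
# Made-up labelling rule so the toy problem has some signal
logit = 0.8 * (5 - stars["absmag"]) - 0.01 * (stars["dist"] - 150)
stars["has_planet"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Dummy-encode the categorical column, dropping one level to avoid redundancy
X = pd.get_dummies(stars.drop(columns="has_planet"), columns=["spect"],
                   drop_first=True, dtype=float)
y = stars["has_planet"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Grid search over regularisation strength and class weighting, scored on recall
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
grid = GridSearchCV(
    pipe,
    param_grid={
        "logisticregression__C": [0.01, 0.1, 1, 10],
        "logisticregression__class_weight": [None, "balanced"],
    },
    scoring="recall",
    cv=5,
)
grid.fit(X_train, y_train)

test_recall = recall_score(y_test, grid.predict(X_test))
print(grid.best_params_, round(test_recall, 3))
```

Scoring the search on recall matches the stated goal of not missing planets; a different `scoring` value would favor a different trade-off.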

N - iNterpret

Here we take a deep dive into what our evaluation metrics are saying about our models & plot how our best features compare.
[Figure: violin plot]
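To make the recall focus concrete, here is a small sketch with made-up labels and predicted probabilities (not results from this project). It shows how the confusion matrix, precision, recall, and ROC AUC are computed, and how lowering the decision threshold trades precision for recall.

```python
from sklearn.metrics import (
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground truth (1 = star has a planet) and model probabilities
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]

def evaluate(threshold):
    """Return (recall, precision) when predicting 1 for probabilities >= threshold."""
    y_pred = [int(p >= threshold) for p in y_prob]
    return recall_score(y_true, y_pred), precision_score(y_true, y_pred)

# Confusion matrix at the default 0.5 threshold, flattened to tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, [int(p >= 0.5) for p in y_prob]).ravel()

recall_default, precision_default = evaluate(0.5)
recall_low, precision_low = evaluate(0.3)  # lower threshold catches more planets
auc = roc_auc_score(y_true, y_prob)        # threshold-independent ranking quality

print(tn, fp, fn, tp)
print(recall_default, precision_default)
print(recall_low, precision_low)
print(auc)
```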

Observations

  • We were able to predict whether or not a star would have an exoplanet, based on basic information about the stars themselves, with a high degree of recall, precision & accuracy.

    [Figure: training results]

    [Figure: testing results]

  • Currently our biggest predictors are things that affect how well we see stars, like their absolute magnitude, luminosity index & distance

    [Figures: model coefficient plots (poscoef)]
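Ranking predictors by coefficient size can be sketched like this. The feature names and the synthetic data are assumptions for the example, and the features are standardized first so that coefficient magnitudes are comparable across columns.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic features: two carry signal, one is pure noise (hypothetical names)
rng = np.random.default_rng(1)
names = ["absmag", "lum", "noise"]
X = rng.normal(size=(500, 3))
y = (-2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0, 0.5, 500) > 0).astype(int)

# Standardize so each coefficient is measured per standard deviation of its feature
model = LogisticRegression().fit(StandardScaler().fit_transform(X), y)

# Sort by absolute coefficient; the sign shows the direction of the effect
ranked = sorted(zip(names, model.coef_[0]), key=lambda item: abs(item[1]), reverse=True)
for name, coef in ranked:
    print(f"{name:>8s} {coef:+.2f}")
```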

Future Work

-Use Kepler labelled time series data to train deep learning algorithms to detect exoplanets based on light fluctuations in observed stars.

-Write something that is able to parse and accurately separate stellar types (as well as predict missing values) to test predictions made against more random data.

-Use additional data from the Open Exoplanet Catalogue to predict features of planets around stars & predicted stars.

-When more data is available, expand predictor to include multi-planetary predictions.

For More Information

See the full analysis in the Jupyter Notebooks or review our Presentation. For additional info, contact me here: Fennec C. Nightingale,

Repository Structure

├──.ipynb_checkpoints
├──.virtual_documents
├──.__pycache__
├──Scrubbed.csv
├──Images
    ├── hist.png
    ├── MilkyWay.png
    ├── outerarmmid.png
    ├── outerarmmiin.png
    ├── outerarmout.png
    ├── outerarmouter.png
    ├── planetviolin.png
    ├── poscoef.png
    ├── negcoef.png
    ├── ROC.png
├── PDF
    ├──Obtain & Scrub.pdf
    ├──Modeling.pdf
    ├──Presentation.pdf
├── Obtain & Scrub.ipynb
└── Exoplanet Regression.ipynb
 
