cici7941 / Independent_Research

Welcome to my independent research repository!

Home Page: http://jonathanjohann.github.io/Independent_Research/

Jonathan's Independent Research

Introduction

Since the end of high school, I've enjoyed teaching myself new subjects. I've finally decided to start my own repository of independent research projects to practice both implementation and theory. If you've stumbled upon this repository by accident or have come from LinkedIn, I hope you find this page insightful in some way!

Coming Soon!

  • Finishing up the Spanish AB Test notebook
  • Finishing up 6.S085 Statistics for Research Projects Problem Set 2
  • Just realized you can download Quantopian Research Jupyter Notebooks! Will start playing around with some stuff there and hopefully have some interesting content to show.
  • Quandl looks fun!

15.0621 Data Mining: Finding the Data and Models that Create Value: Fall 2016

Reference Website: https://sloanbid.mit.edu/resources/15.062.pdf

For Fall Semester 2016 at MIT, I decided to take Data Mining with Roy Welsch. While the course normally calls for using JMP PRO, I was provided the option of using Python. As a result, I've done most of my analysis in Python as an applied data science exercise.

15.0621 Data Mining Assignment 1.ipynb

Jupyter Notebook: https://git.io/vXXev

This assignment involved basic visualization and exploratory data analysis of car sales and laptop sales data. It is deliberately simple, since it was the first assignment in the class.

15.0621 Data Mining Assignment 2.ipynb

Jupyter Notebook: https://git.io/vXM5V

This assignment analyzes the Boston Housing dataset and covers:

  • Multivariate Linear Regression, kNN
  • Normalization of features via MinMaxScaler, StandardScaler
  • Feature selection using SelectKBest
  • Usage of Pipeline
  • Parameter tuning with GridSearchCV
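
The pieces listed above can be combined into a single workflow. The following is only a minimal sketch, not the code from the assignment notebook: it assumes synthetic data from `make_regression` as a stand-in for Boston Housing (13 features, to mirror that dataset's width), and the step names `scale`, `select`, and `knn` as well as the grid values are illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Assumption: synthetic regression data standing in for the housing data.
X, y = make_regression(
    n_samples=300, n_features=13, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Chain normalization, feature selection, and the kNN model so that
# every cross-validation fold fits the scaler and selector on its own
# training split only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_regression)),
    ("knn", KNeighborsRegressor()),
])

# Illustrative grid: swap the scaler itself, and tune the number of
# selected features and the number of neighbors.
param_grid = {
    "scale": [StandardScaler(), MinMaxScaler()],
    "select__k": [3, 5, 8, 13],
    "knn__n_neighbors": [3, 5, 9],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

best_params = search.best_params_
test_r2 = search.score(X_test, y_test)
print(best_params)
print(f"held-out R^2: {test_r2:.3f}")
```

Putting the scaler and selector inside the Pipeline, rather than transforming the data once up front, keeps information from the validation folds out of the preprocessing steps during the grid search.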

6.S085 Statistics for Research Projects: IAP 2015

Reference Website: http://www.mit.edu/~6.s085/

Although 6.S085 does not appear to be offered this IAP (winter 2017), I feel strongly that the material in its slides is important for any researcher to know. As a result, I've decided to replicate many of its concepts in Jupyter Notebooks.

6.S085 Statistics for Research Projects Problem Set 1.ipynb

Jupyter Notebook: https://git.io/vXXeJ

  • Data Visualization via boxplots, histograms, kde/density plots.
  • Use of the Kolmogorov-Smirnov statistic.
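
To make the Kolmogorov-Smirnov statistic concrete, here is a small sketch of the two-sample version, computed as the largest gap between two empirical CDFs. It is not taken from the problem set notebook; the helper names `ecdf` and `ks_two_sample` are hypothetical, and the notebook may well use the one-sample form or a library routine instead.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Fraction of observations in sorted_sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_two_sample(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all x.

    Both empirical CDFs are right-continuous step functions that only
    change at observed values, so checking every observed value is enough.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    return max(
        abs(ecdf(a_sorted, x) - ecdf(b_sorted, x))
        for x in a_sorted + b_sorted
    )


identical = ks_two_sample([1, 2, 3], [1, 2, 3])
disjoint = ks_two_sample([1, 2, 3], [4, 5, 6])
shifted = ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6])
print(identical, disjoint, shifted)
```

A value of 0 means the two empirical distributions coincide, while a value of 1 means the samples do not overlap at all.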

6.S085 Statistics for Research Projects Problem Set 2.ipynb (in progress)

Jupyter Notebook: https://git.io/vXXen

A Collection of Data Science Take Home Challenges

Reference Website: http://datascientistjobinterview.com/

I personally love this guide to data science take home challenges. Oftentimes, as aspiring data scientists, we're told to look at Kaggle. As much as I like Kaggle, I like that these challenges are short activities where you can wrap up your analysis faster. If you're a beginning data scientist who isn't yet comfortable with sklearn and pandas, I'd definitely recommend starting with this!

Spanish AB Test.ipynb (in progress)

Jupyter Notebook: https://git.io/vXXe0

  • Determining Feature Importances using a Decision Tree Classifier via sklearn.
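
As a rough illustration of the idea, the sketch below fits a decision tree and ranks features by importance. It is not the notebook's code: the data comes from `make_classification` as a placeholder for the A/B test data, and the feature names and `max_depth=4` are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic classification data with placeholder feature names.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# A shallow tree keeps the fitted model easy to inspect.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X, y)

# feature_importances_ holds one impurity-based score per feature;
# sort so the most important feature comes first.
ranked = sorted(
    zip(feature_names, tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

These scores are impurity-based, so they describe how much each feature helped this particular tree split the data rather than any causal effect.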

Kaggle

Reference Website: https://www.kaggle.com/

After getting my hands dirty reading "An Introduction to Statistical Learning with Applications in R" (I'm slowly making my way through "Elements of Statistical Learning" when I have the time) and practicing with "A Collection of Data Science Take Home Challenges", Kaggle is a great next step and a site that is very well known among data scientists. When I have more time to take on some of the interesting projects on Kaggle, I'll post what I work on!

Details for contacting me:

LinkedIn Profile: https://www.linkedin.com/in/jonathan-f-johannemann-5aa85894

If you have any questions or find that something here is incorrect, please feel free to reach out to me at jonjoh@mit.edu!
