ryanschaub

Ryan Schaub's repositories

Mobile-Games-A-B-Testing-with-Cookie-Cats

In this project, we will analyze the results of an A/B test in which the first gate in Cookie Cats was moved from level 30 to level 40. In particular, we will analyze the impact on player retention.

Language: Jupyter Notebook
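One way to compare retention between the two gate placements is a two-proportion z-test. The sketch below is illustrative only: the function name, the counts, and the choice of a z-test are assumptions, not the method or data used in the repository.

```python
from math import erf, sqrt


def retention_z_test(retained_a, n_a, retained_b, n_b):
    """Two-proportion z-test.

    Returns (retention difference b - a, z statistic, two-sided p-value).
    """
    p_a = retained_a / n_a
    p_b = retained_b / n_b
    # Pooled retention rate under the null hypothesis of no difference
    pooled = (retained_a + retained_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return p_b - p_a, z, p_value


# Made-up counts: group A has the gate at level 30, group B at level 40
diff, z, p = retention_z_test(retained_a=4500, n_a=10000,
                              retained_b=4400, n_b=10000)
print(f"difference={diff:.4f}, z={z:.3f}, p={p:.3f}")
```

A negative difference here would mean lower retention with the gate at level 40; whether that difference is meaningful depends on the p-value and on the sample sizes.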

Predicting-Loan-Interest-Rates

In this project, we use the publicly available, Kaggle-popular LendingClub data set to train linear regression and extreme gradient boosted decision tree models that predict the interest rates assigned to loans. First, we clean and prepare the data, which includes feature removal, feature engineering, and string processing. Several entries have had values deleted to simulate dirty data. Then, we build machine learning models in Python to predict the interest rates. Finally, we evaluate each model using the root mean squared error (RMSE) metric and compare the results.

Language: Jupyter Notebook
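The RMSE metric mentioned above can be computed in a few lines. This is a minimal sketch with made-up interest rates; the function name `rmse` and the sample values are assumptions for illustration, not taken from the repository.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Made-up interest rates (percent) and model predictions
actual = [10.0, 12.5, 15.0]
predicted = [11.0, 12.5, 13.0]
score = rmse(actual, predicted)
print(f"RMSE: {score:.4f}")
```

Because the errors are squared before averaging, a single large miss raises RMSE more than several small ones, which is worth keeping in mind when comparing models.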

Sentiment-Analysis-on-IMDB-Film-Reviews

Sentiment analysis is a popular Natural Language Processing (NLP) task that extracts the overall opinion expressed in a text. In this project, we perform sentiment analysis on IMDB movie reviews to classify each review as positive or negative. A prevalent issue with text data is how to encode words as numeric features that a classification algorithm can use. Because words do not naturally lend themselves to a numeric ordering, many approaches to featurizing text have been proposed. Here we use the bag-of-words model, which uses the count of each word in a text as a feature. We begin with logistic regression, followed by decision tree and random forest models. We regularize and tune the parameters of each model, and use AdaBoost classifiers with our decision tree and random forest models as base estimators. Finally, we compare the performance of each model on our training and validation data sets.

Language: Jupyter Notebook
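The bag-of-words encoding described above can be sketched with the standard library alone. The helper names (`tokenize`, `bag_of_words`), the simple regex tokenizer, and the two sample reviews are assumptions for illustration; the repository may use a library vectorizer instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(texts):
    """Return (sorted vocabulary, one word-count vector per text)."""
    tokenized = [tokenize(t) for t in texts]
    vocabulary = sorted({word for doc in tokenized for word in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # Counter returns 0 for words that are absent from this document
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Made-up reviews
reviews = ["A great, great film", "Not a great film"]
vocab, X = bag_of_words(reviews)
print(vocab)
print(X)
```

Each row of `X` is a fixed-length numeric vector, so it can be passed to a classifier; note that word order is discarded, which is the main limitation of this encoding.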

Level-Difficulty-in-Candy-Crush-Saga

In this project we will work with a real Candy Crush data set and use this data to estimate level difficulty.

Language: Jupyter Notebook

The-Hottest-Topics-in-Machine-Learning-NLP-on-NIPS-Research-Papers-

Neural Information Processing Systems (NIPS) is one of the top machine learning conferences in the world, where groundbreaking work is published. In this project, we will analyze a large collection of NIPS research papers from the past decade to discover the latest trends in machine learning.

Language: Jupyter Notebook

US-Census-Demographic-Data

Mining, cleaning, organizing, visualizing, and regressing on the Kaggle-provided US Census Demographic Data set: demographic and economic data for tracts and counties.

Language: Jupyter Notebook

Breast-Cancer-Classification-using-Support-Vector-Machine-Models

Exploring the Wisconsin Breast Cancer data set (which was never actually intended for machine learning) and optimizing different Support Vector Machine models to classify benign and malignant tumors.

Language: Jupyter Notebook

The-U.S.-Mexican-Border-Wall-and-Staffing-A-Statistical-Approach-

Independent exploration project on U.S.-Mexico border wall, apprehension, and staffing data publicly released by U.S. Border Patrol and Security. Visualization, statistical analysis, and linear regression are carried out to tell a story.

Language: Jupyter Notebook

Website-Traffic-and-Customer-Conversion-Analysis

One way companies measure the success of their marketing efforts is the percentage of people who progress from visiting their website to accepting their offers for products and services. We refer to this percentage as the "conversion rate". One day, at a marketing team meeting, someone says: "Our conversion rate is down". Here we investigate whether this claim is true. We examine the conversion data to understand what is happening to the conversion rate, determine whether it is down and, if so, why.

Language: Jupyter Notebook

MNIST-Digit-Classification-k-nearest-neighbors-

Classification of the MNIST data set using base R (and ggplot). The k parameter was optimized by cross-validation, and no external libraries were used, as the goal of this project was to code everything from scratch.
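The idea of choosing k by cross-validation can be sketched as follows. The repository itself is written in base R; this Python version, the helper names, the leave-one-out scheme, and the toy 2-D points are all assumptions for illustration rather than the project's code.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k):
    """Predict a label by majority vote among the k nearest training points."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    correct = 0
    for i in range(len(X)):
        # Hold out point i and train on the remaining points
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Toy stand-in for image feature vectors: two well-separated clusters
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y = [0, 0, 0, 1, 1, 1]

# Pick the candidate k with the highest cross-validated accuracy
best_k = max([1, 3], key=lambda k: loo_accuracy(X, y, k))
print(best_k, loo_accuracy(X, y, best_k))
```

On real MNIST data each point would be a 784-dimensional pixel vector, and a k-fold split would be far cheaper than leave-one-out.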

-EXPLORING-THE-KAGGLE-DATA-SCIENCE-SURVEY

In this project, we find out what tools, algorithms, and languages professionals use in their day-to-day work. Our data comes from the Kaggle Data Science Survey, which includes responses from over 10,000 people who write code to analyze data in their daily work.

Language: Jupyter Notebook

A-New-Era-for-Data-Analysis-in-Baseball-MLB-Statcast-

"There's a new era of data analysis in baseball. Using a new technology called Statcast, Major League Baseball is now collecting the precise location and movements of its baseballs and players. In this project, you will use Statcast data to compare the home runs of two of baseball's brightest (and largest) stars, Aaron Judge (6'7") and Giancarlo Stanton (6'6"), both of whom now play for the New York Yankees." -David Venture, Instructor at DataCamp

Language: Jupyter Notebook

Census-Case-Study-with-Python-SQLite-SQLAlchemy-

In this project, we connect to and populate a database with a U.S. Census data set (CSV file), then query the database to find interesting information about the U.S. population in 2000 and 2008. The tools used for this project are Python, SQLite, and SQLAlchemy.

Language: Jupyter Notebook
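The populate-then-query workflow can be sketched with Python's built-in `sqlite3` module. Note that this swaps in `sqlite3` in place of SQLAlchemy to stay dependency-free, and that the table name, column names, and rows are hypothetical, not the project's actual schema or data.

```python
import sqlite3

# Made-up rows standing in for lines of the census CSV file:
# (state, sex, population in 2000, population in 2008)
rows = [
    ("Alpha", "F", 100, 120),
    ("Alpha", "M", 90, 110),
    ("Beta", "F", 50, 45),
    ("Beta", "M", 60, 55),
]

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute(
    "CREATE TABLE census (state TEXT, sex TEXT, pop2000 INTEGER, pop2008 INTEGER)"
)
# Parameterized inserts avoid building SQL strings by hand
conn.executemany("INSERT INTO census VALUES (?, ?, ?, ?)", rows)

# Total population per state in each year
totals = conn.execute(
    "SELECT state, SUM(pop2000), SUM(pop2008) "
    "FROM census GROUP BY state ORDER BY state"
).fetchall()
conn.close()

for state, total_2000, total_2008 in totals:
    print(state, total_2000, total_2008, total_2008 - total_2000)
```

With SQLAlchemy, the same aggregation would be expressed through its query-building layer rather than a raw SQL string, but the underlying SQLite database is the same.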

Dr.-Semmelweis-and-the-Discovery-of-Handwashing

Hangman-in-Python

julia