LeondraJames

Leondra R. Gonzalez's repositories

International-Debt-Stats-EDA

Used SQL in Jupyter Notebooks to analyze and explore data on international debts and codes.

Language: Jupyter Notebook

AdClick_Fraud

Capstone project #2 for the Harvard University Professional Certificate in Data Science

Language: R

Customer-Churn-w-Logistic-Regression

Used Spark, Python (PySpark), SQL, and Databricks to fit a logistic regression model that predicts which customers are at higher risk of churning, then applied the model to an unseen "new customers" data set.

Language: Jupyter Notebook

Disney-Movies-Box-Office-Hits

Analysis of Disney's top-grossing films (adjusted for inflation) in Python, using regression to relate film genre to box-office success. Bootstrap regression is also used to estimate confidence intervals for the intercept and coefficients.

Language: Jupyter Notebook
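The bootstrap step described for Disney-Movies-Box-Office-Hits can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it assumes a single 0/1 genre indicator as the predictor, a percentile bootstrap, and made-up gross figures; the names `fit_line`, `percentile` and `bootstrap_ci` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def percentile(sorted_vals, q):
    """Nearest-rank style percentile of a sorted list, with q in [0, 1]."""
    return sorted_vals[round(q * (len(sorted_vals) - 1))]


def bootstrap_ci(xs, ys, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for the intercept and slope."""
    rng = random.Random(seed)
    n = len(xs)
    intercepts, slopes = [], []
    while len(slopes) < n_boot:
        # Resample (x, y) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # all x equal: the slope is undefined, so redraw
        intercept, slope = fit_line(bx, by)
        intercepts.append(intercept)
        slopes.append(slope)
    intercepts.sort()
    slopes.sort()
    lo, hi = alpha / 2, 1 - alpha / 2
    return {
        "intercept": (percentile(intercepts, lo), percentile(intercepts, hi)),
        "slope": (percentile(slopes, lo), percentile(slopes, hi)),
    }


# Made-up data: 1 = adventure film, 0 = other genre; gross in $ millions.
is_adventure = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
gross = [95, 102, 98, 110, 90, 105, 195, 210, 190, 205, 200, 198]
print(bootstrap_ci(is_adventure, gross))
```

With a 0/1 predictor the intercept is the mean of the baseline group and the slope is the difference between group means, so the intervals can be read directly as uncertainty about those two quantities.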

TheMatrixScript_NLP

A project applying NLP techniques including text mining, document-term matrices, sentiment analysis, word clouds and topic modeling with LDA.

Language: HTML

AWSSageMaker_PythonXGBoostTutorial

Python XGBoost model built with Amazon SageMaker, EC2 instances and S3 buckets, covering data preparation, partitioning, training, tuning, prediction and evaluation. The project predicts which customers sign up for a financial product at a bank.

Language: Jupyter Notebook

Boston-Housing---Random-Forest-XGBoost

Used random forest regression and XGBoost with cross-validation and grid search to tune the best-performing model on the Boston Housing dataset. Analyzed and visualized the most statistically significant features for both models. Achieved an RMSE of $2K.

Language: Jupyter Notebook

Degrees-That-Pay-You-Back

A cluster analysis using the k-means algorithm to determine which degrees are likely to yield which levels of income, based on historical data.

Language: Jupyter Notebook
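For readers unfamiliar with k-means, the clustering idea behind Degrees-That-Pay-You-Back can be sketched as below. This is plain Lloyd's algorithm in standard-library Python under stated assumptions: the two features (starting and mid-career salary, in $ thousands) and all values are made up, and the function name `kmeans` is hypothetical rather than taken from the repository.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (centroids, clusters) for tuples of numbers."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Made-up (starting salary, mid-career salary) pairs in $ thousands.
degrees = [(40, 70), (42, 75), (45, 80), (60, 110), (62, 115), (65, 120)]
centroids, clusters = kmeans(degrees, k=2)
print(sorted(centroids))
```

In practice the features would usually be standardized first and k chosen with a diagnostic such as the elbow method; neither step is shown here.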

Film-Similarity-NLP-with-KMeans-Hierarchical-Clustering

Used NLP techniques (tokenization, stemming, TF-IDF vectorization) and clustering algorithms (k-means and hierarchical clustering) to mine the "similarities" between films based on their plots as provided by IMDb and Wikipedia. The dataset contains the titles of the top 100 movies on IMDb.

Language: Python
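The TF-IDF step named in the description can be illustrated with a small standard-library sketch. It is an assumption-laden simplification: stemming is omitted, the weighting is plain tf * log(N / df) rather than any particular library's smoothed variant, the plot strings are invented, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (no stemming in this sketch)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented plot summaries, for illustration only.
plots = [
    "mafia family crime boss",
    "crime family war rival boss",
    "space ship alien planet",
]
vecs = tfidf_vectors(plots)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

A pairwise similarity (or distance, 1 minus similarity) matrix built this way is the usual input to k-means or hierarchical clustering of the films.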

HarvardXCapstone---Film-Recommender-System

Capstone Submission #1 for the Harvard University Professional Certificate in Data Science.

Language: R

Hyundai-Cruise-Ship-Crew-Prediction

Predicting the number of crew required to man a Hyundai cruise ship from information such as the number of cabins and passengers, using linear regression. Leveraged SQL and PySpark.

Language: Jupyter Notebook

MarketBasketAnalysis-MBA-

Association rule mining using the Apriori algorithm.

Language: R
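As a rough illustration of what Apriori does (the repository itself is in R and its code is not shown here), the following Python sketch finds frequent itemsets level by level and then derives rules. The basket contents, thresholds and function names `apriori` and `rules` are assumptions for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        # Count support for this level and keep the frequent candidates.
        level = {}
        for cand in current:
            supp = sum(cand <= t for t in transactions) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k + 1)-item candidates.
        k += 1
        keys = list(level)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        # Prune step: every (k - 1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Yield (lhs, rhs, support, confidence) tuples from frequent itemsets."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), supp, conf))
    return out


# Made-up shopping baskets.
baskets = [
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
    {"milk"}, {"bread", "milk"},
]
freq = apriori(baskets, min_support=0.4)
print(rules(freq, min_confidence=0.9))
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is only counted if all of its smaller subsets were already frequent.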

MarkovChains_MultiTouchAttribution

Multi-touch attribution models, including Markov chains.

Language: R
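One common way to use a Markov chain for multi-touch attribution is the "removal effect": estimate the conversion probability from the observed channel paths, then see how much it drops when a channel is removed. The sketch below shows that idea in Python under assumptions of my own (first-order chain, made-up paths, at least one converting path, hypothetical function names); the repository's R implementation may differ.

```python
from collections import defaultdict


def transition_probs(paths):
    """paths: list of (channel_list, converted). Returns {state: {next: prob}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        states = ["start", *channels, "conv" if converted else "null"]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {
        a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
        for a, nxt in counts.items()
    }


def conversion_prob(probs, removed=None, sweeps=1000):
    """Probability of reaching 'conv' from 'start'; a removed channel never converts."""
    p = defaultdict(float)  # unknown states (including 'null') default to 0
    p["conv"] = 1.0
    for _ in range(sweeps):  # repeated sweeps until the values settle
        for state, nxt in probs.items():
            if state != removed:
                p[state] = sum(w * p[t] for t, w in nxt.items() if t != removed)
    return p["start"]


def markov_attribution(paths):
    """Split total conversions across channels in proportion to removal effects."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)  # assumes at least one converting path
    channels = {c for chans, _ in paths for c in chans}
    effects = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    total_effect = sum(effects.values())
    conversions = sum(1 for _, converted in paths if converted)
    return {c: conversions * e / total_effect for c, e in effects.items()}


# Made-up customer journeys: (channels touched in order, converted?).
journeys = [
    (["search"], True),
    (["social", "search"], True),
    (["social"], False),
    (["email"], False),
]
print(markov_attribution(journeys))
```

Unlike last-touch attribution, which would give all credit to search here, the removal effect also credits social for feeding traffic into search.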

MobileGameABTest

Two A/B tests comparing average (1) 1-day and (2) 7-day player retention between the control (old player level) and the new version (new player level).
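A retention comparison like this can be tested in several ways; the description does not say which was used. As one possibility, here is a minimal two-proportion z-test in standard-library Python. The player counts are made up and the function name `two_proportion_ztest` is hypothetical.

```python
import math


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two retention rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: players retained after 1 day out of players in each group.
z, p = two_proportion_ztest(success_a=400, n_a=1000, success_b=500, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4g}")
```

The same function can be called a second time with 7-day retention counts; a bootstrap of the difference in retention rates is a common alternative when the normal approximation is in doubt.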

Netflix-Content-Duration-Analysis

The large number of movies and series available on Netflix makes it a good opportunity to explore the entertainment industry through an analysis of content durations. This analysis aims to understand trends in content duration on the Netflix platform from 2011 through 2020.

Language: Jupyter Notebook

Private_Public_Colleges

Predicting whether a university is private or public with tree-based models (i.e., decision tree, random forest and gradient-boosted tree classifiers) in PySpark and Databricks.

Language: Jupyter Notebook

SEM-Generating-Keywords-for-Google-Ads

Automatically generating keywords for a Google Ads search engine marketing campaign.

Language: Jupyter Notebook

SMS-Spam-Prediction

Predicting whether an SMS (text message) is spam using natural language processing (NLP), a naive Bayes classifier and cross-validation (in Python).

Language: Jupyter Notebook
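To make the naive Bayes part concrete, here is a small multinomial naive Bayes classifier with add-one (Laplace) smoothing written with the standard library only. It is a sketch, not the notebook's code: the messages are invented, cross-validation is left out, and the class name `NaiveBayes` is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented training messages.
texts = [
    "win a free prize now",
    "free cash offer click now",
    "are we still meeting for lunch",
    "call me when you get home",
]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("free prize offer"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.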

TV-HALFTIME-SHOWS-AND-THE-BIG-GAME

EDA project using SQL in Jupyter Notebooks, focusing on the history of games, broadcasts and halftime performances for the National Football League.

WalmartStockEDA

An EDA of Walmart stock data using Databricks, Spark and PySpark.

Language: Jupyter Notebook

Whale-Image-Classification-

Computer Vision project

Language: Jupyter Notebook

TweetClassificationLSTM

This project details the creation of a multi-class Recurrent Neural Network (RNN) classifier using TensorFlow / Keras to predict Tweet emotions. More specifically, this notebook uses a bidirectional LSTM to capture the additional context often found in sequential (language) data. This project utilizes the Tweet Emotion Recognition with TensorFlow dataset provided by Kaggle.

Language: Jupyter Notebook

docs

GoldenAgeofGaming

Video games are big business: the global gaming market is projected to be worth more than $300 billion by 2027, according to Mordor Intelligence. With so much money at stake, the major game publishers are increasingly incentivized to create the next big hit. But are games getting better, or has the golden age of video games already passed? In this project, I explore the top 400 best-selling video games created between 1977 and 2020 by comparing gaming sales data with critic and user review data. In doing so, we can discover whether video games have improved as the gaming market has grown. Each table is limited to 400 rows for this experiment, but the complete dataset with over 13,000 games can be found on Kaggle.

GoogleTrendsEDA

Language: Jupyter Notebook

GoTNetworkAnalysis

Analysis of the co-occurrence network of Game of Thrones characters in the Game of Thrones books. Here, two characters are considered to co-occur if their names appear within 15 words of one another in the books. This project used graph analysis and ranking methods such as Google's PageRank algorithm.

Language: Jupyter Notebook
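The co-occurrence rule and the PageRank step can be sketched together in a few lines of standard-library Python. The interpretation of "within 15 words" as a position difference of at most 15 is an assumption, as are the sample sentence, the damping factor of 0.85 and the function names; the notebook itself may rely on a graph library instead.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, names, window=15):
    """weights[a][b] = number of times a and b appear within `window` words."""
    positions = [(i, t) for i, t in enumerate(tokens) if t in names]
    weights = defaultdict(lambda: defaultdict(int))
    for idx, (i, a) in enumerate(positions):
        for j, b in positions[idx + 1:]:
            if j - i > window:
                break  # later mentions are even further away
            if a != b:
                weights[a][b] += 1
                weights[b][a] += 1
    return weights


def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on an undirected graph."""
    nodes = list(weights)
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour passes on rank in proportion to the edge weight.
            incoming = sum(
                rank[m] * weights[m][n] / sum(weights[m].values())
                for m in weights[n]
            )
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


# Invented sentence standing in for book text.
text = "jon spoke to arya while sansa watched and later arya met sansa"
graph = cooccurrence_graph(text.split(), names={"jon", "arya", "sansa"})
print(pagerank(graph))
```

Characters with more and heavier co-occurrence edges end up with higher rank, which is the sense in which PageRank measures importance in the character network.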

Graduate-Admission-Bias-Hypothesis-Testing

LeondraJames

PredictTaxiFares

sme-dle-case-study-datacamp