Leondra R. Gonzalez (LeondraJames)

Company: Microsoft

Location: Indianapolis, IN

Home Page: https://www.linkedin.com/in/leondragonzalez/

Leondra R. Gonzalez's repositories

International-Debt-Stats-EDA

Used SQL in Jupyter Notebooks to analyze and explore data on international debts and codes.

Language: Jupyter Notebook | Stargazers: 3 | Issues: 1 | Issues: 0

AdClick_Fraud

Capstone project #2 for the Harvard University Professional Certificate in Data Science

Language: R | Stargazers: 2 | Issues: 1 | Issues: 0

Customer-Churn-w-Logistic-Regression

Used Spark, Python (PySpark), SQL, and Databricks to fit a logistic regression that predicts which customers are at higher risk of churning, then applied the model to an unseen "new customers" data set.

Language: Jupyter Notebook | Stargazers: 2 | Issues: 2 | Issues: 0

Disney-Movies-Box-Office-Hits

Analysis of Disney's top-grossing films (adjusted for inflation) in Python, using regression to relate film genre to box-office success. The project also uses bootstrap regression to estimate confidence intervals for the intercept and coefficients.

Language: Jupyter Notebook | Stargazers: 2 | Issues: 2 | Issues: 0
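
The bootstrap step mentioned above could look roughly like the sketch below. It is not taken from the repository: the helper names `fit_line` and `bootstrap_ci`, the single 0/1 genre indicator, and the gross figures are all made up for illustration, and a percentile bootstrap that resamples rows is assumed.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for the intercept and slope."""
    rng = random.Random(seed)
    n = len(xs)
    intercepts, slopes = [], []
    while len(slopes) < n_boot:
        # Resample rows (x, y pairs) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when every resampled x is equal
        b0, b1 = fit_line(bx, by)
        intercepts.append(b0)
        slopes.append(b1)

    def percentile_interval(values):
        values = sorted(values)
        lower = values[int((alpha / 2) * len(values))]
        upper = values[int((1 - alpha / 2) * len(values)) - 1]
        return lower, upper

    return percentile_interval(intercepts), percentile_interval(slopes)

# Hypothetical data: 1 if the film belongs to some genre, else 0,
# and an invented inflation-adjusted gross in millions.
is_genre = [0, 0, 0, 0, 1, 1, 1, 1]
gross = [50, 60, 55, 65, 90, 110, 95, 105]

intercept, slope = fit_line(is_genre, gross)
intercept_ci, slope_ci = bootstrap_ci(is_genre, gross)
print("intercept", intercept, intercept_ci)
print("slope", slope, slope_ci)
```

With a single 0/1 predictor the slope is simply the difference in mean gross between the two groups, so the interval for the slope reads as an interval for that difference.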

TheMatrixScript_NLP

A project applying NLP techniques including text mining, document-term matrices, sentiment analysis, word clouds and topic modeling with LDA.

Language: HTML | Stargazers: 2 | Issues: 1 | Issues: 0

AWSSageMaker_PythonXGBoostTutorial

A Python XGBoost model built with Amazon SageMaker, EC2 instances and S3 buckets, covering data preparation, partitioning, training, tuning, prediction and evaluation. The project predicts which customers sign up for a financial product at a bank.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 2 | Issues: 0

Boston-Housing---Random-Forest-XGBoost

Used random forest regression and XGBoost with cross-validation and grid search to tune the best-performing model on the Boston Housing dataset. Analyzed and visualized the most statistically significant features for both models. Achieved an RMSE of $2K.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 2 | Issues: 0

Degrees-That-Pay-You-Back

A cluster analysis using the k-means algorithm to determine which degrees are likely to yield which levels of income, based on historical data.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

Film-Similarity-NLP-with-KMeans-Hierarchical-Clustering

Used NLP techniques (tokenization, stemming, TF-IDF vectorization) and clustering algorithms (k-means and hierarchical clustering) to mine the "similarities" between films based on their plots provided by IMDb and Wikipedia. The dataset contains the titles of the top 100 movies on IMDb.

Language: Python | Stargazers: 1 | Issues: 2 | Issues: 0
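
As a rough illustration of the vectorization and similarity step described above, the sketch below builds TF-IDF vectors and compares them with cosine similarity using only the standard library. It is not the repository's code: the three plot strings are invented, stemming and the clustering stage are left out, and a smoothed IDF of the form ln((1 + N) / (1 + df)) + 1 is assumed.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (no stemming here)."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()

def tfidf_vectors(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Invented one-line plot summaries, for illustration only.
plots = [
    "A mob boss and his family fight rival crime families.",
    "A crime family boss protects his family from rivals.",
    "A farm girl follows a yellow road to meet a wizard.",
]
vectors = tfidf_vectors(plots)
print(cosine(vectors[0], vectors[1]))  # the two crime plots
print(cosine(vectors[0], vectors[2]))  # crime plot vs. fantasy plot
```

A matrix of pairwise distances (for example, 1 minus the cosine similarity) is the kind of input that k-means or hierarchical clustering would then work from.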

HarvardXCapstone---Film-Recommender-System

Capstone Submission #1 for the Harvard University Professional Certificate in Data Science.

Language: R | Stargazers: 1 | Issues: 1 | Issues: 0

Hyundai-Cruise-Ship-Crew-Prediction

Predicting the number of crew members needed to staff a Hyundai cruise ship from information such as the number of cabins and passengers, using linear regression. Leveraged SQL and PySpark.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 2 | Issues: 0

MarketBasketAnalysis-MBA-

Association rule mining using the Apriori algorithm.

Language: R | Stargazers: 1 | Issues: 1 | Issues: 0
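
The repository itself is written in R; the Python sketch below only illustrates the general idea of Apriori (level-wise candidate generation with support-based pruning) on made-up shopping baskets. The function names `apriori` and `confidence` and the basket contents are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Made-up baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = apriori(baskets, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
print(confidence(frequent, {"butter"}, {"bread"}))
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are, so most candidates are discarded without counting them against the data.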

MarkovChains_MultiTouchAttribution

Multi-touch attribution models, including Markov chains.

Language: R | Stargazers: 1 | Issues: 2 | Issues: 0

MobileGameABTest

Two A/B tests comparing (1) average 1-day player retention and (2) 7-day player retention between the control (old player level) and the new version (new player level).
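
The listing does not say which statistical procedure the project uses, so the sketch below swaps in one common choice for comparing two retention rates: a two-sided two-proportion z-test. The function name and all counts are made up for illustration.

```python
import math

def two_proportion_z_test(retained_a, n_a, retained_b, n_b):
    """Two-sided z-test for a difference between two retention rates.

    Returns (rate_b - rate_a, z statistic, p-value).
    """
    rate_a = retained_a / n_a
    rate_b = retained_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (retained_a + retained_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value

# Made-up counts: players retained after 1 day in the control and new versions.
diff, z, p_value = two_proportion_z_test(450, 1000, 400, 1000)
print(f"difference={diff:+.3f}, z={z:.2f}, p={p_value:.4f}")
```

The same function would be called a second time with the 7-day retention counts. A bootstrap over players is a reasonable alternative when the normal approximation is in doubt.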

Netflix-Content-Duration-Analysis

Given the large number of movies and series available on Netflix, it is a perfect opportunity to dive into the entertainment industry with an analysis of Netflix content durations. This analysis aims to understand trends in content duration on the Netflix platform from 2011 through 2020.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 2 | Issues: 0

Private_Public_Colleges

Predicting whether a university is private or public with tree-based models (i.e., decision tree classifier, random forest classifier and gradient-boosted tree classifier) using PySpark and Databricks.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 2 | Issues: 0

SEM-Generating-Keywords-for-Google-Ads

Automatically generating keywords for a Google Ads search engine marketing campaign.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 2 | Issues: 0

SMS-Spam-Prediction

Predicting whether an SMS (text message) is spam using natural language processing (NLP), a naive Bayes classifier and cross-validation (in Python).

Language: Jupyter Notebook | Stargazers: 1 | Issues: 2 | Issues: 0

TV-HALFTIME-SHOWS-AND-THE-BIG-GAME

EDA project using SQL in Jupyter Notebooks, focusing on the history of games, broadcasts and performances for the National Football League

WalmartStockEDA

An EDA of Walmart stock data using Databricks, Spark and PySpark.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 2 | Issues: 0

Whale-Image-Classification-

Computer Vision project

Language: Jupyter Notebook | Stargazers: 1 | Issues: 2 | Issues: 0

TweetClassificationLSTM

This project details the creation of a multi-class Recurrent Neural Network (RNN) model using TensorFlow / Keras to predict Tweet emotions. More specifically, this notebook uses a bidirectional LSTM as a means to capture additional semantics often found in sequential (language) data. This project utilizes the Tweet Emotion Recognition with TensorFlow dataset provided by Kaggle.

Language: Jupyter Notebook | Stargazers: 0 | Issues: 2 | Issues: 0

GoldenAgeofGaming

Video games are big business: the global gaming market is projected to be worth more than $300 billion by 2027, according to Mordor Intelligence. With so much money at stake, the major game publishers are increasingly incentivized to create the next big hit. But are games getting better, or has the golden age of video games already passed? In this project, I explore the top 400 best-selling video games created between 1977 and 2020 by comparing gaming sales data with critic and user review data. In doing so, we can discover whether video games have improved as the gaming market has grown. Each table is limited to 400 rows for this experiment, but the complete dataset with over 13,000 games can be found on Kaggle.

Stargazers: 0 | Issues: 2 | Issues: 0
Language: Jupyter Notebook | Stargazers: 0 | Issues: 2 | Issues: 0

GoTNetworkAnalysis

Analysis of the co-occurrence network of Game of Thrones characters in the Game of Thrones books. Here, two characters are considered to co-occur if their names appear within 15 words of one another in the books. This project used graph analysis and modeling techniques such as Google's PageRank algorithm.

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Jupyter Notebook | Stargazers: 0 | Issues: 2 | Issues: 0
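
The co-occurrence rule and the PageRank step from the Game of Thrones network description can be sketched as follows. This is an illustration, not the notebook's code: the function names, the toy sentence, and the small window used in the example are assumptions, and PageRank is implemented as a plain power iteration on a weighted undirected graph.

```python
from collections import defaultdict

def cooccurrence_edges(tokens, characters, window=15):
    """Count pairs of distinct character names at most `window` words apart."""
    positions = [(i, tok) for i, tok in enumerate(tokens) if tok in characters]
    weights = defaultdict(int)
    for a in range(len(positions)):
        i, name_a = positions[a]
        for b in range(a + 1, len(positions)):
            j, name_b = positions[b]
            if j - i > window:
                break  # later mentions are even farther away
            if name_a != name_b:
                weights[tuple(sorted((name_a, name_b)))] += 1
    return dict(weights)

def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank on an undirected graph via power iteration."""
    neighbors = defaultdict(dict)
    for (u, v), w in weights.items():
        neighbors[u][v] = w
        neighbors[v][u] = w
    nodes = sorted(neighbors)
    rank = {node: 1 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / len(nodes) for node in nodes}
        for u in nodes:
            total = sum(neighbors[u].values())
            # Each node shares its rank with neighbors in proportion to edge weight.
            for v, w in neighbors[u].items():
                new_rank[v] += damping * rank[u] * w / total
        rank = new_rank
    return rank

# Toy text with a small window so the example stays readable.
tokens = "Jon met Sam . Jon met Arya . Jon met Tyrion".split()
edges = cooccurrence_edges(tokens, {"Jon", "Sam", "Arya", "Tyrion"}, window=2)
ranks = pagerank(edges)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

Because every node in the graph comes from at least one edge, there are no dangling nodes and the ranks keep summing to 1 across iterations.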

PredictTaxiFares

An analysis and prediction of taxi fares based on 2013 NYC data using decision trees and random forests.

Stargazers: 0 | Issues: 1 | Issues: 0

sme-dle-case-study-datacamp

For consideration: Subject matter expert (SME) for Data Literacy and Essentials (DLE).

Language: R | Stargazers: 0 | Issues: 1 | Issues: 0