Randy Leon's repositories
Information-Architectures
Assignments and projects for Yeshiva University's Katz School Information Architectures course, spring 2020
About-Me
👋 Hi, I’m Randy! 👀 I’m interested in becoming a data scientist 🌱 I’m currently learning Python, SQL, Tableau, and AWS 💞️ I’m looking to collaborate on beginner to intermediate data science projects to showcase some skills! 👀 Some of my interests include weightlifting, geopolitics, and Yu-Gi-Oh the Card Game.
Analytics-Programming
Assignments and projects for Yeshiva University's Katz School Analytics Programming course, fall 2019
BI-Dashboards
Sample dashboards to showcase my work in database and BI tools
Cleaning-a-Messy-Data-Set-w-Python
Cleaning a wine data set using Python 3 in a Jupyter Notebook. Packages include Seaborn, NumPy, and Sklearn.
MS-Excel-Projects-Using-Healthcare-Data
Projects done using MS Excel
Naive-Bayes-Sentiment-Analysis_Using_Beautiful_Soup
Naïve Bayes classifiers are widely recognized for their efficacy at classifying text data (e.g., sentiment analysis). Many organizations rely on sentiment analysis algorithms to help them gauge the opinions of both existing and potential customers. Sentiment analysis algorithms applied to online product/service reviews help influence business decisions.
Structured-Data-Management-SQL-
Assignments and projects for Yeshiva University's Katz School Structured Data Management course, fall 2020
Decision-Tree-versus-Random-Forest-Performance-on-NY-State-Graduation-Data
Decision tree and random forest models can both be very effective when applied to classification problems. In this example, we compared the trade-off between performance and complexity for the two models using Pandas and NumPy.
Linear-Regression-Using-Sklearn-in-Python
Linear regression project on automobile data, with model checks using k-fold cross-validation.
Visual-Design-and-Storytelling
Assignments and projects for Yeshiva University's Katz School Visual Design and Storytelling course, fall 2019
Can-we-predict-if-a-mushroom-is-poisonous-
Prepared the UCI Mushroom data for the construction of predictive models. My team and I also cross-validated the models for accuracy and precision.
Clustering_and_SVM_to_Predict_Online_Purchases
Of particular interest to most online retailers is whether or not a site visitor ends up making a purchase while engaged with the website. We used supervised learning methods such as K-nearest neighbors and support vector machines in Python to predict whether or not online shoppers would make a purchase.
Excel-Exercise-Using-Shipping-Data
Allocated columns of missing data to other workbooks. I transformed and concatenated the column data across workbooks using VLOOKUP, then combined all the tables into one new sheet for reference.
Feature-Selection-and-Dimensionality-Reduction
Data science project, done with Python 3 in a Jupyter Notebook, applying feature selection/dimensionality reduction techniques to identify the explanatory variables to include in a linear regression model that predicts the number of times an online news article will be shared.
Implementing-a-Series-of-Regression-Models-on-School-Data
Constructing and comparing a series of regression models that predict the number of student “dropouts” in a school dataset relative to certain characteristics of a given school district and its associated student subgroupings.
K-Nearest-Neighbors-and-Support-Vector-Machine-Models-on-Insurance-Data
Python project that used KNN and SVM models to classify insurance data found on Kaggle.com
Sales-Dashboard-PowerBI
Making a dynamic sales dashboard from sample sales data
Sentiment-Analysis---A-Machine-Learning-Approach-into-Hideo-Kojima-s-Divisive-Platformer
Our team sought to perform sentiment analysis on tweets posted in anticipation of Hideo Kojima's video game release, Death Stranding, in 2019. We sourced the tweets from two libraries, preprocessed them, stored them in MongoDB, and then performed sentiment analysis.
Understanding-Classification-Model-Performance-Metrics-On-Diabetes-Dataset
Evaluating classification models combines calculating performance metrics with generating model performance evaluation graphics. The purpose of this exercise is to calculate a suite of classification model performance metrics via Python functions.
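
As a rough illustration of what such metric functions might look like, here is a minimal sketch for binary labels in plain Python. The function names (`confusion_counts`, `classification_metrics`) and the toy labels are hypothetical and are not taken from the repository; the actual project may compute a different set of metrics or use a different approach.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


def _safe_div(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute common metrics from the confusion-matrix counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": _safe_div(tn, tn + fp),
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }


# Toy labels, made up for illustration only (not from the diabetes dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Deriving every metric from the four confusion-matrix counts keeps the functions easy to check by hand against a library implementation.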