Ian P. Cox's repositories
Resume_Personality_Insights
The purpose of this project was to leverage the IBM Watson Personality Insights service API to ingest a PDF of a résumé and return an evaluation of the text along several personality measures. NB: IBM Watson has since deprecated this service in favor of IBM Watson Natural Language Understanding. I aim to update this project when I get access to the NLU service.
AI_Algorithms
A repo for reusable objects for the
Analyzing_Album_Sales_using_SQL
An analysis of music purchase records using SQL. Focuses on the following: * The performance of the support team, * Sales by country, and * Sales of individual tracks vs. complete albums.
Analyzing_NYC_High_School_Data
An in-depth analysis of educational outcomes in high schools located in NYC boroughs; the analysis focuses on identifying areas for deeper analysis using correlational data. The study identifies the following as attributes of interest: borough safety, race, gender, and AP exam scores.
AVI_to_MP4_File_Converter
A file converter for old .avi files of shows I cannot stream anymore :(
Building_A_Handwritten_Digits_Classifier
This project ingests pictures of handwritten digits and classifies them.
Building_a_Spam_Filter_using_Naive_Bayes
Using Conditional Probability, the goal of this project was to construct a multinomial Naive Bayes algorithm to handle classification of new messages with an expected accuracy of 80%. The model exceeded expectation by classifying new messages with an accuracy of ~86%.
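For readers unfamiliar with the technique, a multinomial Naive Bayes classifier can be sketched in a few lines of Python. This is a minimal illustration, not the code from this repository: the function names, the whitespace tokenizer, the Laplace smoothing value `alpha=1.0`, and the four toy messages are all assumptions made for the example.

```python
import math
from collections import Counter

def train(messages):
    """Count words per label. messages: list of (label, text), label in {'spam', 'ham'}."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for label, text in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab

def classify(text, model, alpha=1.0):
    """Return the label with the highest log-posterior, using Laplace smoothing."""
    word_counts, label_counts, vocab = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        # Start from the log prior P(label).
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add the smoothed log likelihood P(word | label).
            score += math.log(
                (word_counts[label][word] + alpha) / (n_words + alpha * len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data (made up for illustration only).
toy_messages = [
    ("spam", "win money now"),
    ("spam", "free money prize"),
    ("ham", "lunch at noon"),
    ("ham", "see you at lunch"),
]
model = train(toy_messages)
print(classify("free money", model))
print(classify("lunch at noon", model))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.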
Identifying_Heavy_Traffic_Indicators
Exploratory data analysis featuring four separate explanatory variables: time of day, seasonality, day of the week, and weather.
Storytelling_Using_Data_Visualization_Exchange_Rates
A visualization-driven analysis of exchange rates highlighting three distinct stories: 1) How the euro-dollar rate changed during the coronavirus pandemic, using the 2020 data with the 2016-2019 data as a baseline; 2) How the euro-dollar rate changed during the 2007-2008 financial crisis, with the data for 2016 and 2009 shown for comparison in a line plot; 3) How the euro-dollar rate changed under the last three US presidents (George W. Bush (2001-2009), Barack Obama (2009-2017), and Donald Trump (2017-2021)).
Analyzing_Star_Wars_Survey_Data
A brief, fun analysis of Star Wars survey data highlighting rankings for various Star Wars movies and fan preferences for male-identified vs. female-identified characters.
Cleaning_Analyzing_Employee_Exit_Surveys
An in-depth analysis of Employee Exit Interviews focusing on the following questions: * Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer? * Are younger employees resigning due to some kind of dissatisfaction? What about older employees?
EDA_ProfitableAppProfiles
Exploratory Data Analysis of App Profiles in the iOS App Store and the Google Play marketplaces
Exploring_eBay_Car_Sales_Data
Exploratory Data Analysis for car sales on the popular online trading platform eBay
Exploring_Hacker_News_Posts
Exploratory Data Analysis for comments found on the popular tech publication website Hacker News
How_to_Win_Jeopardy
The goal of this project was to use hypothesis testing to recommend how to best prepare for the popular trivia gameshow <i>Jeopardy</i> with the expected outcome of earning the most money. There were two primary areas of analysis: * How often a given answer can be used for a question, and * How often questions are repeated. A chi-squared test is used to narrow down the questions into two categories: * Low Value, and * High Value.
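As a rough illustration of the chi-squared step, the sketch below compares how often a term appears in high-value versus low-value questions against the counts expected from the overall split. The function names and the example counts are hypothetical and not taken from the repository.

```python
def chi_squared(observed, expected):
    """Pearson chi-squared statistic for paired observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def term_chi_squared(high_count, low_count, total_high, total_low):
    """Test whether a term is spread across high/low-value questions as expected.

    high_count / low_count: questions containing the term in each category.
    total_high / total_low: all questions in each category.
    """
    term_total = high_count + low_count
    all_questions = total_high + total_low
    # If the term were unrelated to value, it would follow the overall proportions.
    expected_high = term_total * total_high / all_questions
    expected_low = term_total * total_low / all_questions
    return chi_squared([high_count, low_count], [expected_high, expected_low])

# Hypothetical counts: a term seen in 3 high-value and 1 low-value question,
# in a data set with 40 high-value and 60 low-value questions overall.
statistic = term_chi_squared(3, 1, 40, 60)
print(round(statistic, 3))
```

With one degree of freedom, a statistic above roughly 3.84 would be significant at the 5% level. Small expected counts, as in this toy example, make the test unreliable.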
Investigating_Bias_in_Fandango_Movie_Ratings
This study is a follow-up to exemplary data journalism by Walt Hickey in 2015; the focus of the study was to determine whether the movie ratings on the ratings aggregator Fandango were still biased and/or dishonest. Kernel density plots (FiveThirtyEight) comparing ratings distributions over time show a change in the ratings; the assumption is that Fandango fixed the biases identified by Hickey.
Mobile_App_for_Lottery_Addiction
Using data from the popular Canadian lottery <i>Lotto 6/49</i> and probabilistic calculations, a proof-of-concept is created for a mobile application to both prevent and assist in the treatment of lottery addiction by helping users to better estimate their chances of winning.
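The kind of calculation such an app relies on can be sketched as follows, assuming the standard 6/49 format (six numbers drawn without replacement from 49). The function names are hypothetical and not taken from the repository.

```python
from math import comb

def jackpot_probability(tickets=1, pool=49, picks=6):
    """Chance of winning the jackpot when holding `tickets` distinct tickets."""
    return tickets / comb(pool, picks)

def match_probability(k, pool=49, picks=6):
    """Chance that a single ticket matches exactly k of the drawn numbers."""
    # Choose k of the winning numbers and the remaining picks from the losing ones.
    return comb(picks, k) * comb(pool - picks, picks - k) / comb(pool, picks)

print(comb(49, 6))                 # number of possible 6-number combinations
print(jackpot_probability())       # single-ticket jackpot chance
print(match_probability(3))        # chance of matching exactly three numbers
```

There are 13,983,816 possible combinations, so a single ticket has about a 1 in 14 million chance of hitting the jackpot.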
Predicting_Car_Prices_with_KNN
A quick project to build a predictive model for car pricing using KNN. The model uses prices, other car features, and various k values to create, train and test univariate and multivariate models. Visualizations were built with a slider to enable the reader to interact with the data.
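The core idea of KNN regression can be sketched in plain Python. This is an illustrative sketch only: the `knn_predict` helper and the toy rows are assumptions, and the project itself may well use a library implementation instead.

```python
import math

def knn_predict(train_rows, query, k=3):
    """Predict a price as the mean price of the k nearest training rows.

    train_rows: list of (features, price) pairs, features being a tuple of numbers.
    query: feature tuple of the car to price.
    """
    # Rank training rows by Euclidean distance to the query.
    nearest = sorted(train_rows, key=lambda row: math.dist(row[0], query))[:k]
    return sum(price for _, price in nearest) / len(nearest)

# Toy univariate data (made up): one feature, e.g. horsepower, and a price.
toy_rows = [((100,), 10000), ((110,), 12000), ((200,), 30000), ((210,), 32000)]
print(knn_predict(toy_rows, (105,), k=2))
```

The same function handles the multivariate case by passing longer feature tuples. In that case the features should be put on a common scale first, since otherwise the feature with the largest range dominates the distance.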
product-line-prediction
Created for toolchain: https://cloud.ibm.com/devops/toolchains/ee40089a-16ea-4c03-beaa-92cb47ac5484?env_id=ibm:yp:us-south
Querying_CIA_Factbook_Data
A SQL-driven (sqlite3) analysis of descriptive statistics for various countries; the analysis focuses on highest population and population growth rates, population projection for next year, comparison of fertility and mortality rates, and per-capita ratios for land mass.
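A query of this kind might look like the sketch below, run through Python's built-in `sqlite3` module. The table name `facts`, its columns, the fictional countries, and the assumption that growth is stored as a percentage are all hypothetical and chosen only to make the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area_land REAL, "
    "population INTEGER, population_growth REAL)"
)
# Fictional rows for illustration; not real Factbook data.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("Aland", 1000.0, 500000, 1.0),
        ("Bland", 200.0, 2000000, 2.5),
        ("Cland", 5000.0, 100000, 0.5),
    ],
)

# People per unit of land area, plus a one-year population projection
# (assuming population_growth is an annual percentage).
rows = conn.execute(
    """
    SELECT name,
           population / area_land AS density,
           population * (1 + population_growth / 100.0) AS projected
    FROM facts
    ORDER BY density DESC
    """
).fetchall()

for name, density, projected in rows:
    print(name, density, projected)
```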
Recommending_Data_Science_Learning_Content
An analysis of popular (loosely-defined) data science questions posted on the popular data science platform Data Science Stack Exchange using SQL to query data directly from the Stack Exchange Data Explorer (SEDE). The analysis employs meta-data analysis in the form of tags: * the frequency of their use and views, * pair-wise relationships between tags, and * ratios of identified popular tags and the overall number of tagged questions. The study concludes with a recommendation to increase deep learning content.
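The pair-wise tag relationships mentioned above can be counted along the following lines. This is a sketch under assumptions: the `tag_pair_counts` helper and the sample questions are hypothetical, and the project itself queries SEDE with SQL rather than necessarily using this approach.

```python
from collections import Counter
from itertools import combinations

def tag_pair_counts(tagged_questions):
    """Count how often each unordered pair of tags appears on the same question."""
    pairs = Counter()
    for tags in tagged_questions:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        pairs.update(combinations(sorted(set(tags)), 2))
    return pairs

# Made-up sample: each inner list holds the tags of one question.
sample_questions = [
    ["python", "pandas"],
    ["python", "deep-learning"],
    ["pandas", "python"],
]
counts = tag_pair_counts(sample_questions)
print(counts.most_common(2))
```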
Targeted_eLearning_Product_Marketing
This project focuses on how an e-learning company could determine the best markets to offer their programming coursework. The analysis focuses on both job role aspirations and learner interests, locations and population densities, availability of discretionary funding for learning and isolating outlier effects. The study concludes with the recommendation to focus on the US, with Canada and India comprising the second-best option for regional sales focus.