ianpcox

Ian P. Cox's repositories

Resume_Personality_Insights

The purpose of this project was to use the IBM Watson Personality Insights service API to ingest a PDF of a résumé and return an evaluation of the text along several personality measures. NB: IBM has since deprecated this service and directs users to IBM Watson Natural Language Understanding (NLU). I aim to update this project when I get access to the NLU service.

Language: Jupyter Notebook | License: MIT | 1 1 2

AI_Algorithms

A repo for reusable objects for the

License: MIT | 0 1 0

Analyzing_Album_Sales_using_SQL

An analysis of music purchase records using SQL, focusing on the performance of the support team, sales by country, and sales of individual tracks vs. complete albums.

Language: Jupyter Notebook | License: MIT | 0 0 0

Analyzing_NYC_High_School_Data

An in-depth analysis of educational outcomes in high schools located in NYC boroughs; the analysis focuses on identifying areas for deeper analysis using correlational data. The study identifies the following as attributes of interest: borough safety, race, gender, and AP exam scores.

Language: Jupyter Notebook | License: MIT | 0 1 0

AVI_to_MP4_File_Converter

A file converter for old .avi files of shows I cannot stream anymore :(

Language: Python | License: GPL-3.0 | 0 1 0

Building_A_Handwritten_Digits_Classifier

This project ingests pictures of handwritten digits and classifies them.

Language: Jupyter Notebook | License: MIT | 0 0 0

Building_a_Spam_Filter_using_Naive_Bayes

The goal of this project was to build a multinomial Naive Bayes classifier, based on conditional probability, that classifies new messages with an expected accuracy of 80%. The model exceeded expectations, classifying new messages with an accuracy of ~86%.

Language: Jupyter Notebook | License: MIT | 0 1 0
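
For readers unfamiliar with the technique behind the spam filter, here is a minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is not the code from the repository: the function names, the whitespace tokenizer, the smoothing value `alpha=1.0`, and the toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(messages, alpha=1.0):
    """Count labels and per-label word frequencies from (label, text) pairs."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for label, text in messages:
        words = text.lower().split()  # assumed tokenizer: lowercase + whitespace split
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return {"label_counts": label_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}

def classify(model, text):
    """Return the label with the highest log posterior for the given text."""
    total_messages = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_messages in model["label_counts"].items():
        counts = model["word_counts"][label]
        n_words = sum(counts.values())
        score = math.log(n_messages / total_messages)  # log prior
        for word in text.lower().split():
            if word not in model["vocab"]:
                continue  # words never seen in training are ignored
            # Laplace-smoothed conditional probability P(word | label)
            score += math.log((counts[word] + alpha) / (n_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data, not taken from the project's dataset.
TOY_MESSAGES = [
    ("spam", "win money now"),
    ("spam", "free prize win"),
    ("ham", "see you at lunch"),
    ("ham", "are you free for lunch"),
]
model = train(TOY_MESSAGES)
```

Calling `classify(model, "win a prize")` on the toy data scores the message under each label and returns the more probable one. Working in log space avoids underflow when many small probabilities are multiplied.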

Identifying_Heavy_Traffic_Indicators

Exploratory data analysis featuring four separate explanatory variables: time of day, seasonality, day of the week, and weather.

Language: Jupyter Notebook | License: MIT | 0 1 0

Sentiment_Analyzer_with_BERT_NN

Language: Jupyter Notebook | 0 1 0

Storytelling_Using_Data_Visualization_Exchange_Rates

A visualization-driven analysis of exchange rates highlighting three distinct stories: 1) how the euro-dollar rate changed during the coronavirus pandemic; 2) the 2020 data against the 2016-2019 data as a baseline; and 3) how the euro-dollar rate changed during the 2007-2008 financial crisis, with the data for 2016 and 2009 shown in a line plot for comparison. It also compares how the euro-dollar rate changed under the last three US presidents: George W. Bush (2001-2009), Barack Obama (2009-2017), and Donald Trump (2017-2021).

Language: Jupyter Notebook | License: MIT | 0 1 0

Analyzing_Star_Wars_Survey_Data

A brief, fun analysis of Star Wars survey data highlighting rankings for various Star Wars movies and fan preferences for male-identified vs. female-identified characters.

Language: Jupyter Notebook | License: MIT | 0 1 0

Cleaning_Analyzing_Employee_Exit_Surveys

An in-depth analysis of employee exit interviews focusing on two questions: 1) Are employees who worked for the institutes for only a short period resigning due to some kind of dissatisfaction, and what about employees who have been there longer? 2) Are younger employees resigning due to some kind of dissatisfaction, and what about older employees?

Language: Jupyter Notebook | License: MIT | 0 1 0

EDA_ProfitableAppProfiles

Exploratory Data Analysis of App Profiles in the iOS App Store and the Google Play marketplaces

Language: Jupyter Notebook | License: MIT | 0 0 0

Exploring_eBay_Car_Sales_Data

Exploratory Data Analysis for car sales on the popular online trading platform eBay

Language: Jupyter Notebook | License: MIT | 0 0 0

Exploring_Hacker_News_Posts

Exploratory Data Analysis for comments found on the popular tech publication website Hacker News

Language: Jupyter Notebook | License: MIT | 0 0 0

How_to_Win_Jeopardy

The goal of this project was to use hypothesis testing to recommend how best to prepare for the popular trivia game show Jeopardy, with the aim of earning the most money. There were two primary areas of analysis: how often a given answer can be used for a question, and how often questions are repeated. A chi-squared test is used to compare questions across two categories: low value and high value.

Language: Jupyter Notebook | License: MIT | 0 1 0
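
As an illustration of the chi-squared step, the sketch below tests whether a term appears in high-value and low-value questions in proportion to the size of each group. This is an assumed formulation, not the repository's code: the function name and the example counts are hypothetical, and the p-value uses the closed form for one degree of freedom rather than a statistics library.

```python
import math

def chi_squared_two_groups(high_obs, low_obs, n_high, n_low):
    """Chi-squared test of a term's counts against group-size expectations.

    high_obs / low_obs: how often the term appears in high- / low-value questions.
    n_high / n_low: total number of high- / low-value questions.
    Assumes the term appears at least once, so the expected counts are nonzero.
    """
    total = high_obs + low_obs
    share = total / (n_high + n_low)           # term frequency across all questions
    expected = (share * n_high, share * n_low)  # counts expected if value is irrelevant
    observed = (high_obs, low_obs)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: a term seen 15 times in high-value and 5 times in
# low-value questions, with 100 questions in each group.
stat, p_value = chi_squared_two_groups(15, 5, 100, 100)
```

A small p-value suggests the term is over- or under-represented in one group; with counts this low, any such result should be read with caution.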

imagerepo

License: MIT | 0 0 0

Investigating_Bias_in_Fandango_Movie_Ratings

This study is a follow-up to exemplary data journalism by Walt Hickey in 2015; it set out to determine whether the movie ratings on the ratings aggregator Fandango were still biased and/or dishonest. Kernel density plots (FiveThirtyEight) comparing ratings distributions over time show a change in the ratings; the assumption is that Fandango fixed the biases identified by Hickey.

Language: Jupyter Notebook | License: MIT | 0 1 0

Mobile_App_for_Lottery_Addiction

Using data from the popular Canadian lottery Lotto 6/49 and probabilistic calculations, this project creates a proof of concept for a mobile application that helps both prevent and treat lottery addiction by helping users better estimate their chances of winning.

Language: Jupyter Notebook | License: MIT | 0 1 0
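
The kind of probabilistic calculation such an app relies on can be sketched as follows. The function names are hypothetical and the sketch assumes the basic game only (six numbers drawn without replacement from 49, ignoring any bonus number); it is not the repository's implementation.

```python
import math

def one_ticket_probability(n_numbers=49, n_drawn=6):
    """Probability that a single ticket matches all drawn numbers."""
    return 1 / math.comb(n_numbers, n_drawn)

def match_probability(k, n_numbers=49, n_drawn=6):
    """Probability that a single ticket matches exactly k of the drawn numbers."""
    # Choose k of the winning numbers and the rest from the non-winning ones.
    ways = math.comb(n_drawn, k) * math.comb(n_numbers - n_drawn, n_drawn - k)
    return ways / math.comb(n_numbers, n_drawn)

# Number of distinct 6-number tickets that can be formed from 49 numbers.
total_tickets = math.comb(49, 6)
```

Here `total_tickets` is 13,983,816, so a single ticket has roughly a 1 in 14 million chance of matching all six numbers.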

Predicting_Car_Prices_with_KNN

A quick project to build a predictive model for car pricing using KNN. The project uses prices, other car features, and various k values to create, train, and test univariate and multivariate models. Visualizations were built with a slider to enable the reader to interact with the data.

Language: Jupyter Notebook | 0 0 0
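
To make the KNN approach concrete, here is a minimal sketch of k-nearest-neighbors regression with an RMSE check, written in plain Python. The function names, the Euclidean distance metric, and the toy rows are assumptions for illustration and are not taken from the repository.

```python
import math

def knn_predict(train_rows, query, k):
    """Predict a price as the mean price of the k nearest training rows.

    train_rows: list of (features, price) pairs, where features is a tuple.
    A one-element tuple gives a univariate model; longer tuples, a multivariate one.
    """
    nearest = sorted(train_rows, key=lambda row: math.dist(row[0], query))[:k]
    return sum(price for _, price in nearest) / len(nearest)

def rmse(train_rows, test_rows, k):
    """Root mean squared error of the k-NN predictions on held-out rows."""
    squared_errors = [(knn_predict(train_rows, features, k) - price) ** 2
                      for features, price in test_rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical toy rows: (horsepower,) -> price.
TOY_TRAIN = [((100,), 10000), ((110,), 12000), ((200,), 30000)]
TOY_TEST = [((105,), 11000)]
```

Comparing `rmse(TOY_TRAIN, TOY_TEST, k)` across several values of `k` mirrors the idea of testing various k values. In practice the features would usually be normalized first so that no single column dominates the distance.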

product-line-prediction

Created for toolchain: https://cloud.ibm.com/devops/toolchains/ee40089a-16ea-4c03-beaa-92cb47ac5484?env_id=ibm:yp:us-south

Language: JavaScript | License: Apache-2.0 | 0 0 0

Querying_CIA_Factbook_Data

A SQL-driven (sqlite3) analysis of descriptive statistics for various countries; the analysis focuses on the highest populations and population growth rates, population projections for the next year, a comparison of fertility and mortality rates, and per-capita ratios for land mass.

Language: Jupyter Notebook | License: MIT | 0 1 0

Recommending_Data_Science_Learning_Content

An analysis of popular (loosely defined) data science questions posted on Data Science Stack Exchange, using SQL to query data directly from the Stack Exchange Data Explorer (SEDE). The analysis relies on tag metadata: the frequency of tag use and views, pairwise relationships between tags, and the ratio of identified popular tags to the overall number of tagged questions. The study concludes with a recommendation to increase deep learning content.

Language: Jupyter Notebook | License: MIT | 0 0 0

Targeted_eLearning_Product_Marketing

This project examines how an e-learning company could determine the best markets in which to offer its programming coursework. The analysis considers job role aspirations and learner interests, locations and population densities, the availability of discretionary funding for learning, and the isolation of outlier effects. The study concludes with the recommendation to focus on the US, with Canada and India as the second-best options for regional sales focus.

Language: Jupyter Notebook | License: MIT | 0 0 0