pmb-7684 / Paula_Portfolio

Example data science portfolio

Data Science Portfolio

R programming projects

New York City Leading Causes of Death is an exploratory analysis project. The data set comes from New York City's NYCOpenData portal and contains information on causes of death starting in 2007. This version focuses only on women in NYC and their causes of death; a companion version of the project, completed in Python, analyzes causes of death for men.

  • The New_York_City_leading_Causes_of_Death data set contained 1,272 observations and 7 variables. Each observation represents a cause of death.
  • Used R to clean the data and perform exploratory data analysis.
  • We determined that the number one cause of death for women in NYC was heart disease. For most years, the second-highest cause of death was cancer.

Personal Project
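For the NYC causes-of-death analysis above, the core question (which causes rank first and second for one sex in each year) reduces to a group-and-sum. The project did this in R; the following is only a minimal Python sketch using the standard library. It assumes rows with hypothetical keys `Year`, `Leading Cause`, `Sex`, and `Deaths` (deaths already converted to integers), and the sex label is a parameter because the exact code used in the export (for example `"F"`) is an assumption.

```python
from collections import defaultdict


def top_causes_by_year(rows, sex, n=2):
    """Return {year: [(cause, deaths), ...]} holding the n causes with most deaths.

    Assumes each row is a dict with hypothetical keys "Year", "Leading Cause",
    "Sex", and "Deaths", where "Deaths" is already an int.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if row["Sex"] != sex:
            continue
        # Sum deaths over any other grouping (e.g. race/ethnicity) for the same year and cause.
        totals[row["Year"]][row["Leading Cause"]] += row["Deaths"]
    ranked = {}
    for year, causes in sorted(totals.items()):
        ordered = sorted(causes.items(), key=lambda item: item[1], reverse=True)
        ranked[year] = ordered[:n]
    return ranked
```

Calling `top_causes_by_year(rows, "F")` would give the two largest causes for each year, which is the comparison the bullets describe.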

Pokémon API Vignette is a group project on working with the RESTful API at PokéAPI. The vignette shows users how to request information about their favorite Pokémon by name or ID. The information includes basic stats, training information, and moves, and there is an additional option to request information about berries.

  • Completed exploratory analysis with visualizations using data extracted from the API.

Group Academic Project (NCSU)
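For the Pokémon API Vignette above, a request by name or ID comes down to building a resource URL and decoding the JSON reply. The vignette's own functions are not shown here, so this is a hypothetical Python sketch using only the standard library; the helper names are illustrative. It assumes PokéAPI's v2 URL pattern (`https://pokeapi.co/api/v2/pokemon/{name or id}/`, and likewise `berry/`) and that a Pokémon response carries a `stats` list whose entries hold `base_stat` and `stat.name`.

```python
import json
import urllib.request

BASE_URL = "https://pokeapi.co/api/v2"


def build_url(resource, key):
    """Build a PokéAPI URL for a resource ("pokemon" or "berry") by name or ID."""
    # Names are matched in lowercase; numeric IDs pass through unchanged.
    key = str(key).strip().lower()
    return f"{BASE_URL}/{resource}/{key}/"


def fetch(resource, key, timeout=10):
    """Request one resource and decode the JSON response into a dict."""
    request = urllib.request.Request(
        build_url(resource, key), headers={"User-Agent": "portfolio-sketch"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def basic_stats(pokemon):
    """Flatten the "stats" list of a Pokémon response into {stat_name: base_stat}."""
    return {s["stat"]["name"]: s["base_stat"] for s in pokemon["stats"]}
```

With network access, `basic_stats(fetch("pokemon", "Pikachu"))` would return a small dict of base stats, and `fetch("berry", 1)` would request a berry record.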

The Right Conditions for COVID-19 is a project in which we focused on sixteen different universal conditions that could contribute to a person becoming infected with COVID-19.

  • Used a worldwide coronavirus (COVID-19) data set containing 50,350 observations and 41 variables.
  • Used R to clean the data.
  • Tuned linear regression, Lasso, random forest regression, and other machine learning models to predict the number of deaths from COVID-19 in 2020.
  • Used Tableau to visualize important points in the data.
  • Wrote a report addressing my question: why does a person become infected with COVID-19? The report examined whether the following conditions contribute to infection: stringency index, population, population density, portion of the population over age 65, GDP per capita, extreme poverty, cardiovascular death rate, diabetes prevalence, smoking status, number of hospitals, life expectancy, and human development index.

Capstone - Academic Project (UNCW)
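The Right Conditions for COVID-19 project lists Lasso among its models. To show how the L1 penalty shrinks the coefficients of weak conditions toward zero (and can remove them entirely), here is a from-scratch Python sketch of Lasso fitted by cyclic coordinate descent. It is not the project's code (the project used R), and the function names and the penalty parameter `lam` are illustrative. It minimizes (1 / 2n)·‖y − b0 − Xb‖² + lam·‖b‖₁.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso update step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Fit Lasso by cyclic coordinate descent.

    X is a list of rows (lists of floats); y is a list of floats.
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    # Center features and response so the intercept can be recovered at the end.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    beta = [0.0] * p
    residual = yc[:]  # residual = yc - Xc @ beta; beta starts at zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # a constant column carries no information
            # Correlation of column j with the residual that excludes beta_j.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new_bj
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta
```

Conditions such as population and GDP per capita sit on very different scales, so in practice the features would typically be standardized before fitting, since one `lam` penalizes every coefficient equally.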

Effects of COVID-19 on Crime in Chicago explored how COVID-19 affected the crime rate in Chicago, Illinois, during the first months of 2020. Chicago is known for its high crime rate, especially its murder rate: 492 people were killed in 2019 and 567 in 2018. The question was whether the pandemic, social distancing, and state government mandates had a significant effect on crime in general in 2020.

  • Acquired crime data from the City of Chicago data portal.
  • The data set contained over 260,000 observations and 22 variables. Compared the period from 01/01/2019 to 08/01/2019 (pre-COVID-19) with the period from 01/01/2020 to 08/01/2020 (the first months of the pandemic).
  • Acquired data on COVID-19 cases in Chicago from data.gov. Note: the link to the original website is no longer active; a CDC website provides similar information.
  • Also acquired geospatial data from the City of Chicago portal.

Academic Project (UNCW)
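The 2019 versus 2020 comparison in Effects of COVID-19 on Crime in Chicago comes down to counting incidents in two matching date windows. A minimal Python sketch, assuming each incident's date has already been parsed into a `datetime.date`; the function names are hypothetical, and treating 08/01 as an exclusive end date is an assumption, since the project does not say whether the end date was included.

```python
from datetime import date


def count_in_window(dates, start, end):
    """Count dates with start <= d < end (half-open window; an assumption)."""
    return sum(1 for d in dates if start <= d < end)


def compare_periods(dates):
    """Compare incident counts for Jan 1 to Aug 1 in 2019 versus 2020.

    Returns (count_2019, count_2020, percent_change).
    """
    before = count_in_window(dates, date(2019, 1, 1), date(2019, 8, 1))
    during = count_in_window(dates, date(2020, 1, 1), date(2020, 8, 1))
    percent_change = 100.0 * (during - before) / before if before else float("nan")
    return before, during, percent_change
```

Because 2020 is a leap year, the 2020 window is one day longer (212 versus 213 days with an exclusive end date); dividing each count by its window length gives a daily rate if that difference matters.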

In Crime in Chicago Prediction, we used machine learning techniques to predict the likelihood that a person would be arrested. We created a presentation that discusses classification trees, logistic regression, and stacking.

  • The data set comes from the City of Chicago's Data Portal and contains information about crime in Chicago in 2018. Models were built to predict the likelihood that a person would be arrested.
  • We reduced the size of the data set by sampling 15,000 observations from the original 267,000.
  • Tuned classification tree, logistic regression, stacking, random forest, k-fold cross-validated regression, and stepwise regression (backward and forward) models. Three of these were discussed in the presentation: classification trees, logistic regression, and stacking.

Academic Project (UNCW)
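For Crime in Chicago Prediction, two of the steps described above, drawing a 15,000-row sample and fitting a logistic regression for the probability of arrest, can be sketched with the standard library alone. This is an illustrative sketch, not the project's code: the function names are hypothetical, the model is fitted by plain batch gradient descent on the log-loss, and how crime records are encoded as numeric feature rows is left as an assumption.

```python
import math
import random


def downsample(records, k=15_000, seed=0):
    """Draw a reproducible simple random sample of k records without replacement."""
    return random.Random(seed).sample(records, k)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Fit logistic regression by batch gradient descent on the log-loss.

    X: list of numeric feature rows; y: list of 0/1 labels (1 = arrested).
    Returns (bias, weights).
    """
    n, p = len(X), len(X[0])
    bias, weights = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for row, label in zip(X, y):
            # Prediction error for this observation.
            err = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


def predict_proba(bias, weights, row):
    """Estimated probability of arrest for one observation."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
```

Fixing the seed in `downsample` makes the 15,000-row sample reproducible, so every model in the comparison can be trained on the same observations.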

Python Projects

New York City Leading Causes of Death is an exploratory analysis project. The data set comes from New York City's NYCOpenData portal and contains information on causes of death starting in 2007. This version focuses on men in NYC and their causes of death; a companion version of the project, completed in R, analyzes causes of death for women.

  • The New_York_City_leading_Causes_of_Death data set contained 1,272 observations and 7 variables. Each observation represents a cause of death.
  • Used Python to clean the data and perform exploratory data analysis.
  • We determined that the number one cause of death for men in NYC was heart disease. For most years, the second-highest cause of death was cancer.

Personal Project
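For the Python version of the NYC causes-of-death analysis, the cleaning step could look like the sketch below, using only the standard library's `csv` module. The column names (`Year`, `Leading Cause`, `Sex`, `Deaths`) and the idea that some `Deaths` cells hold non-numeric placeholders are assumptions to check against the actual NYCOpenData file; the function name is hypothetical.

```python
import csv
import io


def load_clean_rows(text):
    """Parse the CSV text and keep rows with a whole-number Deaths value.

    Assumes hypothetical column names "Year", "Leading Cause", "Sex", "Deaths".
    Rows whose Deaths field is not a whole number (e.g. a placeholder for a
    missing count) are dropped; thousands separators are removed first.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        deaths = record["Deaths"].strip().replace(",", "")
        if not deaths.isdigit():
            continue  # skip missing or non-numeric counts
        rows.append({
            "Year": int(record["Year"]),
            "Leading Cause": record["Leading Cause"].strip(),
            "Sex": record["Sex"].strip(),
            "Deaths": int(deaths),
        })
    return rows
```

Reading the file with `open(path, newline="", encoding="utf-8").read()` and passing the text to `load_clean_rows` yields typed rows ready for grouping by year and cause.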

Visualizations Using R, Python, and Tableau

** Upcoming **

Additional visualizations using NYC's causes-of-death data and COVID-19 data from Johns Hopkins.
