tchintchie's repositories


There’s a free dataset on Kaggle with Kickstarter project data from May 2009 to March 2018. Let's perform some EDA (Exploratory Data Analysis) on the data to gather insights. In the future, for a more challenging project, we can apply machine learning to the dataset to predict whether a project will succeed or not.

Note: the dataset is in a zip file with 2 CSV files in it: 2016 and 2018. The data in both files is mostly the same, but the 2016 one is older and uses a non-standard format, so let's use only the 2018 file.

Questions to Answer:

1a. Examine the state column to see unique values and counts.
1b. Show a pie chart of the state project count for all projects.
1c. Create a new "Completed" dataframe that removes any rows with a state of 'live', 'undefined', or 'suspended'.

Note: from here on, we'll be looking at the completed project data unless mentioned otherwise.

2a. What is the overall success rate for all completed Kickstarter projects?
2b. Which 5 projects were pledged the most money (usd_pledged_real)?
2c. Which 5 projects had the most backers?
2d. Which year had the most competition (# of projects)?

3a. What is the success rate for all projects, broken down by main_category?
3b. Show a horizontal bar chart of project success rate by main_category, sorted from highest to lowest.
3c. Within the Games main_category, what is the success rate for each category within it?

4a. Calculate the 'pct_of_goal' for each completed project.
4b. What were the top 5 projects of all time by pct_of_goal?
4c. Plot a histogram of all completed projects by pct_of_goal.
4d. Create 2 histogram subplots by pct_of_goal: 1) state=successful, and 2) all others (failed).

5a. What is the average usd_goal_real for all completed Kickstarter projects, broken down by main_category?
5b. What is the median usd_goal_real for all completed Kickstarter projects, broken down by main_category?
5c. What is the average usd_pledged_real for all completed Kickstarter projects, broken down by main_category?
5d. What is the median usd_pledged_real for all completed Kickstarter projects, broken down by main_category?
5e. What insights does this information provide?
5f. Based on this information, if someone wanted to choose the main_category with the highest combined success rate and pledged dollar amount, which one would you recommend?

6a. Create a new column 'months' that shows how many months the project was active between launch and deadline.
6b. Compare the average months for successful projects vs. non-successful ones. Add visuals if you'd like.
6c. Does the length of a project in months seem to have an impact?

Let's zoom in on Games: Video Games (main_category: category)

7a. Calculate the expected value for the Games: Video Games category, with the expected value defined as (median of usd_pledged_real) * (success rate of completed projects).
7b. Do this again, but broken down by deadline year.
7c. Show this in a bar chart.
7d. What insights does this data provide you?

Let's zoom in on personal planners

8a. Calculate the project count, success rate, and pct_of_goal for all projects with 'planner' in the name. Check for spelling variations in upper/lowercase.
8b. How about all projects with both 'planner' and 'Panda' in the name?
8c. Congrats Panda Planner! (That's my bro's company)

Bonus insights - feel free to add any other interesting findings from the dataset here.

Future ML project: Given a sample Kickstarter project, can you predict the usd_pledged_real, and whether it will be successful? What features (data points) are most important in determining if a project will be successful or not?
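A few of the questions above (1c, 2a, 3a, 4a, 7a, 8a) can be sketched in pandas as follows. This is only a minimal illustration, not the notebook's actual code: the six rows are toy data invented for the example, the helper names `completed_projects` and `success_rate` are hypothetical, and `pct_of_goal` is assumed to mean pledged divided by goal, expressed as a percentage. The column names (`state`, `main_category`, `category`, `usd_pledged_real`, `usd_goal_real`, `name`) are the ones mentioned in the description.

```python
import pandas as pd

# Toy rows invented for illustration; a real run would load the 2018 CSV instead.
df = pd.DataFrame({
    "name": ["Space Quest", "Pixel Farm", "Dice Realm",
             "Daily PLANNER", "Sketch Book", "Retro Racer"],
    "main_category": ["Games", "Games", "Games", "Art", "Art", "Games"],
    "category": ["Video Games", "Video Games", "Tabletop Games",
                 "Painting", "Painting", "Video Games"],
    "state": ["successful", "failed", "successful",
              "failed", "live", "canceled"],
    "usd_pledged_real": [1500.0, 200.0, 5000.0, 50.0, 10.0, 0.0],
    "usd_goal_real": [1000.0, 1000.0, 2500.0, 500.0, 100.0, 2000.0],
})


def completed_projects(frame: pd.DataFrame) -> pd.DataFrame:
    """1c: drop rows whose state is 'live', 'undefined', or 'suspended'."""
    unfinished = ["live", "undefined", "suspended"]
    return frame[~frame["state"].isin(unfinished)].copy()


def success_rate(frame: pd.DataFrame) -> float:
    """Share of rows whose state is 'successful'."""
    return float((frame["state"] == "successful").mean())


completed = completed_projects(df)

# 2a: overall success rate of completed projects.
overall_rate = success_rate(completed)

# 3a: success rate per main_category, highest first.
rate_by_main = (
    completed.assign(success=completed["state"] == "successful")
    .groupby("main_category")["success"]
    .mean()
    .sort_values(ascending=False)
)

# 4a: pct_of_goal, assumed here to be pledged / goal * 100.
completed["pct_of_goal"] = (
    completed["usd_pledged_real"] / completed["usd_goal_real"] * 100
)

# 7a: expected value for Games: Video Games,
# defined as median(usd_pledged_real) * success rate.
video = completed[
    (completed["main_category"] == "Games")
    & (completed["category"] == "Video Games")
]
expected_value = video["usd_pledged_real"].median() * success_rate(video)

# 8a: case-insensitive match on 'planner' in the project name.
planners = completed[completed["name"].str.contains("planner", case=False)]
```

Breaking 7a down by deadline year (7b) would follow the same pattern with a `groupby` on the year extracted from the deadline column, assuming that column is parsed as a datetime.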

Language: Jupyter Notebook | Stargazers: 1 | Issues: 0 | Issues: 0


Guided Project from Dataquest

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


Shiny app to infer cases from new deaths of COVID-19



Learn about the fundamentals of Python programming and data science from

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


Building a Handwritten Digits Classifier, as put together by Dataquest (Guided Project)

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


Exploratory Data Analysis of Airplane Crashes from Kaggle

Language: Jupyter Notebook | Stargazers: 0 | Issues: 2 | Issues: 0


Guided Project: Designing and Creating a Database

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


Trying to create an email scraper with Streamlit

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


Face Recognition App

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


Guided Project from Dataquest (statistics course)

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


Many American cities have communal bike sharing stations where you can rent bicycles by the hour or day. Washington, D.C. is one of these cities. The District collects detailed data on the number of bicycles people rent by the hour and day.

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


A next-generation curated knowledge sharing platform for data scientists and other technical professions.

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


NLP Web App

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


Guided Project from Dataquest

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


Car Price Predictor Streamlit App

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


Repository to store sample Python programs for learning Python

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0


Solutions for projects.



Predicting stock prices

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0

...under construction



Playing around with the Titanic dataset to predict the probability of survival

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0





Scraping a website looking for apartments

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0