angang-li / airbnb_data_analysis

Explore business insights from Chicago Airbnb data

Chicago Airbnb Data Analysis

Installation

The code was developed using the Anaconda distribution of Python 3. The third-party libraries used are numpy, pandas, matplotlib, seaborn, sklearn, scipy, statsmodels, PIL, and requests; datetime, random, collections, and pickle come from the Python standard library.

Background

In this project, I used open-source Chicago Airbnb data (http://insideairbnb.com/get-the-data.html) to answer four business questions:

  • Q1: How does listing information (description words, price per person per night) differ among neighborhoods?
  • Q2: Is there a general upward trend of both new Airbnb listings and total Airbnb visitors to Chicago?
  • Q3: What are the busiest times of a year to visit Chicago? By how much do prices spike?
  • Q4: Which factors explain the listing price the most?

These questions were answered using statistics, regression, and visualization.
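As an illustration of the metric in Q1, here is a minimal sketch of computing price per person per night and comparing it across neighborhoods with pandas. The toy data and the column names (`neighbourhood`, `price`, `accommodates`) are assumptions made for this example and may not match the notebooks' actual schema or method.

```python
import pandas as pd

# Hypothetical toy listings; column names and values are assumptions,
# not taken from the notebooks.
listings = pd.DataFrame({
    "neighbourhood": ["Loop", "Loop", "Lake View", "Lake View"],
    "price": ["$200.00", "$1,000.00", "$90.00", "$150.00"],
    "accommodates": [2, 5, 2, 3],
})


def price_per_person(df):
    """Return a copy of df with a numeric nightly price per guest."""
    out = df.copy()
    # Strip the currency symbol and thousands separator before converting.
    out["price_num"] = (
        out["price"].str.replace(r"[$,]", "", regex=True).astype(float)
    )
    out["price_per_person"] = out["price_num"] / out["accommodates"]
    return out


# Median price per person per night in each neighborhood, highest first.
summary = (
    price_per_person(listings)
    .groupby("neighbourhood")["price_per_person"]
    .median()
    .sort_values(ascending=False)
)
print(summary)
```

The median is used here rather than the mean because a few very expensive listings can otherwise dominate a neighborhood's figure; that choice is also an assumption of this sketch.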

File Descriptions

There are 4 notebooks available here to showcase work related to the above questions.

  1. explore_part1.ipynb: load data and design new features based on the available dataset
  2. explore_part2.ipynb: exploratory data analysis to answer Q1-Q3
  3. model_part1.ipynb: prepare data for regression analysis, including the handling of categorical variables and missing data
  4. model_part2.ipynb: train regression model and use the trained model to answer Q4

Markdown cells in each notebook walk through the thought process behind individual steps.
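The modeling steps described for model_part1.ipynb and model_part2.ipynb (handling categorical variables and missing data, then fitting a regression) can be sketched roughly as follows. This is not the notebooks' code: the toy data, the column names, the median imputation, and the helper names `prepare_features` and `fit_linear` are all assumptions, and plain least squares via `numpy.linalg.lstsq` stands in for whichever sklearn or statsmodels estimator the notebooks actually use.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "room_type": ["Entire home", "Private room", "Entire home",
                  "Private room", "Entire home", "Private room"],
    "bedrooms": [2.0, 1.0, np.nan, 1.0, 3.0, 2.0],
    "price": [180.0, 70.0, 150.0, 65.0, 260.0, 110.0],
})


def prepare_features(frame, target="price"):
    """Impute numeric gaps with the median and one-hot encode categoricals."""
    X = frame.drop(columns=[target])
    num_cols = X.select_dtypes(include="number").columns
    X[num_cols] = X[num_cols].fillna(X[num_cols].median())
    # drop_first avoids a redundant dummy column alongside the intercept.
    X = pd.get_dummies(X, drop_first=True, dtype=float)
    return X, frame[target]


def fit_linear(X, y):
    """Fit ordinary least squares and return coefficients as a Series."""
    A = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(A, y.to_numpy(dtype=float), rcond=None)
    return pd.Series(coef, index=["intercept", *X.columns])


X, y = prepare_features(df)
coef = fit_linear(X, y)

# Rank features by absolute coefficient size (intercept excluded).
ranking = coef.drop("intercept").abs().sort_values(ascending=False)
print(coef)
print(ranking)
```

Comparing raw coefficient magnitudes is only meaningful when the features are on comparable scales, so a real analysis of Q4 would likely standardize the inputs or use another importance measure.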

Results

The main findings of the analysis are summarized in the post available here.

Licensing, Authors, Acknowledgements

Licensing terms for the dataset can be found at the Airbnb data portal. Other than that, feel free to use and modify the code here.
