codeboy47 / eda-hotel-booking

Explored and analyzed the dataset to discover important factors that govern the bookings.

In this project, I analyze a hotel booking dataset and draw conclusions about the factors that affect the number of bookings. The dataset contains no personal information about customers.

📖 Problem Statement

A dataset of 119,390 records across 32 features was provided, covering bookings at two hotels, City Hotel and Resort Hotel, from July 2015 to August 2017.

The main objective is to explore the dataset and discover the factors that govern bookings. The conclusions drawn from the analysis will be used to identify missteps taken by the hotels' management, so that the hotels are equipped to improve their performance.

Data analysis is performed to answer the following questions:

  • Which hotel do travelers prefer?
  • Which hotel retains more customers?
  • Which is the busiest month?
  • Which is the most popular room type?
  • From which country were the most bookings made?
  • How long do guests stay at the hotels?
  • How many bookings were cancelled?

📖 Approach

  1. Understand the business task.
  2. Import relevant libraries and define helper functions.
  3. Read the data from the given files.
  4. Inspect the data.
  5. Clean the data.
  6. Perform exploratory data analysis to find which factors affect bookings and how.
  7. Draw conclusions from the analysis.
  8. Build an interactive dashboard.
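
The cleaning code itself is not shown in this README. The following is a minimal sketch of what step 5 might involve, written in plain Python (standard library only) so it runs without the notebook. It assumes column names from the public hotel booking demand dataset (`adults`, `children`, `babies`, `stays_in_weekend_nights`, `stays_in_week_nights`). The cleaning rules (drop exact duplicates, treat a missing `children` value as 0, discard bookings with zero guests) and the helper name `clean_bookings` are hypothetical; the derived columns `total_guests` and `total_stays_in_nights` reuse the names from the correlation analysis, but how the notebook actually computes them is an assumption.

```python
def clean_bookings(records):
    """Hypothetical cleaning step: dedupe, fill missing counts, derive totals.

    Each record is a dict keyed by (assumed) dataset column names.
    """
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue  # skip exact duplicate rows
        seen.add(key)
        row = dict(rec)  # copy so the input records are not mutated
        # Assumption: a missing 'children' value means no children.
        if row.get("children") is None:
            row["children"] = 0
        row["total_guests"] = row["adults"] + row["children"] + row["babies"]
        row["total_stays_in_nights"] = (
            row["stays_in_weekend_nights"] + row["stays_in_week_nights"]
        )
        # Assumption: a booking with zero guests is treated as invalid.
        if row["total_guests"] == 0:
            continue
        cleaned.append(row)
    return cleaned


# Made-up records, for illustration only.
raw = [
    {"hotel": "City Hotel", "adults": 2, "children": None, "babies": 0,
     "stays_in_weekend_nights": 1, "stays_in_week_nights": 2},
    {"hotel": "City Hotel", "adults": 2, "children": None, "babies": 0,
     "stays_in_weekend_nights": 1, "stays_in_week_nights": 2},  # exact duplicate
    {"hotel": "Resort Hotel", "adults": 0, "children": 0, "babies": 0,
     "stays_in_weekend_nights": 0, "stays_in_week_nights": 3},  # no guests
]
rows = clean_bookings(raw)
print(len(rows), rows[0]["total_guests"], rows[0]["total_stays_in_nights"])  # 1 2 3
```

In pandas the same steps would typically be written with `df.drop_duplicates()`, `df["children"].fillna(0)` and plain column arithmetic.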

📖 Exploratory Data Analysis

EDA was carried out in three steps:

📊 Univariate Analysis

"Uni" means one and "variate" means variable, so univariate analysis examines one variable at a time. Its objective is to describe and summarize the data and to find the patterns present in each variable separately. Univariate analyses were done on:

  • Percentage of bookings in each hotel.
  • Percentage of repeated and non-repeated guests.
  • Percentage of bookings that got cancelled.
  • Number of bookings made for each room type.
  • Number of bookings made from each country.
  • Typical length of guests' stays.
  • Number of bookings made in each month.
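
Most of these univariate questions reduce to counting the values in one column. A minimal plain-Python sketch, assuming each booking is a dict keyed by the dataset's column names (`hotel`, `arrival_date_month`, and so on) and that months are stored as English month names. The helper names `percentage_by` and `bookings_per_month` are hypothetical, and the sample records are invented for illustration, not taken from the dataset.

```python
import calendar
from collections import Counter


def percentage_by(records, column):
    """Percentage of records for each value of `column`, most frequent first."""
    counts = Counter(rec[column] for rec in records)
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.most_common()}


def bookings_per_month(records, column="arrival_date_month"):
    """Booking counts per month in calendar order; months with no bookings get 0.

    Assumes English month names, which calendar.month_name yields in the
    default C locale.
    """
    counts = Counter(rec[column] for rec in records)
    return {month: counts[month] for month in calendar.month_name[1:]}


# Invented records, for illustration only.
sample = [
    {"hotel": "City Hotel", "arrival_date_month": "July"},
    {"hotel": "City Hotel", "arrival_date_month": "August"},
    {"hotel": "Resort Hotel", "arrival_date_month": "July"},
    {"hotel": "City Hotel", "arrival_date_month": "July"},
]
print(percentage_by(sample, "hotel"))  # {'City Hotel': 75.0, 'Resort Hotel': 25.0}
monthly = bookings_per_month(sample)
print(max(monthly, key=monthly.get))  # July
```

With pandas, `df["hotel"].value_counts(normalize=True) * 100` gives the same percentages.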

📊 Bivariate Analysis

"Bi" means two and "variate" means variable, so bivariate analysis looks at two variables together and studies the relationship between them. Bivariate analyses were done on:

  • Revenue generated by each hotel.
  • Percentage of repeated guests in each hotel.
  • Percentage of repeated guests in each distribution channel.
  • Percentage of cancelled and non-cancelled bookings in each hotel.
  • Number of cancelled and non-cancelled bookings among repeated and non-repeated guests.
  • Kernel density estimate (KDE) of the number of days on the waiting list for cancelled and non-cancelled bookings.
  • Kernel density estimate of lead time for cancelled and non-cancelled bookings.
  • Number of cancelled bookings when the reserved room type matches the assigned room type versus when it differs.
  • Percentage of cancelled and non-cancelled bookings in each distribution channel.
  • Change in length of stay as ADR (average daily rate) changes.
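
Several of these comparisons (cancellations by hotel, by distribution channel, and by whether the reserved room type matches the assigned one) are the same computation: the share of cancelled bookings within each group. A minimal plain-Python sketch, assuming `is_canceled` is stored as 0/1 and the column names follow the public dataset. `rate_by_group` is a hypothetical helper, and the records are invented for illustration.

```python
from collections import defaultdict


def rate_by_group(records, group, flag="is_canceled"):
    """Percentage of records with flag == 1 within each group.

    `group` is either a column name or a function mapping a record to a group key.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for rec in records:
        key = group(rec) if callable(group) else rec[group]
        totals[key] += 1
        flagged[key] += rec[flag]  # assumption: the flag is 0 or 1
    return {key: 100 * flagged[key] / totals[key] for key in totals}


# Invented records, for illustration only.
sample = [
    {"hotel": "City Hotel", "is_canceled": 1,
     "reserved_room_type": "A", "assigned_room_type": "A"},
    {"hotel": "City Hotel", "is_canceled": 0,
     "reserved_room_type": "A", "assigned_room_type": "D"},
    {"hotel": "Resort Hotel", "is_canceled": 0,
     "reserved_room_type": "A", "assigned_room_type": "A"},
    {"hotel": "Resort Hotel", "is_canceled": 0,
     "reserved_room_type": "E", "assigned_room_type": "E"},
]
print(rate_by_group(sample, "hotel"))  # {'City Hotel': 50.0, 'Resort Hotel': 0.0}

# Group by whether the guest got the room type they reserved (True/False).
same_room = rate_by_group(
    sample, lambda r: r["reserved_room_type"] == r["assigned_room_type"]
)
print(same_room)
```

In pandas, `pd.crosstab(df["hotel"], df["is_canceled"], normalize="index")` gives the same breakdown as fractions rather than percentages.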

📊 Correlation Analysis

Correlation analysis measures the strength and direction of the linear relationship between two variables. A correlation does not by itself show that a change in one variable causes a change in the other. Correlation analysis of the dataset was carried out using a correlation heatmap of the features 'lead_time', 'adr', 'total_guests', 'total_stays_in_nights', 'previous_cancellations', 'booking_changes', 'days_in_waiting_list', 'required_car_parking_spaces', 'total_of_special_requests' and 'previous_bookings_not_canceled'.
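
A correlation heatmap shows one Pearson coefficient per pair of features, r = cov(x, y) / (σx σy). A minimal plain-Python sketch of the matrix behind such a heatmap; the helper names `pearson` and `correlation_matrix` are hypothetical, and the three records are invented for illustration. A feature with zero variance would raise `ZeroDivisionError` here, so constant columns should be left out.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations; the 1/n factors of covariance and std cancel out.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_dev / (norm_x * norm_y)  # ZeroDivisionError if a column is constant


def correlation_matrix(records, columns):
    """Pairwise Pearson r for the given numeric columns, keyed by (col_a, col_b)."""
    values = {c: [rec[c] for rec in records] for c in columns}
    return {(a, b): pearson(values[a], values[b]) for a in columns for b in columns}


# Invented records, for illustration only.
sample = [
    {"lead_time": 10, "total_stays_in_nights": 1, "adr": 120.0},
    {"lead_time": 40, "total_stays_in_nights": 3, "adr": 100.0},
    {"lead_time": 90, "total_stays_in_nights": 7, "adr": 80.0},
]
cols = ["lead_time", "total_stays_in_nights", "adr"]
matrix = correlation_matrix(sample, cols)
for a in cols:
    print(a.ljust(22), " ".join(f"{matrix[(a, b)]:+.2f}" for b in cols))
```

With pandas, `df[cols].corr()` computes the same Pearson matrix, which can then be drawn with a heatmap function such as `seaborn.heatmap`.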

📊 Data Visualization

An interactive dashboard was also created with Tableau to display the charts associated with the analysis.

Click here to interact with the data visualization.

📘 Conclusion

The following conclusions were drawn from the analysis:

  • City Hotel appears to be preferred by more travelers, and it also generates more revenue.
  • The most bookings are made in July and August.
  • Room type A is the most preferred room type among travelers.
  • The most bookings are made from Portugal.
  • Most guests stay for 1 to 4 days.
  • Resort Hotel retains a greater percentage of its guests.
  • Around one-fourth of all bookings get cancelled, and more of the cancellations are from City Hotel.
  • New guests tend to cancel bookings more often than repeated guests.
  • Lead time, the number of days on the waiting list, and whether the customer is assigned the reserved room type do not appear to affect booking cancellations.
  • The Corporate distribution channel has the highest percentage of repeated guests and TA/TO (travel agents/tour operators) the lowest. For cancellations the pattern is reversed: TA/TO has the highest percentage and Corporate the lowest.
  • Length of stay decreases as ADR increases, probably because guests try to reduce the total cost.
  • Longer stays tend to be booked further in advance than shorter ones.

📜 Credits

Midhun R | Avid Learner | Data Analyst | Data Scientist | Machine Learning Enthusiast

Contact me for data science project collaborations.
