LFattorini / Project-1---Writing-a-Data-Scientist-Blog-Post-Udacity

For this project, I chose the Airbnb dataset of the city of Florence. Specifically, I attempted to answer the following questions using the most popular Natural Language Processing techniques applied to review data: 1. How do guests experience their stay in Airbnb in Florence? 2. What are the main topics in guest reviews? 3. How best to predict the topic of a new review?

Project-1---Writing-a-Data-Scientist-Blog-Post-Udacity

Installation

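To run the code, the following libraries are required (pickle and re ship with Python's standard library; PIL is installed as Pillow and sklearn as scikit-learn):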

  • numpy
  • seaborn
  • pandas
  • matplotlib
  • spacy
  • nltk
  • wordcloud
  • pickle
  • PIL
  • gensim
  • pyLDAvis
  • tabulate
  • re
  • sklearn
  • vaderSentiment
  • langdetect
  • deep_translator
  • textblob

The analysis was run in JupyterLab using Python 3.9.
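Before opening the notebooks, it can help to confirm that every library in the list above is importable. The snippet below is a minimal sketch, not part of the original repository: the helper name `find_missing` and the `REQUIRED_MODULES` list are illustrative, and the entries are the import names exactly as listed in this README. It only checks that each module can be located; it does not check versions.

```python
import importlib.util

# Import names as listed in this README (assumed to match the notebooks' imports).
# pickle and re are part of the Python standard library.
REQUIRED_MODULES = [
    "numpy", "seaborn", "pandas", "matplotlib", "spacy", "nltk",
    "wordcloud", "pickle", "PIL", "gensim", "pyLDAvis", "tabulate",
    "re", "sklearn", "vaderSentiment", "langdetect", "deep_translator",
    "textblob",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Treat lookup errors the same as "not installed".
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Note that some import names differ from the package names used by pip: `PIL` is provided by `Pillow`, `sklearn` by `scikit-learn`, and `deep_translator` by `deep-translator`.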

Project Goal

For this project, I chose the Airbnb dataset for the city of Florence. Specifically, I attempted to answer the following questions by applying the most popular Natural Language Processing techniques to review data:

  1. How do guests experience their stay in Airbnb in Florence?
  2. What are the main topics in guest reviews?
  3. How best to predict the topic of a new review?

File Description

Five notebooks are available to answer the questions above. Each notebook explores the data with the support of the machine learning models named in its title. Markdown cells guide you through each step of the process.

  • Data_Cleaning_Reviews_AirbnbFlorence.ipynb

  • Data_Cleaning_Listings_AirbnbFlorence.ipynb

  • Sentiment_Analysis_AirbnbFlorence.ipynb

  • Topic_Modeling_AirbnbFlorence.ipynb

  • Topic_Classification_Airbnb_Florence.ipynb

Results

The main findings of the analysis are discussed in the blog post "How To Get Useful Insights From Airbnb Reviews" available here.

Licensing, Authors, Acknowledgements

Open-source data from http://insideairbnb.com/get-the-data.html (the data used was compiled on 12 July 2021).
