adsk2050 / xomato

Zomato scraping and analysis mini project

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

I have used mongo db atlas as cloud database. I have used beautifulsoup4 and requests to scrape the data.

I have scraped the data of top restaurants in Mumbai:

  • Restaurant details (name, description, address, geotag, cuisines, timings etc).
  • Dishes in the restaurant(name, description, price, offer, discount, tags etc).
  • All reviews of the restaurant (text, reviewer, # of likes and comments etc).

I have worked on Restaurant details and Dishes tables:

  • Data sanity checks, cleaning.
  • Deriving simple insights.
  • Using nltk to derive insights from name, description, tags etc.

I have divided the code in three notebooks:

  • Final_data_scraping_and_saving_to_mongodb_collection.ipynb
  • It contains all the code used to scrape the code from Zomato restaurant site
  • and save the data to mongodb.
  • Data analysis.ipynb
  • It contains data sanity checking, cleaning, creating new columns, analysis and all the plots.
  • Rough work for xomato.ipynb - It contains the experimentation code.

I have not used Tableau/PowerBI before and I didn't want to learn it from scratch as it would itself take a weekend at least. So I have shown the plots in the notebooks only, using Matplotlib.

This assignment was my first experience with web scraping to collect big data. I have done a couple of web scraping projects before.

I have really found this project very interesting. I think I will work more on it to make it a complete portfolio level project. Especially the review data analysis.

About

Zomato scraping and analysis mini project


Languages

Language:Jupyter Notebook 100.0%