farukalamai / yelp-scraper-scrapy-python

Yelp restaurant data scraping using Python and a Scrapy spider.

[Screenshot: Yelp search page, "Top 10 Best Restaurants in San Francisco, CA, July 2023"]

Deployment

1. Clone Repository

  git clone https://github.com/farukalampro/yelp-webscraper-using-scrapy-python.git
  cd yelp-webscraper-using-scrapy-python

2. Create Virtual Environment

  python -m venv env
  • For Windows:
  .\env\Scripts\activate
  • For macOS/Linux:
  source env/bin/activate

3. Install Required Packages

  pip install -r requirements.txt

4. Add Your Own Yelp Search Links

  • Open the data.py file and insert your Yelp search link(s) in start_urls.
  • One sample link is already included in data.py. You can add as many links as you want.
      start_urls = [
          # This is the sample URL.
          # Replace it with your own Yelp search link(s).
          'https://www.yelp.com/search?find_desc=Restaurants&find_loc=San+Francisco%2C+CA',
      ]
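The sample link follows a simple pattern: `find_desc` holds the search term and `find_loc` holds the location, both URL-encoded. If you want to generate links for other searches instead of copying them from the browser, a minimal sketch using only the standard library could look like this. The helper name `build_search_url` is hypothetical and not part of this repository, and the sketch assumes Yelp keeps accepting these two query parameters.

```python
from urllib.parse import urlencode

# Base search endpoint, taken from the sample link in data.py.
YELP_SEARCH = "https://www.yelp.com/search"


def build_search_url(term: str, location: str) -> str:
    """Build a Yelp search URL (hypothetical helper, not part of the repo).

    urlencode() uses quote_plus by default, so spaces become '+' and
    commas become '%2C', matching the format of the sample link.
    """
    query = urlencode({"find_desc": term, "find_loc": location})
    return f"{YELP_SEARCH}?{query}"


# Example: generate start_urls for several cities.
start_urls = [
    build_search_url("Restaurants", city)
    for city in ["San Francisco, CA", "New York, NY"]
]
```

For "Restaurants" in "San Francisco, CA" this produces exactly the sample URL above, so the generated strings can be used in start_urls the same way as hand-copied links.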

5. Run the Spider from the Terminal

  scrapy crawl data -o sample_file.csv
  • You can also export the data as JSON or XML. Scrapy picks the format from the file extension:
  scrapy crawl <spider_name> -o <file_name>.csv   (or .json, .xml)
  • Sample restaurant data scraped with this spider is in the Sample File folder.
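With recent Scrapy versions, `-o` appends to an existing file (a second run into the same `.json` file leaves it as invalid JSON), while `-O` overwrites it. For a quick look at an exported file, a minimal sketch like the one below loads a CSV or JSON feed into a list of dictionaries. `load_items` is a hypothetical helper, not part of this repository, and it assumes a `.json` file was written by a single run (one JSON array).

```python
import csv
import json
from pathlib import Path


def load_items(path: str) -> list[dict]:
    """Load items exported with `scrapy crawl data -o <file>` (CSV or JSON).

    Hypothetical helper: assumes a .json file holds a single JSON array,
    as written by one -o/-O run into a fresh file.
    """
    file = Path(path)
    suffix = file.suffix.lower()
    if suffix == ".csv":
        # Scrapy's CSV exporter writes a header row with the field names,
        # so DictReader can map each row to a dict.
        with file.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        with file.open(encoding="utf-8") as f:
            return json.load(f)
    raise ValueError(f"unsupported feed format: {suffix}")
```

For example, `len(load_items("sample_file.csv"))` gives the number of scraped rows. Note that values read from CSV are always strings.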

Important Note

  • Yelp updates its website frequently, so make sure the XPath expressions in the spider are kept up to date.
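An outdated XPath expression usually does not raise an error; it just matches nothing, so the affected field comes back empty. One way to notice this is to check how often each field is empty in the exported rows. The sketch below assumes the rows are already loaded as a list of dictionaries (for example with `csv.DictReader`); the function name, the 0.5 threshold, and the field names in the example are illustrative assumptions, not the spider's actual output schema.

```python
def _is_empty(value) -> bool:
    # Missing keys (None) and blank strings count as empty; 0 does not.
    return value is None or (isinstance(value, str) and not value.strip())


def suspicious_fields(items: list[dict], threshold: float = 0.5) -> list[str]:
    """Return field names that are empty in at least `threshold` of the items.

    A field that is empty for most rows often means its XPath selector
    no longer matches Yelp's current markup.
    """
    if not items:
        return []
    fields = sorted({key for item in items for key in item})
    flagged = []
    for field in fields:
        empty = sum(1 for item in items if _is_empty(item.get(field)))
        if empty / len(items) >= threshold:
            flagged.append(field)
    return flagged


# Hypothetical rows; real column names depend on the spider's parse() method.
rows = [
    {"name": "Cafe A", "rating": "", "reviews": "120"},
    {"name": "Cafe B", "rating": "", "reviews": "85"},
    {"name": "Cafe C", "rating": "4.5", "reviews": ""},
]
print(suspicious_fields(rows))  # ['rating']
```

If a field is flagged after a crawl, that field's XPath in the spider is the first place to check against the current page.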