Python crawler for Zomato. This spider collects restaurant information near Melbourne, Australia (including restaurant name, thumbnail, rating, info URL and geolocation). Collecting geolocations is the main purpose of this spider, but you can also extract other data, since the raw data list is kept in a single JSON file. By default, 5 pages of restaurants in Carlton are saved as JSON files, and the detailed restaurant info is then written to a single CSV file.
- Set the `DISTRICT` variable in `config.py`. For this demo, set it to `carlton`.
- Set the `COOKIE` variable in `config.py`. The cookie can be found by opening the Zomato website in a browser and inspecting the request headers. Its expiration time is unknown, but it should last long enough for a crawl.
- You can leave `ROOT_URL` and `REQUEST_URL` unchanged.
- By default, the spider crawls 5 pages of restaurants. You can modify this variable to collect more. However, `SUBPAGE_REQUEST_DELAY`, which defines the delay between requests, should be set to avoid being blocked by Zomato.
- After that, run `crawl.py`.
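Taken together, the settings above might look like the following `config.py` sketch. The values are illustrative assumptions, and `PAGE_NUM` is a hypothetical name for the unnamed page-count variable; the actual `REQUEST_URL` path is also an assumption:

```python
# config.py -- sketch of the settings described above (values are illustrative)

# District slug used in request payloads and output filenames.
DISTRICT = "carlton"

# Session cookie copied from the browser's request headers on zomato.com.
# Placeholder: replace with your own cookie string.
COOKIE = "PHPSESSID=...; zl=en"

# Endpoints -- normally left unchanged.
ROOT_URL = "https://www.zomato.com"
REQUEST_URL = ROOT_URL + "/webroutes/search/home"  # assumed path, for illustration

# Number of result pages to crawl (hypothetical name) and the delay,
# in seconds, between sub-page requests to avoid being blocked.
PAGE_NUM = 5
SUBPAGE_REQUEST_DELAY = 2
```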
Language: Python
Version: 3.6+
Modules: `requests` and `bs4`
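Assuming a standard Python setup, both dependencies can be installed with pip (note that the PyPI package name for `bs4` is `beautifulsoup4`):

```shell
pip install requests beautifulsoup4
```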
If you would like to crawl another district's data:

- Create a file named `DISTRICT_source.json`, where `DISTRICT` is the constant in `config.py`. To do this, browse to any district page on Zomato, then open the developer tools (F12) to see the initial request payload (this requires basic packet-capture skills). Copy the source payload into this JSON file.
- Follow the steps in the "To start with" section, then run `crawl.py`.
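As a rough sketch of how the spider might read that captured payload, assuming the helper name `load_source_payload` (which is hypothetical, not part of the actual code):

```python
import json


def load_source_payload(district):
    """Load the request payload captured from the browser for a district.

    Expects a file named <district>_source.json in the working directory,
    matching the DISTRICT constant in config.py.
    """
    with open(f"{district}_source.json", encoding="utf-8") as f:
        return json.load(f)


# Usage (assumes carlton_source.json exists alongside the script):
# payload = load_source_payload("carlton")
```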
- Concurrent crawling: asyncio, Scrapy.
- Database: pymongo.
- Support for other cities (possibly not).
If you find any bugs or have any suggestions, feel free to create an issue or contact me via WeChat or via email: 913248383@qq.com
Use this spider only in accordance with applicable laws, and use the crawled data for visualisation or machine learning purposes ONLY. Anyone using this code for commercial profit is responsible for any legal consequences that may follow.