pizzadudez / FoodPandaScraper2

Splash free


Setup:

create credentials.py based on credentials_sample.py
run: scrapy crawl food

TODO:

add menu model
append scrapy to table names
move all products from options to the products table
- add flag 'is_menu_item'
  - ! determine if menu item based on existence of menu_category_id instead
- options will only contain prod_id and price
images table + pipeline:
- save locally with product id as name
skip vendor if crawled recently
combo menu items are products with options that are other menu items
- add vendor
- add toppings + options (products + product options)
- add the rest normally
- go through all products that have a menu_cat.id and check for combos
  - use .join() in sqlalchemy to get all menu_item_products of a vendor
images retry with different widths
- try requests from a predefined array of widths
- Solved: added download delay => all images @ 5000px
add remaining vendor data
- vendor banner
images table (one products many images) -> figure out after s3 integration
requirements.txt
separate credentials file
skip vendor if recently crawled
vendor updated_at UTC timezone info
s3 path
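
The "skip vendor if recently crawled" and "vendor updated_at UTC timezone info" items could fit together roughly as below. This is only a sketch: the function name `crawled_recently`, the 24-hour `RECRAWL_INTERVAL`, and the rule that naive timestamps are treated as UTC are assumptions, not taken from the repository.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed threshold; the README does not say what counts as "recently".
RECRAWL_INTERVAL = timedelta(hours=24)


def crawled_recently(
    updated_at: Optional[datetime],
    now: Optional[datetime] = None,
    interval: timedelta = RECRAWL_INTERVAL,
) -> bool:
    """Return True if the vendor was updated less than `interval` ago."""
    if updated_at is None:
        # Vendor has never been crawled.
        return False
    if updated_at.tzinfo is None:
        # Assumption: naive timestamps in the database were stored as UTC.
        updated_at = updated_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - updated_at < interval
```

A spider could call this with the stored `updated_at` of a vendor before scheduling its request, and skip the vendor when it returns `True`.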

Potential Issues:

city name issue: a restaurant listed in Arad is actually in Cluj
- seems to be fixed for now on foodpanda's end but need a more reliable solution
- ? ignore vendor-data.city_id and generate our own IDs for each city url
Handling chains (crawling chains doesn't redirect to local vendor)
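
One possible shape for the "generate our own IDs for each city url" idea is sketched below. The names `city_slug` and `CityRegistry` are hypothetical, as is the assumption that the last path segment of a city URL identifies the city; the example.com URLs are placeholders, not real foodpanda routes.

```python
from urllib.parse import urlparse


def city_slug(city_url: str) -> str:
    """Use the last non-empty path segment of the URL as the city key."""
    segments = [s for s in urlparse(city_url).path.split("/") if s]
    if not segments:
        raise ValueError(f"no city segment in URL: {city_url!r}")
    return segments[-1].lower()


class CityRegistry:
    """Assigns a stable integer ID per city URL, ignoring vendor-data.city_id."""

    def __init__(self) -> None:
        self._ids: dict = {}

    def id_for(self, city_url: str) -> int:
        slug = city_slug(city_url)
        # New slugs get the next sequential ID; known slugs keep theirs.
        return self._ids.setdefault(slug, len(self._ids) + 1)


registry = CityRegistry()
arad_id = registry.id_for("https://example.com/city/arad")
cluj_id = registry.id_for("https://example.com/city/cluj-napoca/")
```

With this approach a vendor gets the ID of the city page it was discovered on, so a wrong city in the vendor data would not affect it. In practice the mapping would have to be persisted (for example in a cities table) so that IDs stay stable across crawls.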


