usernam3 / shopify-app-store-scraper

Crawler behind the Shopify App Marketplace dataset

Home Page: https://www.kaggle.com/usernam3/shopify-app-store

Shopify App Store scraper

About

This repository contains the code that scrapes and saves data from the Shopify App Store.

The scraper is used to build the Shopify app store dataset on Kaggle, which includes these files:

  • apps
  • apps_categories
  • categories
  • key_benefits
  • pricing_plan_features
  • pricing_plans
  • reviews

While the dataset published on Kaggle is regularly updated, this repository lets you keep a local copy up to date independently of the released version.

A detailed dataset description can be found here.

How to use it

Docker (recommended)

Authenticate to GitHub Container Registry (if not already authenticated)

docker login ghcr.io -u USERNAME -p TOKEN

Pull container

docker pull ghcr.io/usernam3/shopify-app-store-scraper

Run container

docker run -v `pwd`/output/:/app/output/ ghcr.io/usernam3/shopify-app-store-scraper

After the container finishes executing, check the output folder (in the current directory)

ls -la output/
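The run step above can also be driven from Python, which avoids the shell-specific `pwd` backticks. The following is only a minimal sketch: the helper names `build_docker_command` and `run_scraper_container` are hypothetical and not part of this repository, and it assumes Docker is installed and the image has already been pulled.

```python
import subprocess
from pathlib import Path
from typing import List

# Image name taken from the pull command above.
IMAGE = "ghcr.io/usernam3/shopify-app-store-scraper"


def build_docker_command(output_dir: Path) -> List[str]:
    """Return the `docker run` arguments that mount output_dir at /app/output/."""
    # Docker bind mounts need an absolute host path, so resolve it first.
    host_path = output_dir.resolve()
    return ["docker", "run", "-v", f"{host_path}:/app/output/", IMAGE]


def run_scraper_container(output_dir: Path = Path("output")) -> None:
    """Create the output folder if needed, then run the container (hypothetical helper)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the container exits with an error.
    subprocess.run(build_docker_command(output_dir), check=True)
```

Calling `run_scraper_container()` from the project directory should behave like the `docker run` line above, since both mount the local `output` folder at `/app/output/`.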

Python

Install requirements

pip install -r requirements.txt

Run scraper

scrapy crawl app_store

After the scraper finishes executing, check the output folder (in the current directory)

ls -la output/
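Once the output folder is populated, the files can be inspected from Python. The sketch below assumes each file listed in the About section is written as `<name>.csv` with a header row; that naming and format are assumptions, not something this README states, so adjust the path pattern to whatever `ls -la output/` actually shows. The helpers `load_tables` and `summarize` are hypothetical names used only for illustration.

```python
import csv
from pathlib import Path
from typing import Dict, List

# File names from the list in the About section.
TABLES = [
    "apps", "apps_categories", "categories", "key_benefits",
    "pricing_plan_features", "pricing_plans", "reviews",
]


def load_tables(output_dir: Path = Path("output")) -> Dict[str, List[dict]]:
    """Read every expected file found in output_dir into a list of row dicts.

    Assumption: each file is stored as `<name>.csv` with a header row.
    """
    tables: Dict[str, List[dict]] = {}
    for name in TABLES:
        path = output_dir / f"{name}.csv"
        if not path.exists():
            continue  # skip files the scraper has not produced
        with path.open(newline="", encoding="utf-8") as handle:
            tables[name] = list(csv.DictReader(handle))
    return tables


def summarize(tables: Dict[str, List[dict]]) -> Dict[str, int]:
    """Return the number of rows loaded for each file."""
    return {name: len(rows) for name, rows in tables.items()}
```

For example, `summarize(load_tables())` gives a quick row count per file, which is a simple way to confirm that a local run produced data before comparing it with the released dataset.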

Please don't hesitate to open issues or PRs at any time if you need help with anything.

License: MIT License


Languages

Python 99.2%, Dockerfile 0.8%