`scrapy-playwright` sample project for Scrapy Cloud

A sample project that runs scrapy-playwright on Zyte Scrapy Cloud.

Dockerfile

A custom Docker image is provided to install the system dependencies that the headless browsers need.

Settings

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

_browsers = {
    "chromium": "/ms-playwright/chromium/chrome-linux/chrome",
    "firefox": "/ms-playwright/firefox/firefox/firefox",
    "webkit": "/ms-playwright/webkit/pw_run.sh",
}
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "executablePath": _browsers[PLAYWRIGHT_BROWSER_TYPE],
    "timeout": 10000,
}

TWISTED_REACTOR: scrapy-playwright only works with the asyncio-based Twisted reactor.
DOWNLOAD_HANDLERS: tells Scrapy to use scrapy-playwright's download handler to process requests.
PLAYWRIGHT_LAUNCH_OPTIONS: the Docker image runs as a non-root user, so the path to the browser executable must be set explicitly.
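The browser lookup above raises a bare KeyError if PLAYWRIGHT_BROWSER_TYPE is mistyped. As a minimal sketch, the same settings could be built through a small helper that fails with a clearer message. The helper name `build_launch_options` and the environment variable `SAMPLE_BROWSER_TYPE` are hypothetical and not part of this project or of scrapy-playwright; the executable paths are the ones listed in the settings above.

```python
import os

# Executable paths inside the custom Docker image (copied from the settings above).
_browsers = {
    "chromium": "/ms-playwright/chromium/chrome-linux/chrome",
    "firefox": "/ms-playwright/firefox/firefox/firefox",
    "webkit": "/ms-playwright/webkit/pw_run.sh",
}


def build_launch_options(browser_type, timeout_ms=10000):
    """Return launch options for a known browser type (hypothetical helper)."""
    if browser_type not in _browsers:
        supported = ", ".join(sorted(_browsers))
        raise ValueError(
            f"Unsupported browser type {browser_type!r}; expected one of: {supported}"
        )
    return {"executablePath": _browsers[browser_type], "timeout": timeout_ms}


# SAMPLE_BROWSER_TYPE is an assumed variable name used only for illustration.
PLAYWRIGHT_BROWSER_TYPE = os.environ.get("SAMPLE_BROWSER_TYPE", "chromium")
PLAYWRIGHT_LAUNCH_OPTIONS = build_launch_options(PLAYWRIGHT_BROWSER_TYPE)
```

With this variant, switching browsers would not require editing the settings file, and an unknown value stops the crawl at startup instead of surfacing later as a launch failure.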

Build and deploy

Make sure you have shub installed
Replace the project id (project: <project-id>) in the scrapinghub.yml file with your own project id
Run shub image upload
Run shub schedule headers
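Forgetting to replace the `<project-id>` placeholder is an easy mistake to make before uploading. The following is a rough sketch of a pre-deploy check, assuming scrapinghub.yml uses the flat `project: <id>` form mentioned above and that project ids are numeric. The function name `project_id_is_set` is hypothetical and not part of shub.

```python
import re


def project_id_is_set(yml_text):
    """Return True if a top-level `project:` line holds a numeric id.

    Assumes the flat `project: 12345` layout; nested layouts are not handled.
    """
    match = re.search(r"^project:[ \t]*(\S+)[ \t]*$", yml_text, flags=re.MULTILINE)
    return bool(match) and match.group(1).isdigit()
```

It could be called on the file contents before running the upload step, for example `project_id_is_set(open("scrapinghub.yml").read())`.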

For more information, check out the full documentation on how to build and deploy Docker images to Scrapy Cloud.
