webrecorder / browsertrix-crawler

Run a high-fidelity browser-based crawler in a single Docker container

Home Page: https://crawler.docs.browsertrix.com

Dry Run Mode

ikreymer opened this issue

A 'dry run' mode (is that the best name?) would run a crawl without storing any archive data. It could be used to examine the scope of a crawl through the logs and saved state. It could also be combined with external proxies (see #587) to delegate handling to a remote proxy. Dry run mode should still fetch everything and run behaviors, but should not write any data locally.
Text extraction and screenshots should also be skipped.
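A minimal sketch of the proposed behavior, assuming a hypothetical `CrawlOptions` shape with `dryRun`, `text`, and `screenshot` booleans (these names are illustrative, not the crawler's actual options). The point it shows: in dry run mode a page is still fetched and behaviors still run, so logs, crawl scope, and proxy traffic look like a real crawl, but every local write (WARC data, extracted text, screenshots) is skipped.

```typescript
// Hypothetical options; names are illustrative only, not browsertrix-crawler's real config.
interface CrawlOptions {
  dryRun: boolean;
  text: boolean;
  screenshot: boolean;
}

// Hypothetical per-page steps a crawler might perform.
type Action = "fetch" | "runBehaviors" | "writeWarc" | "extractText" | "screenshot";

function pageActions(opts: CrawlOptions): Action[] {
  // Always load the page and run behaviors, so scope, logs, and
  // any proxied traffic match a normal crawl.
  const actions: Action[] = ["fetch", "runBehaviors"];

  // Dry run: stop here, nothing is written locally.
  if (opts.dryRun) {
    return actions;
  }

  // Normal crawl: store archive data, plus optional text and screenshots.
  actions.push("writeWarc");
  if (opts.text) actions.push("extractText");
  if (opts.screenshot) actions.push("screenshot");
  return actions;
}

const dry = pageActions({ dryRun: true, text: true, screenshot: true });
console.log(dry.join(",")); // fetch,runBehaviors
```

Gating the writes at one decision point, rather than checking the flag inside each writer, keeps the "fetch and run behaviors, but write nothing" rule in a single place.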