Crawls web pages and prints any link it can find.
- fast html SAX-parser (powered by `golang.org/x/net/html`; a sketch of the approach follows this list)
- small (<1000 SLOC), idiomatic, 100% test covered codebase
- grabs most useful resource URLs (pics, videos, audios, etc.)
- found URLs are streamed to stdout and guaranteed to be unique
- scan depth (limited by starting host and path, 0 by default) can be configured
- can crawl `robots.txt` rules and sitemaps
- brute mode - scans html comments for URLs (this can lead to bogus results)
- makes use of `HTTP_PROXY`/`HTTPS_PROXY` environment values
- binaries for Linux, macOS and Windows
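
As a rough illustration of the SAX-style parsing mentioned above, here is a minimal sketch. It is not crawley's actual code: it takes the target URL from argv, only looks at `href`/`src` attributes, and skips URL resolution, depth limiting, and politeness delays.

```go
// Minimal sketch of SAX-style link extraction (illustrative only):
// stream-tokenize a page with golang.org/x/net/html and print each
// unique href/src value once.
package main

import (
	"fmt"
	"net/http"
	"os"

	"golang.org/x/net/html"
)

func main() {
	resp, err := http.Get(os.Args[1]) // target URL from argv (assumed)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	seen := make(map[string]bool) // mirrors the "unique URLs" guarantee

	z := html.NewTokenizer(resp.Body) // SAX-like: no DOM tree is built
	for {
		tt := z.Next()
		if tt == html.ErrorToken {
			return // io.EOF on normal completion, or a read error
		}
		if tt != html.StartTagToken && tt != html.SelfClosingTagToken {
			continue
		}
		_, hasAttr := z.TagName()
		for hasAttr {
			key, val, more := z.TagAttr()
			if k := string(key); k == "href" || k == "src" {
				if u := string(val); !seen[u] {
					seen[u] = true
					fmt.Println(u) // found URLs stream to stdout
				}
			}
			hasAttr = more
		}
	}
}
```

Tokenizing instead of building a DOM keeps memory use flat regardless of page size, which is one reason a SAX-style parser stays fast.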
```
crawley [flags] url

possible flags:

  -brute
        scan html comments
  -delay duration
        per-request delay (default 250ms)
  -depth int
        scan depth, set to -1 for unlimited
  -help
        show this help: flags and their defaults
  -robots string
        action for robots.txt: ignore/crawl/respect (default "ignore")
  -silent
        suppress info and error messages in stderr
  -skip-ssl
        skip ssl verification
  -user-agent string
        user-agent string
  -version
        show version
  -workers int
        number of workers
```