Clone of digininja's CeWL written in Golang.
- Crawl websites concurrently and extract words into a wordlist
- Should be faster than the original CeWL, since requests and parsing are performed concurrently
- Static binary available, so no dependencies are required
- Lower memory footprint
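The concurrent extract-and-count flow can be sketched in Go. This is a minimal illustration under assumptions, not gocewl's actual implementation: a fixed worker pool pulls already-fetched page bodies from a channel, tokenizes them with a simple letters-only regex, and merges counts into one map guarded by a mutex (the hypothetical `countWords`, `wordRe`, and sample pages are not part of gocewl).

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"sync"
)

// wordRe matches runs of 3+ ASCII letters; the real tokenizer may differ.
var wordRe = regexp.MustCompile(`[a-zA-Z]{3,}`)

// countWords extracts words from each page concurrently and merges the
// counts into a single map protected by a mutex.
func countWords(pages []string, workers int) map[string]int {
	jobs := make(chan string)
	counts := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for page := range jobs {
				for _, w := range wordRe.FindAllString(page, -1) {
					mu.Lock()
					counts[w]++
					mu.Unlock()
				}
			}
		}()
	}
	for _, p := range pages {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	return counts
}

func main() {
	pages := []string{"hello crawl world", "crawl the web"}
	counts := countWords(pages, 4)
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Strings(words)
	for _, w := range words {
		fmt.Printf("%s %d\n", w, counts[w]) // e.g. "crawl 2"
	}
}
```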
Note: This repo is experimental. Consider it pre-alpha. The API / CLI can change at any time.
Note: Currently there are no tagged releases or pre-compiled binaries. This will change in the future.
To compile and install goCeWL, Go needs to be installed on your system. If that's not yet the case, please follow the installation instructions here.
If you have Go installed, run `go get github.com/shellhunter/gocewl`. This will download all dependencies and install the binary to `$GOPATH/bin`.
Run `gocewl --help` to display the command-line options.
gocewl is a command-line tool to generate custom wordlists by crawling webpages. It is based on CeWL by digininja.
Usage:
gocewl URL [flags]
Flags:
-A, --allow stringArray Domains in scope for the crawler. Provide as comma-separated list.
-d, --depth int Maximum depth for crawling (default 2)
-h, --help help for gocewl
-k, --insecure Ignore self-signed certificates
--max-word int Maximum word length (default 15)
-c, --min-count int Minimum number of times that the word was found (default 1)
--min-word int Minimum word length (default 3)
-O, --offsite Allow the crawler to visit offsite domains
-p, --proxy string Proxy to use: http[s]://[user:pass@]proxy.example.com[:8080]
-q, --quiet No output, except for words
-t, --threads int Number of threads for crawling (default 10)
-u, --url string URL to start crawling
--user-agent string Custom user agent (default "gocewl/0.1")
--version version for gocewl
-w, --write string Filename to write the wordlist to. If no file is provided, print to stdout (default "wordlist.txt")
Crawl https://en.wikipedia.org with default parameters:
gocewl https://en.wikipedia.org
Crawl https://en.wikipedia.org with a depth of 2 and 10 threads, and write the output to wiki.txt:
gocewl -d 2 -t 10 -w wiki.txt https://en.wikipedia.org
- Set minimum word length (defaults to 3)
- Set crawling depth (defaults to 2)
- Allow offsite crawling
- Proxy support
- HTTP Basic / NTLM Auth support
- Include e-mail addresses
- Include metadata
- Headers
- User-agent
- Cookie support
- Sort wordlist by wordcount
- --top-words cli switch to only print top X words (by count)
- Performance optimizations
- Improved error handling
- Improved cli
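The planned sort-by-wordcount and `--top-words` behavior listed above could look like the following sketch. This is an assumption about a feature that is not yet implemented; the `topWords` helper and the sample counts are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// topWords returns the n most frequent words, highest count first.
// Ties are broken alphabetically so the output is deterministic.
func topWords(counts map[string]int, n int) []string {
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Slice(words, func(i, j int) bool {
		if counts[words[i]] != counts[words[j]] {
			return counts[words[i]] > counts[words[j]]
		}
		return words[i] < words[j]
	})
	if n < len(words) {
		words = words[:n]
	}
	return words
}

func main() {
	counts := map[string]int{"admin": 7, "login": 5, "welcome": 5, "the": 1}
	fmt.Println(topWords(counts, 3)) // [admin login welcome]
}
```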
- Performance improvements
- Changed sync.Map to regular map with a mutex
- Fixed a race condition when counting requests and errors
- Fixed display of statistics
- Initial release to github