urlcrawler.py is a Python script that crawls a single domain or a list of domains and collects the URLs it finds under them.
git clone https://github.com/Mr0Wido/urlcrawler.py.git
cd urlcrawler.py
python3 urlcrawler.py
python3 urlcrawler.py -d test.com
python3 urlcrawler.py -d test.com -o urls.txt
python3 urlcrawler.py -l domains.txt
Short flag | Long flag | Description |
---|---|---|
-h | --help | Show this help message and exit. |
-d | --domain | The domain to crawl. Example: https://test.com |
-l | --list | File containing a list of domains to crawl. |
-o | --output | The output file where the found URLs will be saved. |
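
For reference, the flags in the table could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual code: the `build_parser` helper and the `domain_list` attribute name are hypothetical, and `-h`/`--help` is added by `argparse` automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags.
    # argparse adds -h/--help on its own.
    parser = argparse.ArgumentParser(
        prog="urlcrawler.py",
        description="Crawl a domain or a list of domains and collect URLs.",
    )
    parser.add_argument(
        "-d", "--domain",
        help="The domain to crawl. Example: https://test.com",
    )
    parser.add_argument(
        "-l", "--list",
        dest="domain_list",  # assumed name; avoids shadowing the built-in `list`
        help="File containing a list of domains to crawl.",
    )
    parser.add_argument(
        "-o", "--output",
        help="The output file where the found URLs will be saved.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(["-d", "test.com", "-o", "urls.txt"])
print(args.domain, args.output, args.domain_list)
```

Passing an explicit list to `parse_args` keeps the example independent of how the script is invoked.
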
requests
BeautifulSoup4
This script tries to find all URLs under a specific domain. However, URLs that are generated by JavaScript or other dynamic content may not be found. The script also sends a large number of requests, which can put a high load on the target server, so it should only be used on your own sites or on sites where you have explicit permission.
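
To illustrate why JavaScript-generated URLs can be missed, here is a minimal sketch of static link extraction. It is not the script's implementation: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without dependencies, makes no network requests, and the names `LinkCollector` and `same_domain_links` are hypothetical. Treating subdomains as part of the domain is also an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags found in static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(html, page_url, domain):
    """Return absolute, de-duplicated links that belong to `domain`."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        host = urlparse(absolute).hostname or ""
        # Assumption: subdomains count as part of the domain.
        in_domain = host == domain or host.endswith("." + domain)
        if in_domain and absolute not in found:
            found.append(absolute)
    return found


sample = """
<a href="/about">About</a>
<a href="https://blog.test.com/post#top">Post</a>
<a href="https://other.example/">Elsewhere</a>
<script>document.write('<a href="/hidden">x</a>')</script>
"""
links = same_domain_links(sample, "https://test.com/", "test.com")
print(links)
```

The link written by the inline script never appears in `links`, because the parser treats the contents of `<script>` as plain text and the JavaScript is never executed.
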