crawler crawlers crawling crawling-python

urlcrawler.py

urlcrawler.py is a Python script that crawls a single domain or a list of domains and collects every URL it finds under them.

Installation

git clone https://github.com/Mr0Wido/urlcrawler.py.git
cd urlcrawler.py
python3 urlcrawler.py

Usage

python3 urlcrawler.py -d test.com
python3 urlcrawler.py -d test.com -o urls.txt
python3 urlcrawler.py -l domains.txt

Options

Flag	Long form	Description
-h	--help	Show this help message and exit.
-d	--domain	The domain to crawl. Example: https://test.com
-l	--list	File containing a list of domains to crawl.
-o	--output	The output file where the found URLs will be saved.
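
The flags above map naturally onto Python's argparse module. The block below is a minimal sketch of how such a command-line interface could be declared. It is not taken from urlcrawler.py itself: the function name build_parser, the attribute name domain_list, and the choice to make -d and -l mutually exclusive are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="urlcrawler.py",
        description="Crawl a domain or a list of domains and collect URLs.",
    )
    # Assumption: exactly one of -d / -l must be supplied.
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("-d", "--domain", help="The domain to crawl.")
    target.add_argument(
        "-l",
        "--list",
        dest="domain_list",  # hypothetical name, avoids shadowing the built-in list
        help="File containing a list of domains to crawl.",
    )
    parser.add_argument(
        "-o",
        "--output",
        help="The output file where the found URLs will be saved.",
    )
    # argparse adds -h / --help automatically.
    return parser
```

Calling build_parser().parse_args(["-d", "test.com", "-o", "urls.txt"]) would then yield an object whose domain and output attributes hold the two values, with domain_list left as None.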

Requirements

requests
BeautifulSoup4

Notes

This script tries to find all URLs under a specific domain. However, URLs that are generated by JavaScript or other dynamic content may not be found. The script also sends a large number of requests, which can put a high load on the target server, so it should only be used on your own sites or on sites where you have explicit permission.
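
To illustrate why JavaScript-generated URLs can be missed, here is a minimal sketch of static link extraction. It is not the code used by urlcrawler.py: it relies only on the standard library (html.parser and urllib.parse) rather than requests and BeautifulSoup4 so that it runs without a network connection, and the names LinkCollector and same_domain_links are hypothetical. Only href attributes on anchor tags present in the raw HTML are seen; anything a script would add at runtime never appears.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(html, page_url):
    """Return sorted absolute URLs from `html` that share page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links against the page and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            found.add(absolute)
    return sorted(found)
```

Given a page at https://test.com/docs/index.html that contains a link to /about, a link to https://other.com/x, and a script that builds the path /js-only at runtime, this sketch would return only the test.com URLs that appear as anchor tags. The off-domain link is filtered out and the script-built path is never seen.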
