- This script visits each website listed in a text file, extracts all the URLs found on those pages, and saves them in JSON format.
- Handles missing URL schemes and resolves relative URLs to ensure accurate results.
- Processes multiple websites concurrently with asynchronous requests (aiohttp), so it's fast!
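The scheme and relative-URL handling mentioned above can be sketched with the standard library. This is a minimal illustration, not the script's actual code; the function names `ensure_scheme` and `absolutize` are hypothetical:

```python
from urllib.parse import urljoin, urlparse

def ensure_scheme(site):
    """Prefix a default scheme when an entry in the input list omits it."""
    # "example.com" has no scheme; default to https
    return site if urlparse(site).scheme else "https://" + site

def absolutize(base, href):
    """Resolve a possibly-relative href against the page it was found on."""
    # urljoin handles "../x", "img/x.png", and scheme-relative "//host/x"
    return urljoin(base, href)
```

For example, `absolutize("https://example.com/a/", "img/x.png")` yields `https://example.com/a/img/x.png`.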
- Install the required modules
$ pip install aiohttp beautifulsoup4 fake_useragent
- Download the script
$ curl -OL https://raw.githubusercontent.com/CodeDotJS/urlist/master/extractor.py
- Run
$ python extractor.py
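Under the hood, the speed comes from processing sites concurrently rather than one at a time. A minimal sketch of that pattern with `asyncio.gather` (using a stand-in `fetch` in place of a real aiohttp request):

```python
import asyncio

async def fetch(site):
    # Stand-in for an aiohttp GET; the real script would return page HTML
    await asyncio.sleep(0)
    return f"<html>{site}</html>"

async def crawl(sites):
    # Launch all fetches at once and wait for them together;
    # results come back in the same order as the input list
    return await asyncio.gather(*(fetch(s) for s in sites))

results = asyncio.run(crawl(["a.com", "b.com"]))
```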
Note: If you need to save all the links present in the JSON to a text file, you can download the companion script
$ curl -OL https://raw.githubusercontent.com/CodeDotJS/urlist/master/generateTxt.py
I needed a tool to generate thousands of active URLs and dump them as JSON, so I built one.
MIT