DevOps engineer's URL fetcher
- main.py reads URLs from the file given as the first argument (open(sys.argv[1]))
- it fetches those URLs with gevent; BeautifulSoup then extracts every link from the returned HTML (a tags only)
- it prints each URL found
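The "a tags only" step can be illustrated with a small sketch. This is not the code in main.py: it swaps BeautifulSoup for the standard library's HTMLParser so that it runs without extra packages, and it leaves out the gevent fetching entirely. The names `AnchorCollector` and `extract_links` are made up for this example.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag and ignores all other tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only <a> counts; <img src=...>, <link href=...> etc. are skipped.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags in an HTML string, in order."""
    collector = AnchorCollector()
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    sample = '<a href="http://example.com/a">A</a><img src="x.png"><link href="s.css">'
    for link in extract_links(sample):
        print(link)
```

Only the href of the a tag is collected here; the img and link tags in the sample are ignored, which is the behaviour the list above describes.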
Requires Python 2.7 and pip. For example:
sudo apt install python2.7 python-pip
git clone https://github.com/fiht/URLGetter && cd URLGetter
sudo -H pip install -r requirements.txt
PS: a virtualenv may be a better choice than a system-wide install.
python main.py urls.txt
- print only unique URLs (using python-bloomfilter)
- show a progress bar
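As a rough idea of how the unique-URL item could work, here is a minimal sketch of Bloom-filter de-duplication. It is a hand-rolled stand-in written with the standard library only, not the python-bloomfilter API, and `TinyBloomFilter`, `unique`, and the default sizes are assumptions made for illustration.

```python
import hashlib


class TinyBloomFilter(object):
    """Hand-rolled Bloom filter for illustration; not the python-bloomfilter API."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash with an index.
        for seed in range(self.num_hashes):
            data = ("%d:%s" % (seed, item)).encode("utf-8")
            yield int(hashlib.sha256(data).hexdigest(), 16) % self.num_bits

    def add(self, item):
        """Set the item's bits; return True if all of them were already set."""
        seen = True
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen


def unique(urls, bloom=None):
    """Yield each URL the first time it appears in the input."""
    if bloom is None:
        bloom = TinyBloomFilter()
    for url in urls:
        if not bloom.add(url):
            yield url


if __name__ == "__main__":
    for url in unique(["http://a", "http://b", "http://a"]):
        print(url)
```

A Bloom filter keeps memory use fixed no matter how many URLs pass through, at the cost of occasional false positives: a URL that was never seen can be mistaken for a duplicate and dropped. A plain set avoids that but grows with the number of URLs.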