danhje / dead-link-crawler

An efficient, asynchronous crawler that identifies broken links on a given domain.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Dead Link Crawler

An efficient, asynchronous crawler that identifies broken links on a given domain.

Installation

git clone https://github.com/danhje/dead-link-crawler.git
cd dead-link-crawler
pipenv install

Usage

To start Python from within the virtual environment:

pipenv run python

To start the crawl and print the results:

from deadLinkCrawler import DeadLinkCrawler

crawler = DeadLinkCrawler()
crawler.startCrawl('http://danielhjertholm.me/prosjekter.htm', verbose=True)
crawler.printDeadLinks()
checkedLinks = crawler.checkedLinks
deadLinks = list(crawler.deadLinks)

About

An efficient, asynchronous crawler that identifies broken links on a given domain.

License:MIT License


Languages

Language:Python 100.0%