Useful dead link checking
Most tools for finding broken or dead links check for an HTTP error code like 404.
When a resource is missing, instead of returning an HTTP 404 code, servers will sometimes return:
- an HTTP 200 response and an HTML page saying 'page not found' or similar
- an HTTP redirect to the front page of the site (or of another site) not the resource requested
- an infinite chain of redirects
- a completely different resource than the one requested
coroner is designed to detect dead links:
- including all of the above situations
- rapidly (tested on lists of 1000s of URLs gathered from real sites)
- without hammering remote URLs (per-host rate limiting)
- as part of an automated pipeline (takes input through shell pipes and can output JSON test results)
Install node and npm (for example, through nvm).
Then, run the following:
npm install -g coroner
To check one or more links
coroner http://test1.com http://test2.com
To check all the links within file containing a list of links
cat links.txt | coroner
To check all the links within a saved HTML file and return a list of failing URLs only
sed -n 's/.*href="\(h[^"]*\).*/\1/p' webpage.html | coroner -f
To check all the links within a live web page, skipping over internal links:
curl https://test.com | sed -n 's/.*href="\(h[^"]*\).*/\1/p' | coroner -s test.com
Options are:
-h, --help show this help message and exit
--filter, -f only show test failures (default: show full results)
--json, -j output results in JSON format (default: false)
--skip SKIP, -s SKIP skip links from the specified host
--timeout TIMEOUT, -t TIMEOUT
maximum time (ms) to allow remote host to respond
--cooldown COOLDOWN, -c COOLDOWN
minimum time (ms) between requests to a specific host