Crawler

A crawler that collects all <a> tags (links) of a page. It only crawls the target's domain: if you crawl example.com and it links to whatever.com, whatever.com will not be crawled.
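The same-domain rule can be illustrated with a small check. This is only a sketch of the idea, not the crawler's actual code: the function name isSameHost is hypothetical, and it assumes that relative links (links without a host) count as belonging to the target's domain.

```php
<?php
// Hypothetical helper, not part of the Crawler class: returns true when
// $link points at the same host as $target.
function isSameHost(string $target, string $link): bool
{
    $targetHost = parse_url($target, PHP_URL_HOST);
    $linkHost = parse_url($link, PHP_URL_HOST);

    // Assumption: a relative link such as "/contact" has no host component
    // and therefore stays on the target's domain.
    if ($linkHost === null) {
        return true;
    }

    // parse_url() returns false for seriously malformed URLs.
    if (!is_string($targetHost) || !is_string($linkHost)) {
        return false;
    }

    // Host names are case-insensitive.
    return strcasecmp($targetHost, $linkHost) === 0;
}
```

With this check, a link from example.com to whatever.com is rejected, while a link to another page on example.com is followed.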

Instantiation

$crawl = new Crawler();

Set target to crawl

$crawl->crawl('https://example.com');

Get crawled links

Returns all crawled links of that domain as a one-dimensional array.

$crawl->getCrawledLinks();

Defaults

Schemes allowed

http
https

Extensions allowed

html
htm
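To make the defaults concrete, here is a rough sketch of how a URL could be tested against the allowed schemes and extensions. The function isAllowedUrl is hypothetical and not taken from the crawler. It also assumes that URLs without a file extension (such as /about) are accepted, which the crawler itself may handle differently.

```php
<?php
// Hypothetical helper, not part of the Crawler class: checks a URL against
// a list of allowed schemes and file extensions (defaults as listed above).
function isAllowedUrl(
    string $url,
    array $schemes = ['http', 'https'],
    array $extensions = ['html', 'htm']
): bool {
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), $schemes, true)) {
        return false;
    }

    // The path may be missing entirely, e.g. for "https://example.com".
    $path = parse_url($url, PHP_URL_PATH);
    $extension = is_string($path)
        ? strtolower(pathinfo($path, PATHINFO_EXTENSION))
        : '';

    // Assumption: a URL without a file extension is accepted.
    return $extension === '' || in_array($extension, $extensions, true);
}
```

Under these assumptions, https://example.com/index.html passes, while ftp://example.com/index.html and https://example.com/file.pdf are filtered out.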

Options

You can set, remove, and get the allowed schemes and file extensions.

Setting allowed file extensions

$crawl->allowed('set', 'allowedFiles', '.pdf', '.png');

Removing allowed file extensions

$crawl->allowed('remove', 'allowedFiles', '.pdf', '.png');

Removing allowed schemes

$crawl->allowed('remove', 'allowedSchemes', 'http');

Getting allowed file extensions

$crawl->allowed('get', 'allowedFiles');

Getting allowed schemes

$crawl->allowed('get', 'allowedSchemes');
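The calls above all share one shape: an action, a list name, and any number of values. The sketch below models that shape with a variadic method. It is a hypothetical stand-in (the class AllowListSketch does not exist in this project), and it assumes that 'set' adds values to the existing list rather than replacing it; the real allowed() method may behave differently.

```php
<?php
// Hypothetical stand-in, NOT the Crawler's actual implementation.
final class AllowListSketch
{
    /** @var array<string, string[]> Defaults as listed in this README. */
    private $lists = [
        'allowedSchemes' => ['http', 'https'],
        'allowedFiles' => ['html', 'htm'],
    ];

    /**
     * Applies $action ('set', 'remove' or 'get') to the named list and
     * returns the list's resulting contents.
     */
    public function allowed(string $action, string $list, string ...$values): array
    {
        if (!isset($this->lists[$list])) {
            throw new InvalidArgumentException("Unknown list: $list");
        }

        switch ($action) {
            case 'set':
                // Assumption: 'set' appends new values and skips duplicates.
                $merged = array_merge($this->lists[$list], $values);
                $this->lists[$list] = array_values(array_unique($merged));
                break;
            case 'remove':
                $this->lists[$list] = array_values(array_diff($this->lists[$list], $values));
                break;
            case 'get':
                break;
            default:
                throw new InvalidArgumentException("Unknown action: $action");
        }

        return $this->lists[$list];
    }
}
```

Returning the current list from every action keeps the sketch easy to inspect; whether the real method returns anything for 'set' and 'remove' is not stated here.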

License

GNU General Public License v3.0