MichaelFriedberg / crawler

Crawl your own website with various clients for SEO and indexing purposes.

MediaMonks Crawler

This tool lets you crawl a website and get a DOM object for every URL it finds. We use it with the Prerender.io client to crawl our own site's pages, whether their content is rendered server-side, client-side, or both. You can use the resulting data to build a full site search or to improve SEO for single-page applications.

Highlights

  • Ships with Prerender and Prerender.io clients; uses Goutte by default
  • Supports any Symfony BrowserKit client
  • Supports both whitelisting and blacklisting of URLs
  • Supports URL normalization, which prevents duplicates caused by minor URL differences
  • Implements the PSR-3 Logger Interface
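
The URL normalization and whitelist/blacklist features above can be illustrated with a small standalone sketch. This code does not use the mediamonks/crawler API. `normalizeUrl()` and `isAllowed()` are hypothetical helpers written only for this example. The specific normalization rules are assumptions, not necessarily what the library does. They are: lowercase the scheme and host, drop default ports and fragments, and sort query parameters. Using PCRE patterns for the whitelist and blacklist is also an assumption.

```php
<?php
// Hypothetical helpers for illustration only; NOT part of the mediamonks/crawler API.

// Normalize a URL so trivially different spellings compare equal (assumed rules):
// lowercase scheme and host, drop default ports and the fragment,
// default an empty path to "/", and sort query parameters.
// User info (user:pass@) is ignored in this sketch.
function normalizeUrl($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave relative or malformed URLs untouched
    }

    $scheme = strtolower($parts['scheme']);
    $host = strtolower($parts['host']);

    // Keep the port only if it differs from the scheme's default.
    $port = '';
    if (isset($parts['port'])) {
        $isDefault = ($scheme === 'http' && $parts['port'] === 80)
            || ($scheme === 'https' && $parts['port'] === 443);
        if (!$isDefault) {
            $port = ':' . $parts['port'];
        }
    }

    $path = (isset($parts['path']) && $parts['path'] !== '') ? $parts['path'] : '/';

    // Sort the raw "key=value" pairs so ?q=x&page=2 and ?page=2&q=x become one URL.
    $query = '';
    if (isset($parts['query']) && $parts['query'] !== '') {
        $pairs = explode('&', $parts['query']);
        sort($pairs);
        $query = '?' . implode('&', $pairs);
    }

    // The fragment (#...) is dropped because it does not change the server response.
    return $scheme . '://' . $host . $port . $path . $query;
}

// A URL is allowed if it matches at least one whitelist pattern (when any are given)
// and matches no blacklist pattern. Patterns are PCRE regular expressions.
function isAllowed($url, array $whitelist, array $blacklist)
{
    if (!empty($whitelist)) {
        $matched = false;
        foreach ($whitelist as $pattern) {
            if (preg_match($pattern, $url) === 1) {
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            return false;
        }
    }
    foreach ($blacklist as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return false;
        }
    }
    return true;
}

// Usage: deduplicate and filter a list of discovered URLs.
$found = array(
    'HTTP://Example.com:80/about#team',
    'http://example.com/about',
    'http://example.com/search?q=x&page=2',
    'http://example.com/search?page=2&q=x',
    'http://example.com/admin/login',
);
$seen = array();
foreach ($found as $rawUrl) {
    $url = normalizeUrl($rawUrl);
    if (isset($seen[$url]) || !isAllowed($url, array('#^https?://example\.com/#'), array('#/admin/#'))) {
        continue;
    }
    $seen[$url] = true;
}
print_r(array_keys($seen));
```

Keying `$seen` by the normalized URL makes the duplicate check a constant-time array lookup. In this run the five discovered URLs collapse to two pages, and the blacklisted admin page is skipped.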

Documentation

Documentation and examples can be found in the /doc folder.

System Requirements

To use the library, you need:

  • PHP >= 5.5.0

Install

Install this package by using Composer.

$ composer require mediamonks/crawler

Security

If you discover any security-related issues, please email devmonk@mediamonks.com instead of using the issue tracker.

License

The MIT License (MIT). Please see the License File for more information.
