BataBoom / Sherdog-Event-Scraper

Goutte, a simple PHP Web Scraper

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses.

Requirements

Goutte requires PHP 7.1 or later.

Installation

Add fabpot/goutte as a dependency in the require section of your composer.json file:
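One common way to do this, assuming Composer is installed and available on your PATH, is to let Composer edit composer.json and download the package in one step:

```bash
# Adds fabpot/goutte to the "require" section of composer.json and installs it
composer require fabpot/goutte
```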

Usage

Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser):
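A minimal sketch, assuming Goutte was installed through Composer and the script sits next to the generated vendor/ directory:

```php
<?php

// Load Composer's autoloader (path assumes the script lives in the project root).
require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

// Create a client with the default HTTP settings.
$client = new Client();
```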

Make requests with the request() method:
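A minimal sketch of a GET request; the URL is only an illustrative target, and the autoloader path assumes a standard Composer layout:

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

$client = new Client();

// Fetch a page; the first argument is the HTTP method, the second the URL.
$crawler = $client->request('GET', 'https://www.symfony.com/blog/');
```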

The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler).

To use your own HTTP settings, you may create an HttpClient instance and pass it to the Goutte Client. For example, to add a 60-second request timeout:
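A sketch of that setup, assuming the Symfony HttpClient component that Goutte wraps; the `timeout` option is given in seconds:

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;
use Symfony\Component\HttpClient\HttpClient;

// Build an HTTP client with a 60-second timeout and hand it to Goutte.
$client = new Client(HttpClient::create(['timeout' => 60]));
```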

Click on links:
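A sketch of following a link by its visible text. The URL and the link text "Security Advisories" are assumptions about the target page and may not match the live site, so the example checks that the link exists before clicking it:

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://www.symfony.com/blog/');

// Look up the link by its text (assumed to be present on the page).
$linkNode = $crawler->selectLink('Security Advisories');

if ($linkNode->count() > 0) {
    // click() requests the link target and returns a Crawler for the new page.
    $crawler = $client->click($linkNode->link());
    echo $crawler->getUri(), "\n";
}
```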

Extract data:
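A sketch of pulling text out of a page with a CSS selector. The selector `h2 > a` is an assumption about how the page marks up its post titles; adjust it to the markup you are scraping:

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://www.symfony.com/blog/');

// Print the text of every link that is a direct child of an <h2> element.
$crawler->filter('h2 > a')->each(function ($node) {
    echo $node->text(), "\n";
});
```

If the selector matches nothing, `each()` simply runs zero times, so the script prints nothing rather than failing.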

Submit forms:
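A sketch of filling in and submitting a login form. Everything specific to the page is an assumption for illustration: the "Sign in" link and button labels, the `login` and `password` field names, the `.flash-error` selector, and the placeholder credentials. The live site's markup may differ, so the lookups are guarded:

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://github.com/');

// Follow the (assumed) "Sign in" link to reach the login page.
$signIn = $crawler->selectLink('Sign in');
if ($signIn->count() > 0) {
    $crawler = $client->click($signIn->link());
}

// Locate the form through its (assumed) submit button.
$button = $crawler->selectButton('Sign in');
if ($button->count() > 0) {
    $form = $button->form();

    // Submit with placeholder values; the field names are assumptions.
    $crawler = $client->submit($form, ['login' => 'your-username', 'password' => 'xxxxxx']);

    // Print any error messages the page reports (the selector is an assumption).
    $crawler->filter('.flash-error')->each(function ($node) {
        echo $node->text(), "\n";
    });
}
```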

More Information

Read the documentation of the BrowserKit, DomCrawler, and HttpClient Symfony Components for more information about what you can do with Goutte.

Pronunciation

Goutte is pronounced "goot", i.e. it rhymes with "boot" and not "out".

Technical Information

Goutte is a thin wrapper around the following Symfony Components: BrowserKit, CssSelector, DomCrawler, and HttpClient.

License

Goutte is licensed under the MIT license.
