roach-php / core

The complete web scraping toolkit for PHP.

Home Page: https://roach-php.dev


How to pass extra headers for blocked sites?

MaximKushnir opened this issue

Hey, first of all, thanks for this excellent package. I'm a newbie to PHP and Laravel. Some sites block the crawler, so this package returns "The current node list is empty." So I should probably pass header information with the request, like a regular GET request from a browser (Mozilla, Chrome, etc.).

I use this:
public array $downloaderMiddleware = [
    [
        RoachPHP\Downloader\Middleware\UserAgentMiddleware::class,
        ['userAgent' => 'Mozilla/5.0 (compatible; RoachPHP/0.1.0)'],
    ],
];

But it's not working. The output is an Illuminate error referencing the class App\Spiders\RoachPHP\Downloader\Middleware\UserAgentMiddleware.

How can I do that with this package?

Just to be clear, I won't help with circumventing anti-scraping measures as that quickly gets into dubious territory.

That being said, it's not quite clear to me if you get an error when trying to use the UserAgentMiddleware or not. It looks like your code snippet should work.
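For reference, here is a minimal sketch of how the middleware registration might look inside a complete spider class. It assumes the spider lives in the App\Spiders namespace and extends RoachPHP's BasicSpider; the spider name, start URL, and parse body are placeholders, not code from this issue. Note the use import: the error above shows the class being resolved as App\Spiders\RoachPHP\..., which is what PHP does with an unqualified class name when no import (or leading backslash) is present.

<?php

// Minimal sketch, assuming a Laravel app with spiders under App\Spiders.

namespace App\Spiders;

use Generator;
// Without this import, PHP resolves the class name relative to the current
// namespace, i.e. App\Spiders\RoachPHP\Downloader\Middleware\UserAgentMiddleware.
use RoachPHP\Downloader\Middleware\UserAgentMiddleware;
use RoachPHP\Http\Response;
use RoachPHP\Spider\BasicSpider;

class ExampleSpider extends BasicSpider
{
    public array $startUrls = [
        'https://roach-php.dev', // placeholder URL
    ];

    public array $downloaderMiddleware = [
        [
            UserAgentMiddleware::class,
            ['userAgent' => 'Mozilla/5.0 (compatible; RoachPHP/0.1.0)'],
        ],
    ];

    public function parse(Response $response): Generator
    {
        // Placeholder: yield the page title as a scraped item.
        yield $this->item([
            'title' => $response->filter('title')->text(),
        ]);
    }
}

With the import in place, the [ClassName::class, ['option' => 'value']] tuple shown in the original snippet is the way to pass options such as userAgent to a downloader middleware.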

Closing due to inactivity. Feel free to reopen if you have more information.