spatie / crawler

An easy to use, powerful crawler implemented in PHP. Can execute Javascript.

Home Page: https://freek.dev/308-building-a-crawler-in-php

How can I get the response body in crawled()?

vFire opened this issue

    public function crawled(UriInterface $url, ResponseInterface $response, ?UriInterface $foundOnUrl = null)
    {
        // Append this URL to the list of successfully crawled URLs kept in the cache.
        $urls = Cache::get($this->successUrlsCacheName, []);
        $urls[] = [
            'url' => urldecode($url),
            'found_on_url' => urldecode($foundOnUrl),
            'code' => $response->getStatusCode()
        ];
        Cache::put($this->successUrlsCacheName, $urls);

        // This is the part that comes back empty.
        $body = (string) $response->getBody();
        Log::debug('Crawler: ' . $url . ' has been crawled.', ['body' => serialize($body)]);
    }

The body comes back empty: $response->getStatusCode() works properly, but $response->getBody() returns nothing.

Here is my full script:

    Crawler::create([RequestOptions::ALLOW_REDIRECTS => true])
        ->setMaximumResponseSize(1024 * 1024 * 1)
        ->setUserAgent(UserAgentController::getRandomAgent('desktop', 'Windows Browsers'))
        ->setCrawlProfile(new \Spatie\Crawler\CrawlProfiles\CrawlAllUrls($this->url))
        ->setCrawlObserver(new \Vfire\CrawlerTool\Observers\CrawlHelper($this->baseCacheKey))
        ->setTotalCrawlLimit(5000)
        ->setCurrentCrawlLimit(100)
        ->setParseableMimeTypes(['text/html', 'text/plain'])
        ->setCrawlQueue(new \Vfire\CrawlerTool\Queues\CacheCrawlQueue($this->baseCacheKey, 60*60*24))
        ->startCrawling($this->url);
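
One thing that may be worth checking when the body comes back empty is the state of the PSR-7 stream itself. Whether that is the cause here is not established by anything above. The sketch below is a hypothetical diagnostic helper (inspectBody is an illustrative name, not part of spatie/crawler) that uses only methods defined on Psr\Http\Message\StreamInterface. It assumes Composer autoloading with psr/http-message installed, which a project using Guzzle should already have.

```php
<?php

// Assumption: run from the project root with Composer's autoloader available.
// Inside a Laravel app the autoloader is already loaded and this line is not needed.
require_once __DIR__ . '/vendor/autoload.php';

use Psr\Http\Message\StreamInterface;

/**
 * Hypothetical helper: report the state of a PSR-7 body stream and read it
 * from the start when the stream allows seeking.
 */
function inspectBody(StreamInterface $body): array
{
    // Capture the state before touching the pointer.
    $info = [
        'readable' => $body->isReadable(),
        'seekable' => $body->isSeekable(),
        'size'     => $body->getSize(), // null when the size is unknown
        'eof'      => $body->eof(),
    ];

    if ($body->isSeekable()) {
        // Move the pointer back to the start so earlier reads do not hide data.
        $body->rewind();
    }

    // On a non-seekable stream this returns only what has not been read yet.
    $info['contents'] = $body->isReadable() ? $body->getContents() : '';

    return $info;
}
```

Inside crawled() it could be called as Log::debug('Crawler body state', inspectBody($response->getBody())); to see whether the stream is seekable and how large it claims to be. If 'seekable' is false and 'contents' is empty, the data was probably consumed before the observer ran and cannot be read again from this stream.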