How can I get the response body in crawled()?
vFire opened this issue
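public function crawled(UriInterface $url, ResponseInterface $response, ?UriInterface $foundOnUrl = null)
{
    // Append this URL to the list of successfully crawled URLs kept in the cache.
    $urls = Cache::get($this->successUrlsCacheName, []);
    $urls[] = [
        'url' => urldecode((string) $url),
        // urldecode(null) is deprecated since PHP 8.1, so handle a missing parent URL explicitly.
        'found_on_url' => $foundOnUrl ? urldecode((string) $foundOnUrl) : '',
        'code' => $response->getStatusCode(),
    ];
    Cache::put($this->successUrlsCacheName, $urls);

    // Read the full response body as a string and log it.
    $body = (string) $response->getBody();
    Log::debug('Crawler: ' . $url . ' has been crawled.', ['body' => serialize($body)]);
}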
I get nothing from it: $response->getStatusCode() works properly, but $response->getBody() always gives me an empty body.
Here is my full script:
Crawler::create([RequestOptions::ALLOW_REDIRECTS => true])
    ->setMaximumResponseSize(1024 * 1024 * 1)
    ->setUserAgent(UserAgentController::getRandomAgent('desktop', 'Windows Browsers'))
    ->setCrawlProfile(new \Spatie\Crawler\CrawlProfiles\CrawlAllUrls($this->url))
    ->setCrawlObserver(new \Vfire\CrawlerTool\Observers\CrawlHelper($this->baseCacheKey))
    ->setTotalCrawlLimit(5000)
    ->setCurrentCrawlLimit(100)
    ->setParseableMimeTypes(['text/html', 'text/plain'])
    ->setCrawlQueue(new \Vfire\CrawlerTool\Queues\CacheCrawlQueue($this->baseCacheKey, 60 * 60 * 24))
    ->startCrawling($this->url);
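PSR-7 requires StreamInterface::__toString() to attempt to seek to the start of the stream before reading, so an empty (string) $response->getBody() suggests (an assumption, not a confirmed diagnosis) that the body stream is not seekable and had already been read once before crawled() ran, for example by the crawler while it extracts links. The sketch below reproduces that behaviour with plain PHP streams and no Composer packages: a Unix socket pair (assumed to be available, so not Windows) stands in for a non-seekable HTTP body, and php://temp stands in for a seekable one. readBody() is a hypothetical helper, not part of spatie/crawler.

```php
<?php
// Hypothetical helper: rewind the stream if it supports seeking, then read
// everything from the current position to EOF.
function readBody($handle): string
{
    if (stream_get_meta_data($handle)['seekable']) {
        rewind($handle);
    }
    return (string) stream_get_contents($handle);
}

// Non-seekable stream: a socket pair (assumes Unix domain sockets are available).
[$writer, $reader] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
fwrite($writer, '<html>hello</html>');
fclose($writer); // closing the writer lets the reader hit EOF

// First consumer (standing in for the crawler) reads the body and keeps a copy.
$cachedBody = stream_get_contents($reader);

// Second consumer (standing in for the observer) cannot rewind, so it gets ''.
$lateRead = readBody($reader);

// Seekable stream: every call rewinds first, so the body can be read repeatedly.
$temp = fopen('php://temp', 'r+');
fwrite($temp, '<html>hello</html>');
$tempFirst = readBody($temp);
$tempSecond = readBody($temp);
```

If this is the cause, rewinding inside crawled() will not help; the body has to come from a copy taken during the first read. Some spatie/crawler versions pass observers a Spatie\Crawler\ResponseWithCachedBody whose getCachedBody() holds that copy (limited by setMaximumResponseSize()); check that the class exists in the installed version before relying on it.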