roach-php / core

The complete web scraping toolkit for PHP.

Home Page: https://roach-php.dev

How do I access items once all item pipelines are finished?

adzay opened this issue

I am running unit tests and I want to collect all scraped items into an array. I plan to show the results in a Vue component, so I need to return them from a Laravel controller.

The PHPUnit output shows log entries saying the items were scraped successfully. However, the code below gives me null instead of the results:

$customs = Roach::startSpider(CustomSpider::class);

// roach.INFO: Run starting [] []
// roach.INFO: Item scraped {"name":"xxx"} []
// roach.INFO: Item scraped {"name":"xxx2"} []

foreach ($customs as $cus) {
    dd($cus);
}

// Error: foreach() argument must be of type array|object, null given

Please help. I have read your docs, but they only describe handling data inside the generator (the item pipeline), not how to export the results.

Thanks

Roach::startSpider(...) doesn't return the scraped items, which is why $customs is null. Returning them will be part of the upcoming 1.0 release: a new method, Roach::collectSpider(...), will be added that behaves exactly like Roach::startSpider(...), except that it returns all scraped items after the run.

// $scrapedItems is an array<int, ItemInterface>
$scrapedItems = Roach::collectSpider(MySpider::class);
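For the original goal (sending results to a Vue component), the collected items still have to be turned into plain data. The sketch below is a minimal, hypothetical example: `FakeItem` and `itemsToJson` are illustrative names, not part of Roach. `FakeItem` stands in for the real item objects so the snippet runs without Roach installed, and the sketch assumes the real items expose an `all()` method that returns their fields as an array (check `ItemInterface` in your installed version before relying on this).

```php
<?php

// Hypothetical stand-in for the items Roach::collectSpider() returns.
// Assumption: real items expose all(), returning their fields as an array.
final class FakeItem
{
    public function __construct(private array $data)
    {
    }

    public function all(): array
    {
        return $this->data;
    }
}

// Convert each collected item into a plain array and encode the whole
// list as JSON, e.g. as the response body for a Vue component.
function itemsToJson(array $items): string
{
    $rows = array_map(fn ($item) => $item->all(), $items);

    return json_encode($rows, JSON_THROW_ON_ERROR);
}

// In a real app this would be:
// $scrapedItems = Roach::collectSpider(CustomSpider::class);
$scrapedItems = [
    new FakeItem(['name' => 'xxx']),
    new FakeItem(['name' => 'xxx2']),
];

echo itemsToJson($scrapedItems), PHP_EOL; // prints [{"name":"xxx"},{"name":"xxx2"}]
```

Inside a Laravel controller you could instead return `response()->json($rows)` with the mapped arrays and let Laravel handle the encoding and the JSON content type.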

This is possible in the 1.0 release. Please check out this section of the docs for more information.