roach-php / core

The complete web scraping toolkit for PHP.

Home Page: https://roach-php.dev

Is there any way of accessing the first item inside a processor or an extension?

xavi-ortega opened this issue

Description
I'm looking for a way of accessing the first item inside a Processor or inside an Extension.

This would let me initialize a CSV file with the item's properties as the header row, which only needs to happen for the first item.

Proposed solution
If it's inside an Extension, I'd be looking for a FirstItem event that I can subscribe to, so I can access the data before it's processed in the pipeline.

Considered alternatives
If it's inside a Processor, I'd be looking for either an $item->isFirst() method added directly to the ItemInterface, or a firstItem() method on the ItemProcessorInterface.

Additional context
Solution 1: (image)

Solution 2: (image)

Solution 3: (image)

I'm not a big fan of adding a specific event for the first scraped item of a run.

In your particular case, why not add a field to your extension or processor that keeps track of whether it has already been called?

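use RoachPHP\ItemPipeline\ItemInterface;
use RoachPHP\ItemPipeline\Processors\ItemProcessorInterface;
use RoachPHP\Support\Configurable;

final class CSVExporter implements ItemProcessorInterface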
{
    use Configurable;

    private int $itemsProcessed = 0;

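    public function processItem(ItemInterface $item): ItemInterface
    {
        // Only the first item triggers creation of the file and its header row.
        if ($this->itemsProcessed === 0) {
            $this->createCSVFile();
        }

        $this->addRowToCSVFile($item);

        $this->itemsProcessed++;

        // Return the item so it continues through the rest of the pipeline.
        return $item;
    }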

    private function createCSVFile(): void
    {
        // Create file with headers...
    }

    private function addRowToCSVFile(ItemInterface $item): void
    {
        // Add row to file...
    }
}
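
For what it's worth, the two stubbed methods above could be filled in along these lines. This is only a rough sketch that does not depend on Roach: the class name LazyHeaderCsvWriter and its append() method are made up for illustration, and it assumes each row arrives as a flat associative array whose keys should become the CSV header.

```php
<?php

declare(strict_types=1);

// Hypothetical helper, not part of Roach: writes the header row lazily,
// using the keys of the first row it receives.
final class LazyHeaderCsvWriter
{
    private int $rowsWritten = 0;

    public function __construct(private string $path)
    {
    }

    /** @param array<string, scalar|null> $row */
    public function append(array $row): void
    {
        // 'w' truncates the file on the first call, 'a' appends afterwards.
        $handle = fopen($this->path, $this->rowsWritten === 0 ? 'w' : 'a');

        if ($handle === false) {
            throw new RuntimeException("Unable to open {$this->path}");
        }

        try {
            if ($this->rowsWritten === 0) {
                // First row only: the array keys become the header line.
                fputcsv($handle, array_keys($row), ',', '"', '\\');
            }

            fputcsv($handle, array_values($row), ',', '"', '\\');
        } finally {
            fclose($handle);
        }

        $this->rowsWritten++;
    }
}

// Usage with plain arrays standing in for scraped items.
$examplePath = tempnam(sys_get_temp_dir(), 'items');
$exampleWriter = new LazyHeaderCsvWriter($examplePath);
$exampleWriter->append(['title' => 'First', 'price' => 10]);
$exampleWriter->append(['title' => 'Second', 'price' => 20]);
```

Inside processItem() this would presumably be called as $writer->append($item->all()), assuming all() returns the item's data as an associative array. Reopening the file for every row keeps the sketch short; a long-running spider would probably keep the handle open instead.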

Fair enough! Thanks for replying 😄