roach-php / core

The complete web scraping toolkit for PHP.

Home Page: https://roach-php.dev

Pass Context into Request Middleware

JakeOcean opened this issue

Is there any way to pass context into request middleware or an ItemProcessor?

Within an ItemProcessor, if you need some context from the spider before you can save the item to the database, there seems to be no way to access any metadata or the Request/Response objects.

Is there any reason you can't put that metadata on the item itself before yielding it?

Closing due to inactivity. Feel free to reopen if you have more information.

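The following approach forwards context that was passed to the spider from outside on to the ItemProcessor: read it from $this->context in the spider and copy it onto the item before yielding it.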

https://roach-php.dev/docs/spiders/#passing-additional-context-to-spiders

public function parse(Response $response): Generator
{
    $userAgent = $this->context['userAgent'];
    yield $this->item([
        'userAgent' => $userAgent, // This will be passed to the ItemProcessor
        'url'       => $response->getUri(),
    ]);
}

https://roach-php.dev/docs/item-pipeline/#making-processors-configurable
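On the receiving side, the processor can read the value back off the item. The following is a minimal sketch, not a definitive implementation: the class name SaveWithUserAgentProcessor and the persistence placeholder are hypothetical, and the interface, trait, and item methods are used as I understand them from the Roach item pipeline docs.

```php
<?php

use RoachPHP\ItemPipeline\ItemInterface;
use RoachPHP\ItemPipeline\Processors\ItemProcessorInterface;
use RoachPHP\Support\Configurable;

// Hypothetical processor; only the interface and trait come from Roach.
final class SaveWithUserAgentProcessor implements ItemProcessorInterface
{
    use Configurable;

    public function processItem(ItemInterface $item): ItemInterface
    {
        // 'userAgent' was copied onto the item in the spider's parse() method.
        $userAgent = $item->get('userAgent');

        if ($userAgent === null) {
            // Without the context value there is nothing sensible to store,
            // so the item is dropped from the pipeline.
            return $item->drop('Missing userAgent context');
        }

        // Placeholder (assumption): persist $item->get('url') together
        // with $userAgent here.

        return $item;
    }
}
```

Assuming the usual Roach conventions, the processor would be registered on the spider via its $itemProcessors property, and the context itself would be supplied when the spider is started, as described in the "passing additional context to spiders" docs linked above.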

I don't know whether this is a good approach, but it works.
It is still unclear whether the built-in middleware can directly access the header information of external requests.