roach-php / core

The complete web scraping toolkit for PHP.

Home Page: https://roach-php.dev

Run namespace and Request serialization

aaronflorey opened this issue

Would you accept a PR that adds a namespace to a run?

My use case is that I want to write a RequestSchedulerInterface implementation backed by Redis so I can resume runs, but there's no way to identify the same run across executions without passing something through from the spider. So my solution is to pass the Spider class name into the Run, which I can then pass on to the scheduler.
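A rough sketch of the namespacing idea, assuming nothing about Roach's actual API: `queueKey()` is a hypothetical helper that derives a stable storage key from the spider class name, and a plain array stands in for Redis. The spider class name and URLs are made up for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical helper, not part of Roach: derive a stable storage key
// from the spider class name so every run of that spider shares it.
function queueKey(string $spiderClass): string
{
    return 'roach:' . str_replace('\\', '.', $spiderClass) . ':queue';
}

// A plain array stands in for Redis here.
$store = [];

// First run: queue two pending URLs, then stop before processing them.
$key = queueKey('App\\Spiders\\NewsSpider');
$store[$key][] = 'https://example.com/page/1';
$store[$key][] = 'https://example.com/page/2';

// Second run: same spider class, same key, so the queue is found again.
$resumed = $store[queueKey('App\\Spiders\\NewsSpider')] ?? [];
echo count($resumed), " pending requests\n";
```

Because the key depends only on the class name, two different spiders never share a queue, while repeated runs of the same spider always do.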

Another thing I need is serialization of requests, which isn't currently possible because the callable is stored as a closure. So I propose we store the callable as given and convert it to a closure later; that way I can serialize the request for later use.
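A minimal sketch of that proposal, under the assumption that the parse callback can be referred to by method name. `PendingRequest` and `DemoSpider` are hypothetical stand-ins and do not mirror Roach's real `Request` class.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a request; Roach's real Request class differs.
final class PendingRequest
{
    public function __construct(
        public string $url,
        public string $parseMethod, // method name on the spider, not a Closure
    ) {
    }

    // Build the Closure only when the request is about to be processed.
    public function callbackFor(object $spider): Closure
    {
        return Closure::fromCallable([$spider, $this->parseMethod]);
    }
}

// Hypothetical spider with a single parse method.
final class DemoSpider
{
    public function parse(string $body): string
    {
        return strtoupper($body);
    }
}

$request = new PendingRequest('https://example.com', 'parse');

// Only strings are stored, so the serialization round trip works.
$restored = unserialize(serialize($request));

$callback = $restored->callbackFor(new DemoSpider());
echo $callback('hello'), "\n";
```

The trade-off is that the callback has to be a named method on the spider; an anonymous function has no name to store.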

As I said in my other issue, #102, I assume there's reasoning behind the way things are done currently, so I won't make a PR until I get your go-ahead to make these outlandish changes!

I've thought about adding some kind of namespace or identifier for a run before, since that would also help in solving #36. It would definitely be a breaking change, however, since it would at least require changing the Run class to accept an additional parameter. This is something that I would want to implement myself, but I'm open to ideas on what the API could look like.

About serialization: storing the callable wouldn't actually solve the problem, because Closure satisfies the callable type too, so it would still be possible to pass a Closure to the $parseMethod parameter of the Request class. I think the best way to deal with this would be to implement the __sleep and __wakeup magic methods on the Request class so we can handle serialization properly. We could then use opis/closure or laravel/serializable-closure to deal with the parse callback.
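The point about the callable type can be reproduced in a few lines. This is only an illustration: `CallableRequest` is a hypothetical class, not Roach's `Request`, and it does not use either closure-serialization library.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a request that stores a callable property.
final class CallableRequest
{
    /** @var callable */
    public $parseCallback;

    public function __construct(callable $parseCallback)
    {
        $this->parseCallback = $parseCallback;
    }
}

// A Closure passes the callable type check...
$request = new CallableRequest(fn (string $body): string => $body);

// ...but PHP refuses to serialize an object graph containing one.
try {
    serialize($request);
    $serialized = true;
} catch (Exception $e) {
    $serialized = false;
    echo $e->getMessage(), "\n";
}
```

Typing the parameter as callable therefore gives no guarantee that the request can be serialized, which is why the serialization hooks on the class itself would need to handle the Closure case.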

Awesome, thanks for the reply. I'll look into implementing it!