roach-php / core

The complete web scraping toolkit for PHP.

Home Page: https://roach-php.dev

Testing how a spider scrapes a given HTML file

seb-jones opened this issue

Hello there,

Just a question: is there a simple way to feature test a spider by giving it some HTML and inspecting what it returns, e.g. by making assertions against the items returned by Roach::collectSpider()?
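
For example (purely a sketch: MySpider here is a made-up spider class, and I'm assuming the scraped items expose their fields via get()), something along these lines:

use RoachPHP\Roach;

it('extracts the page title', function () {
    // Hypothetically: run the spider against a known HTML fixture
    // and collect the items it produces, without hitting the network.
    $items = Roach::collectSpider(MySpider::class);

    expect($items)->toHaveCount(1);
    expect($items[0]->get('title'))->toBe('Hello World');
});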

Many thanks

Seb

I'm afraid there isn't a nice way to do this at the moment, but it's something I will probably add in the future.

Cool cool, thanks for the response :)

For what it's worth, I've managed to implement a fairly simple, albeit inelegant, way to do this kind of test in the meantime. It works by firing up a PHP dev server that serves the HTML fixtures and pointing the spider at it by overriding its startUrls. Thought I'd share the code here in case it's useful to anyone:

use RoachPHP\Roach;
use RoachPHP\Spider\Configuration\Overrides;

$serverProcess = null;

beforeAll(function () {
    global $serverProcess;

    // Serve the HTML fixtures in resources/html via PHP's built-in dev server.
    $serverProcess = proc_open('php -S localhost:8123 -t resources/html', [], $pipes);

    // Give the server a moment to boot before the spider starts requesting pages.
    usleep(250000);
});

it('scrapes an html page', function () {
    // Point the spider at the local server instead of its real start URLs.
    $scrapedItems = Roach::collectSpider(
        MySpider::class,
        new Overrides(startUrls: ['http://localhost:8123']),
    );

    // do some assertions on $scrapedItems
});

afterAll(function () {
    global $serverProcess;

    // Shut the dev server down once all tests in the file have run.
    proc_terminate($serverProcess);
});

The above assumes that there is an index.html file in resources/html.
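
For reference, the spider itself can be as simple as something like this (a minimal sketch: the class name, the title field, and the placeholder start URL are just examples, and the real startUrls don't matter since the test overrides them):

use RoachPHP\Http\Response;
use RoachPHP\Spider\BasicSpider;

class MySpider extends BasicSpider
{
    // Overridden to http://localhost:8123 by the test above.
    public array $startUrls = ['https://example.com'];

    public function parse(Response $response): \Generator
    {
        // Pull the <h1> out of resources/html/index.html and emit it as an item.
        yield $this->item([
            'title' => $response->filter('h1')->text(),
        ]);
    }
}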

I imagine there's probably a nicer way to do it, but this seems to be working right now.

FYI, I've already started working on testing helpers for this. https://twitter.com/warsh33p/status/1543150150205538304

Shouldn't take too much longer.

Nice! I look forward to trying them out.