medialab / sandcrawler

sandcrawler.js - the server-side scraping companion.

Home page: http://medialab.github.io/sandcrawler/

body.match is undefined

kevinrademan opened this issue.

I got the "body.match is undefined" error while using the crawler. One of the links on the site I was crawling redirected to https://www.facebook.com/unsupportedbrowser, and the static engine then threw an error on this line:

https://github.com/medialab/sandcrawler/blob/master/src/engines/static.js#L119

Changing that line to the following fixes the problem:

var m = Buffer.from(body).toString("utf8").match(/<meta.*?charset=([^"']+)/);

Would you recommend that as a fix?
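A minimal sketch of a more defensive variant, assuming the engine should tolerate a body that arrives as a Buffer, as a string, or not at all. `extractCharset` is a hypothetical helper name, not part of sandcrawler; it reuses the regex from the proposed fix and uses `Buffer.from`/`Buffer.isBuffer` rather than the deprecated `new Buffer` constructor.

```javascript
// Hypothetical helper (not sandcrawler API): pull the charset out of a
// <meta> tag whatever form the response body arrives in.
function extractCharset(body) {
  // No body at all (e.g. an empty redirect response): nothing to detect.
  if (body === undefined || body === null) return null;

  // Decode Buffers as UTF-8; leave strings untouched; coerce anything else.
  var text = Buffer.isBuffer(body) ? body.toString("utf8") : String(body);

  // Same pattern as in the proposed fix: captures an unquoted value that
  // follows "charset=", as in the http-equiv form of the tag.
  var m = text.match(/<meta.*?charset=([^"']+)/);
  return m ? m[1] : null;
}

// Usage with a made-up sample tag.
var html = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
console.log(extractCharset(Buffer.from(html))); // utf-8
console.log(extractCharset(html));              // utf-8
console.log(extractCharset(undefined));         // null
```

Note that the pattern only captures an unquoted value directly after `charset=`, so an HTML5-style `<meta charset="utf-8">` tag yields `null` with this regex.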

What does the body variable contain in your case? Is it undefined or null?

Hey, pitching in here as I was facing the same issue. The body variable contains a Buffer object, while a String is probably expected. Looking at the code, I guess the problem is that no encoding is set for the request. When I set an encoding using spider.config({encoding: "utf8"}), it worked fine.
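The Buffer explanation can be checked in isolation with plain Node.js. A Buffer is not a String and has no `match` method, so calling `body.match(...)` on it throws a TypeError; decoding it to a string first (presumably what the configured encoding ends up doing for the response body) makes the regex usable. The sample HTML below is made up for illustration.

```javascript
// Made-up sample body, held as raw bytes the way an HTTP client can
// return it when no encoding is applied (assumption for illustration).
var body = Buffer.from(
  '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
);
var pattern = /<meta.*?charset=([^"']+)/;

// A Buffer has no String.prototype.match, so this call throws a TypeError.
var failed = false;
try {
  body.match(pattern);
} catch (err) {
  failed = err instanceof TypeError;
}

// Decoding the bytes to a string first makes the regex usable again.
var decoded = body.toString("utf8");
var charset = decoded.match(pattern)[1];

console.log(failed);  // true
console.log(charset); // utf-8
```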

So this is probably related to #177 to some degree ;-)