bda-research / node-crawler

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

Home Page: http://node-crawler.org


How to skip URLs with ECONNRESET or ETIMEDOUT errors

harrysayers opened this issue

I'm crawling hundreds of thousands of XML/RSS feeds whose URLs are streamed from a local file with fs.createReadStream. At around 16,000 links, the same links stop Node with ECONNRESET or ETIMEDOUT errors. What I want is to skip a URL and continue with the next links when it times out, can't establish a secure connection, and so on. How would I do this?

Basic settings for the crawler. It's called from another file and passed the link via show.FeedLink:

const Crawler = require('crawler');

async function crawlLink(show){
    const crawler = new Crawler({
      maxConnections: 8,
      timeout: 15000,
      retries: 1,
      retryTimeout: 10000,
      callback: async function(error, res, done){
        if(error){
            // ECONNRESET / ETIMEDOUT end up here
            console.log(error);
        }else{
            // ...logic saving the link to the DB...
        }
        done();
      }
    });

    crawler.queue({uri: show.FeedLink});
}
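For context, the links are fed to crawlLink from the local file stream roughly like this. This is only a sketch: the feeds.txt name and the one-URL-per-line layout are assumptions, not the actual code.

const fs = require('fs');
const readline = require('readline');

// stream the feed list line by line instead of loading it all into memory
const rl = readline.createInterface({ input: fs.createReadStream('feeds.txt') });
rl.on('line', function(line){
  const link = line.trim();
  if(link) crawlLink({ FeedLink: link });
});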

Sorry, I'm not familiar with async functions, so maybe someone who knows them can help.

Did you find a way to do this? This error shouldn't stop the crawler, especially when you're crawling thousands of sites.

I suggest not using Promises or async/await here.
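With plain callbacks the skip logic stays simple: if the request fails (ECONNRESET, ETIMEDOUT, a TLS error, ...), log it and call done() so the crawler moves on to the next queued URL. A minimal sketch along those lines, with the DB write left as a placeholder:

const Crawler = require('crawler');

const crawler = new Crawler({
  maxConnections: 8,
  timeout: 15000,
  retries: 1,
  retryTimeout: 10000,
  callback: function(error, res, done){
    if(error){
      // failed feed (timeout, reset, TLS error, ...): log it and skip it
      console.log('skipping feed:', error.code || error.message);
    }else{
      // ...save the feed (res.body) to the DB here...
    }
    done(); // always call done(), otherwise the connection slot is never released
  }
});

crawler.on('drain', function(){
  console.log('all queued feeds processed');
});

crawler.queue('http://example.com/feed.xml');

Because done() is called in both branches, a link that errors out is simply dropped and the queue keeps draining instead of stalling the whole crawl.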