bda-research / node-crawler

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

Home Page: http://node-crawler.org

drain event called multiple times

villesau opened this issue

It seems that with the current version of Bottleneck, the drain event may be called multiple times. This happens at least when some of the requests fail.

Another case where drain is called more than once is when grouping is used.

I believe you are not queueing the new task before calling done in the callback. Please check this in your code; if that isn't the cause, feel free to paste your code here.
What do you mean by 'grouping'?
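To illustrate why the order matters, here is a minimal sketch of a pending-task counter that emits drain whenever it reaches zero. TinyQueue and countDrains are hypothetical names, and the queue is a simplified model assumed for illustration, not node-crawler's actual implementation. Queueing the follow-up task before calling done keeps the counter above zero, so drain fires once; calling done first lets the counter touch zero early, so drain fires twice.

```javascript
const { EventEmitter } = require('events');

// Hypothetical minimal queue: NOT node-crawler's real implementation.
// It only tracks pending tasks and emits 'drain' when the count hits zero.
class TinyQueue extends EventEmitter {
  constructor() {
    super();
    this.pending = 0;
  }

  queue(task) {
    this.pending += 1;
    // Run the task asynchronously, handing it a done() callback.
    setImmediate(() => task(() => this.done()));
  }

  done() {
    this.pending -= 1;
    if (this.pending === 0) this.emit('drain');
  }
}

// Runs one parent task that spawns one child task and counts drain events.
// queueChildFirst controls whether the child is queued before or after done().
function countDrains(queueChildFirst) {
  return new Promise((resolve) => {
    const q = new TinyQueue();
    let drains = 0;
    q.on('drain', () => { drains += 1; });

    const child = (done) => done();
    q.queue((done) => {
      if (queueChildFirst) {
        q.queue(child); // pending 1 -> 2 before the parent finishes
        done();         // pending 2 -> 1, no drain yet
      } else {
        done();         // pending 1 -> 0: drain fires prematurely
        q.queue(child); // pending 0 -> 1, later 1 -> 0: drain fires again
      }
    });

    // Give all setImmediate callbacks time to finish.
    setTimeout(() => resolve(drains), 20);
  });
}

countDrains(true).then((n) => console.log('queue before done:', n));  // → 1
countDrains(false).then((n) => console.log('done before queue:', n)); // → 2
```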

This probably happens because the limiter's queueSize can go negative. Updating to the latest Bottleneck (https://github.com/SGrondin/bottleneck) would fix this. I have a fork in which Bottleneck is updated and this issue is fixed, but it also removes features I don't need: https://github.com/villesau/node-crawler/tree/update-bottleneck

By grouping I mean the limiter option. Every time any limiter empties, drain is called.
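A minimal sketch of the difference between a per-limiter signal and a global drain, assuming each limiter keeps its own pending count. GroupedQueue, the groupEmpty event and the proxyA/proxyB group names are hypothetical and are not node-crawler options or events. A per-group event fires every time one group empties, which resembles the behaviour reported here; a global drain fires only once all groups are empty.

```javascript
const { EventEmitter } = require('events');

// Hypothetical model of grouped limiters (not node-crawler's real code).
// Each group keeps its own pending count. 'groupEmpty' fires whenever one
// group empties, while 'drain' only fires when every group is empty.
class GroupedQueue extends EventEmitter {
  constructor() {
    super();
    this.pendingByGroup = new Map();
    this.totalPending = 0;
  }

  start(group) {
    this.pendingByGroup.set(group, (this.pendingByGroup.get(group) || 0) + 1);
    this.totalPending += 1;
  }

  finish(group) {
    const left = this.pendingByGroup.get(group) - 1;
    this.pendingByGroup.set(group, left);
    this.totalPending -= 1;
    if (left === 0) this.emit('groupEmpty', group);
    if (this.totalPending === 0) this.emit('drain');
  }
}

const q = new GroupedQueue();
const events = [];
q.on('groupEmpty', (g) => events.push(`groupEmpty:${g}`));
q.on('drain', () => events.push('drain'));

q.start('proxyA');
q.start('proxyB');
q.finish('proxyA'); // proxyA empties, but proxyB is still busy
q.finish('proxyB'); // last task overall: group empties and global drain fires

console.log(events);
// → [ 'groupEmpty:proxyA', 'groupEmpty:proxyB', 'drain' ]
```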

Thank you for your reply,

  1. We customized the bottleneck package into our own version, bottleneckP, which includes priority support.
  2. We have thousands of scripts built on crawler, and this doesn't happen when the code is written correctly.
  3. It would help to post more details so we can find the root cause, e.g. in what situation unfinishedClients goes negative. As you can see, we use it heavily in daily work, so it's useful for me to know about any corner case that might cause us trouble.
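One way to gather the kind of detail requested in point 3 is to guard the counter so it throws at the exact decrement that would make it negative, which produces a stack trace pointing at the offending caller. GuardedCounter is a hypothetical helper written for this sketch; it is not part of node-crawler, bottleneck or bottleneckP.

```javascript
// Hypothetical debugging aid: a pending-task counter that refuses to go
// negative. It is not part of node-crawler or Bottleneck; it only shows one
// way to capture where an unbalanced decrement comes from.
class GuardedCounter {
  constructor(name) {
    this.name = name;
    this.value = 0;
  }

  increment() {
    this.value += 1;
  }

  decrement() {
    if (this.value === 0) {
      // Throwing here surfaces the caller's stack trace instead of letting
      // the counter silently drop below zero.
      throw new Error(`${this.name} would go negative (extra done() call?)`);
    }
    this.value -= 1;
  }
}

const unfinished = new GuardedCounter('unfinishedClients');
unfinished.increment();
unfinished.decrement(); // balanced: value is back to 0

try {
  unfinished.decrement(); // e.g. a second done() for the same task
} catch (err) {
  console.log(err.message);
  // → unfinishedClients would go negative (extra done() call?)
}
```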

@mike442144 the current version of Bottleneck also supports priority.

queueSize can apparently go negative when the system retries failed calls. Also, when multiple limiters are used, drain seems to be called separately for each of them. After updating the library I no longer saw queueSize go negative, so the issue must be in bottleneckP, except for the multiple-limiters case.

Actually, I don't think retrying a failed task causes a negative queueSize: each time an error occurs, done is called and queueSize decreases by 1, then the task is enqueued again as usual, which increases queueSize by 1.
Which version are you using? I suggest testing again with the latest version and posting your results here.
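The retry bookkeeping described above can be sketched as a counter trace, assuming done decrements queueSize by one and re-enqueueing increments it by one. simulateRetries is a hypothetical helper for illustration, not library code. Each failed attempt nets to zero, so under this model retries alone never push the counter below zero.

```javascript
// Hypothetical sketch of the retry bookkeeping (not the library's code):
// on an error, done() decrements the counter and the same task is enqueued
// again, incrementing it. Each retry therefore nets to zero.
function simulateRetries(failuresBeforeSuccess) {
  let queueSize = 0;
  const trace = [];

  const enqueue = () => { queueSize += 1; trace.push(queueSize); };
  const done = () => { queueSize -= 1; trace.push(queueSize); };

  enqueue(); // initial request
  for (let attempt = 0; attempt < failuresBeforeSuccess; attempt += 1) {
    done();    // request failed: -1
    enqueue(); // retry queued: +1
  }
  done(); // final successful attempt: -1

  return { queueSize, trace };
}

console.log(simulateRetries(2));
// → { queueSize: 0, trace: [ 1, 0, 1, 0, 1, 0 ] }
```

In this model the counter passes through 0 between each failure and its retry, so if drain were emitted whenever the counter reaches zero, this ordering would emit it once per retry. Whether node-crawler actually orders the calls this way is an assumption not confirmed in this thread.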

Closing due to inactivity; will reopen if there are any updates.