iriscouch / follow

Very stable, very reliable, NodeJS CouchDB _changes follower

Error: Cannot find wait timer

isaacs opened this issue

Occasionally this happens:

Error: Cannot find wait timer
    at Feed.got_activity (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/feed.js:355:21)
    at Feed.on_couch_data (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/feed.js:412:8)
    at Changes.handle_confirmed_req_event (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/feed.js:308:30)
    at Changes.EventEmitter.emit (events.js:95:17)
    at Changes.emit_changes (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/stream.js:223:12)
    at Changes.write_continuous (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/stream.js:176:8)
    at Changes.write (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/stream.js:124:17)
    at Request.ondata (stream.js:51:26)
    at Request.EventEmitter.emit (events.js:95:17)
    at IncomingMessage.<anonymous> (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/node_modules/request/request.js:840:12)

??

We are seeing this too while trying to replicate the npm registry.

Why is this a fatal error?
https://github.com/iriscouch/follow/blob/master/lib/feed.js#L354-L355

  if(! self.pending.wait_timer)
    return self.die(new Error('Cannot find wait timer'))

  clearTimeout(self.pending.wait_timer)
  self.pending.wait_timer = null

Since you are clearing the timeout directly after checking if it's there, shouldn't this be:

  // Clear the timer only if one is pending, instead of dying when it is missing
  if(self.pending.wait_timer)
    clearTimeout(self.pending.wait_timer)

  self.pending.wait_timer = null

It doesn't make a lot of sense to me to die if the timer isn't there when you're about to clear it anyway.
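A minimal sketch of davglass's point, assuming Node.js: `clearTimeout()` silently ignores `null` and `undefined`, so clearing a timer that is already gone is harmless and the existence check (let alone `die()`) is not strictly needed. The helper name `clearWaitTimer` is hypothetical and not part of follow.

```javascript
// Hypothetical helper (not part of follow): clear and forget a pending wait timer.
function clearWaitTimer(pending) {
  // Node's clearTimeout() is a no-op for null/undefined, so no existence check is needed.
  clearTimeout(pending.wait_timer);
  pending.wait_timer = null;
}

const pending = {
  wait_timer: setTimeout(() => console.log('wait timer fired'), 1000),
};

clearWaitTimer(pending); // cancels the real timer before it fires
clearWaitTimer(pending); // timer already gone: clearTimeout(null) does nothing
console.log(pending.wait_timer); // null
```

Calling the helper twice is safe, which is the property that makes the fatal check unnecessary.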

@davglass I'm guessing the assumption is that something is wrong if that timer has already been cleared or does not exist.

Regardless, this module will be refactored as a wrapper around my changes-stream module once I get better test coverage. You can check out the current WIP in the refactor branch. This will solve some of the inconsistencies.

We still see this pretty often in our production followers. It doesn't happen often enough to throw the worker into a tailspin, and we use seq-file to restart right where we left off. But still, kinda annoying.
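A minimal sketch of the resume-from-checkpoint idea, assuming Node.js and a plain file. This does not use seq-file's actual API; `readSeq`, `saveSeq`, and the file name are hypothetical. The saved value would be passed as CouchDB's `since` parameter when the follower restarts.

```javascript
// Hypothetical sketch of the idea behind seq-file (NOT its actual API).
const fs = require('fs');
const os = require('os');
const path = require('path');

// Read the last saved sequence; '0' means "start from the beginning".
function readSeq(file) {
  try {
    return fs.readFileSync(file, 'utf8').trim() || '0';
  } catch (err) {
    if (err.code === 'ENOENT') return '0'; // no checkpoint yet
    throw err;
  }
}

// Persist a sequence via write-then-rename, so a crash mid-write
// leaves the previous checkpoint intact (rename is atomic on POSIX filesystems).
function saveSeq(file, seq) {
  const tmp = file + '.tmp';
  fs.writeFileSync(tmp, String(seq));
  fs.renameSync(tmp, file);
}

const file = path.join(os.tmpdir(), `follower-${process.pid}.seq`);
const firstRun = readSeq(file);  // '0': nothing saved yet
saveSeq(file, 1234);             // after handling the change with seq 1234
const restarted = readSeq(file); // '1234': use as `since` on restart
fs.unlinkSync(file);
console.log(firstRun, restarted);
```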

OK, so the root cause of this problem is that a new request gets created while the feed is paused: the wait_timer expires and triggers on_timeout() -> retry(). As a result, resuming the feed produces this particular failure in got_activity().
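The sequence described above can be reproduced with a toy model. This is a simplified, hypothetical sketch, not follow's actual feed.js: `ToyFeed`, `simulate`, and the rule that a request created while paused gets no wait timer are assumptions made to mirror the thread. It runs the same pause, timeout, retry, resume, data sequence once with the original strict check and once with the check removed.

```javascript
// Toy model of the pause/timeout race (hypothetical; not follow's feed.js).
class ToyFeed {
  constructor(strict) {
    this.strict = strict; // true = original "Cannot find wait timer" check
    this.paused = false;
    this.dead = null;     // holds the fatal error, if any
    this.requests = 0;    // number of simulated requests made
    this.pending = {};
  }

  die(err) { this.dead = err; }
  pause() { this.paused = true; }
  resume() { this.paused = false; }

  // Start a new simulated request. Model assumption: the wait timer
  // is only armed when the feed is not paused.
  query() {
    this.requests += 1;
    this.pending = { request: this.requests, wait_timer: null };
    if (!this.paused) {
      // unref() so this demo timer never keeps Node alive; we fire it by hand.
      this.pending.wait_timer = setTimeout(() => this.on_timeout(), 60000).unref();
    }
  }

  on_timeout() {
    clearTimeout(this.pending.wait_timer);
    this.pending.wait_timer = null;
    this.retry();
  }

  retry() { this.query(); }

  on_couch_data() { this.got_activity(); }

  got_activity() {
    if (this.dead) return;
    if (this.strict && !this.pending.wait_timer) {
      return this.die(new Error('Cannot find wait timer'));
    }
    clearTimeout(this.pending.wait_timer); // no-op when the timer is null
    this.pending.wait_timer = null;
  }
}

function simulate(strict) {
  const feed = new ToyFeed(strict);
  feed.query();         // first request arms the wait timer
  feed.pause();         // consumer pauses the feed
  feed.on_timeout();    // timer expires while paused -> retry() -> new request, no timer
  feed.resume();        // feed resumes
  feed.on_couch_data(); // data arrives on the new request
  return feed;
}

const strictFeed = simulate(true);
const lenientFeed = simulate(false);
console.log(strictFeed.dead && strictFeed.dead.message); // Cannot find wait timer
console.log(lenientFeed.dead); // null
```

With the check removed, a missing timer is simply treated as already cleared, and the feed keeps running.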

@davglass, removing that line does seem reasonable as a stopgap, so I will do some testing and publish a new version.

This is fixed in v0.11.1. Removing that check actually worked.

👍