iriscouch / follow

Very stable, very reliable, NodeJS CouchDB _changes follower

process crash when using pause() resume() for large database

jlc467 opened this issue · comments

On a db of ~1.1 million records, I call follow on it like this:

follow({db:"https://example.iriscouch.com/bigdatabase", include_docs:true}, function(error, change) {
    var feed = this
    feed.pause()
    setTimeout(function() { feed.resume() }, 500)
})

Memory climbs rapidly until a GC error crashes the process at around 1.5 GB of memory usage.

Using the Node debugger, a heap snapshot reveals that JSON strings of changes (each one an entire doc, since I'm passing include_docs: true) account for the high memory usage and the eventual crash of the process.

Is this what is known as backpressure? If I do away with pause()/resume() the issue goes away, but I need pause()/resume() to do some async work with each change in sequence.

Just wondering if anyone can explain my issue and/or suggest potential solutions. Thanks

Just curious (I may use this for something)... did you ever figure this out?

Did you try adding explicit returns in the feed and timeout functions?

I'm leaning towards writing my own promise-based lib that can handle paging with document limits, or one that is still event-based but has a throttling option...

...or adding query_params.limit when using a query_params.feed of 'longpoll'?
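
For what it's worth, here is a sketch of what that suggestion might look like. It assumes follow forwards the query_params object to the _changes request, as the comment above implies; whether a server-side limit actually bounds follow's memory use is untested here:

var follow = require('follow')

follow({
    db: "https://example.iriscouch.com/bigdatabase",
    include_docs: true,
    query_params: { feed: 'longpoll', limit: 1000 }  // cap each _changes response
}, function(error, change) {
    // handle one change at a time; longpoll + limit would bound
    // how much the server sends per request
})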

My guess is that by the time you've paused, the entire change feed is already streaming into the app's memory without yet firing the callback for each change. I could be wrong, but it seems like it grabs all changes since `since`, regardless of how many have occurred.

I'd be curious to know if the above options help you out.

@jordancardwell we ended up using nano.db.changes.

Something like (rough sketch below):

  1. Grab 1000 at a time with limit, passing the last sequence # processed (body.last_seq) as since.
  2. Loop the results, doing the promise/async work for each.
  3. Repeat forever.

We store the last sequence # processed so that when the process dies, we can pick up where we left off.
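
Here's a rough sketch of that loop, assuming a promise-returning nano (v8+); processChange and saveSeq are hypothetical helpers for the per-change async work and for persisting the checkpoint:

const nano = require('nano')('https://example.iriscouch.com')

async function followInPages(dbName, startSeq) {
    let since = startSeq || 0
    while (true) {
        // 1. grab up to 1000 changes, resuming from the last seq processed
        const body = await nano.db.changes(dbName, {
            since: since,
            limit: 1000,
            include_docs: true
        })
        // 2. loop the results, doing the async work for each change in order
        for (const change of body.results) {
            await processChange(change)  // hypothetical per-change work
        }
        // 3. repeat forever, persisting last_seq so a restart can resume
        since = body.last_seq
        await saveSeq(since)  // hypothetical checkpoint persistence
        // when caught up, wait a bit before polling again
        if (body.results.length === 0) {
            await new Promise(function (resolve) { setTimeout(resolve, 5000) })
        }
    }
}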

Works perfectly, so I don't see us attempting follow again, though I appreciate the ideas.

Would be interested in hearing any related updates on your approach!

Any news about this issue? This is quite annoying for a "Very stable" changes follower :-/
I guess fixing it wouldn't be so hard for the original developers; anyone here?
Many thanks :)

FWIW: my DB has 24 million docs, and I experienced the exact same issue as the OP (confirmed with the Node debugger). Eventually, launching Node.js with --max_old_space_size=13000 (13 GB of RAM) did the trick: the program no longer crashes. That is a gigantic amount of RAM for almost nothing, but it works.
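
For reference, the flag goes on the node invocation itself (app.js here is just a placeholder for your entry script):

node --max_old_space_size=13000 app.js

The value is in megabytes, so 13000 raises V8's old-space heap limit to roughly 13 GB.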