mongo_iterator

A faster way to iterate over a large (unindexed) MongoDB database.

Operation

Using the creation timestamp embedded in the standard ObjectId _id (source), the data is split into batches of one day each.
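The first 4 bytes of a standard ObjectId are a big-endian Unix timestamp in seconds. An ObjectId built from a timestamp followed by zeros is therefore a lower bound for every _id created at that second. Because MongoDB always indexes _id, a range query on it stays cheap even when no other field is indexed. The sketch below computes per-day bounds using only the standard library. The function names objectid_hex_for and daily_batches are hypothetical and do not come from this repository.

```python
from datetime import datetime, timedelta, timezone


def objectid_hex_for(dt: datetime) -> str:
    """Smallest ObjectId (as 24 hex chars) whose embedded timestamp is dt.

    The first 4 bytes of an ObjectId hold seconds since the Unix epoch;
    padding the remaining 8 bytes with zeros yields a lower bound.
    """
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return format(int(dt.timestamp()), "08x") + "0" * 16


def daily_batches(start: datetime, end: datetime):
    """Yield half-open (lower, upper) ObjectId hex bounds, one pair per day in [start, end)."""
    day = start
    while day < end:
        nxt = min(day + timedelta(days=1), end)
        yield objectid_hex_for(day), objectid_hex_for(nxt)
        day = nxt
```

With pymongo, each pair would typically become a query such as `{"_id": {"$gte": ObjectId(lo), "$lt": ObjectId(hi)}}`, where `bson.ObjectId.from_datetime` is an equivalent way to build the bounds. The intervals are half-open, so consecutive batches share a boundary but never overlap.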

All batches are put in a queue.
The batches are processed in parallel by multiple Python worker processes; set the number of workers with MAX_PROCESSES.
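A queue shared by a fixed pool of worker processes can be sketched with the standard multiprocessing module. This is a minimal illustration, not the repository's code. run_batches and _worker are hypothetical names, and the sketch assumes the POSIX "fork" start method. It also has no error handling, so a batch that raises an exception would leave the parent waiting for a missing result.

```python
import multiprocessing as mp

MAX_PROCESSES = 4   # number of parallel workers; tune to your machine
_SENTINEL = None    # tells a worker that the queue is exhausted


def _worker(tasks, results, handle_batch):
    # Pull batches until a sentinel arrives; report one result per batch.
    while True:
        batch = tasks.get()
        if batch is _SENTINEL:
            break
        results.put(handle_batch(batch))


def run_batches(batches, handle_batch, max_processes=MAX_PROCESSES):
    ctx = mp.get_context("fork")  # assumption: POSIX only
    tasks, results = ctx.Queue(), ctx.Queue()
    workers = [ctx.Process(target=_worker, args=(tasks, results, handle_batch))
               for _ in range(max_processes)]
    for w in workers:
        w.start()
    batches = list(batches)
    for b in batches:
        tasks.put(b)
    for _ in workers:
        tasks.put(_SENTINEL)  # one stop signal per worker
    # Drain results before joining, otherwise a worker blocked on a full pipe never exits.
    out = [results.get() for _ in batches]
    for w in workers:
        w.join()
    return out  # order depends on which worker finished first
```

When the work talks to MongoDB, each worker should open its own pymongo MongoClient inside handle_batch, because a client created before forking is not safe to reuse in the child processes.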

Currently, the only implementation exports a large set of tweets stored in a MongoDB database to separate text files.
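A per-batch export step might write one text file per day with one tweet per line. The sketch below is hypothetical. The function name export_tweets, the "text" field, and the file naming scheme are all assumptions, since the repository's actual output format is not described here.

```python
import os


def export_tweets(docs, out_dir, batch_name, text_field="text"):
    """Write the text of each tweet document to <out_dir>/<batch_name>.txt.

    Documents without the text field are skipped. Returns the number of tweets written.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{batch_name}.txt")
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for doc in docs:
            text = doc.get(text_field)
            if not text:
                continue
            # Collapse newlines and tabs so every tweet stays on a single line.
            fh.write(" ".join(text.split()) + "\n")
            count += 1
    return count
```

Inside a worker, docs would typically be a cursor such as `collection.find(query, {"text": 1})` for that day's _id range. Projecting only the needed field keeps the transfer small.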

