Fematich / mongo_iterator

A more performant way to iterate over a big (unindexed) MongoDB database

mongo_iterator

operation

Based on the creation timestamp embedded in the standard _id (an ObjectId), the data is split into batches, one batch per day.

  • all batches are put in a queue
  • the batches are processed in parallel by several Python workers (set the number of workers with MAX_PROCESSES)
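
The steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the helper names (oid_floor, day_batches, handle_batch, worker, run) are hypothetical, and it assumes timezone-aware UTC datetimes. It relies only on the documented ObjectId layout, where the first 4 bytes hold the creation time in seconds since the Unix epoch, so a per-day range on _id can be expressed without any extra index.

```python
import multiprocessing as mp
from datetime import datetime, timedelta, timezone

MAX_PROCESSES = 4  # number of parallel workers (value chosen for illustration)


def oid_floor(moment):
    """Hex form of the smallest ObjectId whose timestamp equals `moment`.

    An ObjectId is 12 bytes: a 4-byte big-endian timestamp followed by
    8 more bytes, which are zeroed here to get the lower bound.
    Assumes `moment` is timezone-aware.
    """
    seconds = int(moment.timestamp())
    return format(seconds, "08x") + "0" * 16


def day_batches(start, end):
    """Yield (lower, upper) ObjectId hex bounds, one pair per day in [start, end)."""
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        yield oid_floor(day), oid_floor(next_day)
        day = next_day


def handle_batch(bounds):
    """Hypothetical placeholder for the real per-batch work.

    With pymongo installed, the query for one batch would look like:
        collection.find({"_id": {"$gte": ObjectId(lower), "$lt": ObjectId(upper)}})
    """
    lower, upper = bounds
    return lower, upper


def worker(tasks, results):
    """Consume batches from the queue until a None sentinel arrives."""
    for bounds in iter(tasks.get, None):
        results.put(handle_batch(bounds))


def run(start, end, processes=MAX_PROCESSES):
    """Queue all day batches and process them with `processes` workers."""
    tasks, results = mp.Queue(), mp.Queue()
    batches = list(day_batches(start, end))
    for bounds in batches:
        tasks.put(bounds)
    for _ in range(processes):
        tasks.put(None)  # one stop sentinel per worker
    workers = [mp.Process(target=worker, args=(tasks, results))
               for _ in range(processes)]
    for proc in workers:
        proc.start()
    done = [results.get() for _ in batches]  # drain results before joining
    for proc in workers:
        proc.join()
    return sorted(done)
```

Calling run(datetime(2020, 1, 1, tzinfo=timezone.utc), datetime(2020, 1, 4, tzinfo=timezone.utc)) from under an `if __name__ == "__main__":` guard would process three day batches. Because each batch is a contiguous _id range, the workers never overlap and the default _id index is enough to serve every query.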

caveat

Currently the only implemented use case is exporting a large set of tweets stored in a MongoDB database to separate text files.

Languages

Language:Python 100.0%