jeroen / mongolite

Fast and Simple MongoDB Client for R

Home Page: https://jeroen.github.io/mongolite/

Iterator causes recursive gc invocation when called from Rscript

koheiw opened this issue

mongolite crashes on my Linux machine when the script is executed with Rscript (the same code works when I run it interactively in the R console).

*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation

Traceback:
 1: mongo_cursor_next_page(cur, size = 1)
 2: it$one()
 3: eval(ei, envir)
 4: eval(ei, envir)
 5: withVisible(eval(ei, envir))
 6: source("export.R", local = TRUE, echo = FALSE)

The error is triggered by the iterator. I am using an iterator to get each document as a list (#236).

      query <- sprintf('{
          "type": "%s",
          "date": {"$gte": {"$date": "%sT00:00:00Z"}, 
                   "$lte": {"$date": "%sT23:59:59Z"}}
      }', type, from, to)
      res <- con$find(query, fields = '{"guid": 1}')
      
      lis <- rep(list(NULL), nrow(res))
      names(lis) <- res[["_id"]]
      for (oid in names(lis)) { 
          it <- con$iterate(sprintf('{"_id": {"$oid": "%s"}}', oid),
                            fields = '{"date": 1, "cik": 1, "text": 1, "section": 1}')
          doc <- it$one() # ERROR 
          if (!is.null(doc))
              lis[[oid]] <- unlist_mongo(doc)
      }

Rscript -e "packageVersion('mongolite')"
[1] ‘2.7.3’

I noticed that this happens when I call mongo() within future.apply::future_lapply(). This might be an issue in future.apply's parallelization infrastructure (e.g. dead child processes) rather than in mongolite.

P.S. I would like to know the best practice for establishing multiple connections to MongoDB from R.

You don't need to do anything special to establish multiple connections; the driver is specifically designed to handle this. You can keep multiple database connections open by calling mongo() several times and letting the driver handle the pooling.

However, I don't think it is a good idea to run multiple database queries at the same time. If you really want to do this, you probably need to make sure that you create and close the connection inside the worker. Copying connections from the parent to the worker is probably a bad idea (perhaps that was the original cause of your troubles).
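
For illustration, here is a minimal sketch of the "create and close the connection in the worker" advice, assuming future.apply is used as in the report. The helper names (oid_query, split_chunks, fetch_chunk, fetch_all), the default URL, and the idea of handing each worker a chunk of ids are assumptions made for this example. They are not part of mongolite or of the original script.

```r
# Hypothetical helper: build the single-document query used in the report.
oid_query <- function(oid) {
  sprintf('{"_id": {"$oid": "%s"}}', oid)
}

# Hypothetical helper: distribute ids round-robin over at most n_chunks groups.
split_chunks <- function(x, n_chunks) {
  unname(split(x, rep_len(seq_len(n_chunks), length(x))))
}

# Runs inside a worker: the connection is created here and closed on exit,
# so no connection object is ever copied from the parent process.
fetch_chunk <- function(oids, collection, db, url) {
  con <- mongolite::mongo(collection = collection, db = db, url = url)
  on.exit(con$disconnect(), add = TRUE)
  docs <- lapply(oids, function(oid) {
    it <- con$iterate(oid_query(oid),
                      fields = '{"date": 1, "cik": 1, "text": 1, "section": 1}')
    it$one()  # may be NULL, which the is.null() check in the report anticipates
  })
  names(docs) <- oids
  docs
}

# Parent side: only character ids and connection settings are sent to workers.
# The default url is an assumption for a local server.
fetch_all <- function(oids, collection, db,
                      url = "mongodb://localhost", n_chunks = 2) {
  chunks <- split_chunks(oids, n_chunks)
  res <- future.apply::future_lapply(chunks, fetch_chunk,
                                     collection = collection, db = db, url = url)
  do.call(c, res)  # flatten the per-chunk lists, keeping the id names
}
```

After setting a plan such as future::plan(future::multisession, workers = 2), a call like fetch_all(res[["_id"]], collection = ..., db = ...) would return a list named by id. Opening one connection per chunk instead of one per document keeps the number of connections small. Whether this avoids the crash in the report has not been verified here.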