djc / couchdb-python

Python library for working with CouchDB

Bulk upload doesn't create doc if needed

verganis opened this issue · comments

I am using the Database.update() method to write a list of dictionaries to a CouchDB database.
These docs are copied from a source db and written to a destination db.
Let's say my list contains docs with the following keys:
IDKEY-1
IDKEY-2

If the destination db already contains docs with both keys, the update works fine.
If, for example, there is a doc with id "IDKEY-1" but no doc with id "IDKEY-2", the first doc is updated correctly while the second is not written to the destination db. The return value of update() gives no clue: it contains a tuple (True, IDKEY-2, rev_number) even for the second doc.

When I check on the destination db there is no document with key "IDKEY-2".

Can you paste some example code to clarify what you're trying to do here?

Yes, of course. I'm trying to copy documents from a source db to a destination db.
The documents are read from the source db, processed (I simplified that into a function called "do_something") and then written to the destination db. Since I have run this process many times, "Document A" is often present in both dbs; in that case the bulk update works fine and Document A is updated in the destination db. If "Document B" is in the source db but not yet in the destination db, the script does not copy Document B to the destination db at all, and the return value of the Database.update() method still reports True.

    def chunks(self, my_list, n):
        """Yield successive n-sized chunks from my_list."""
        for i in range(0, len(my_list), n):
            yield my_list[i:i + n]

    def my_test(self):
        my_keys = ['IDKEY-1', 'IDKEY-2']
        doc_list = []
        for key in my_keys:
            source_doc = source_db.get(key)
            destination_document = do_something(source_doc)
            # Reuse the current revision so an existing doc is updated
            # instead of being rejected as a conflict.
            old_destination_doc = destination_db.get(key, None)
            if old_destination_doc:
                revision = old_destination_doc.get('_rev', None)
                if revision:
                    destination_document['_rev'] = revision

            doc_list.append(destination_document)
        # Write in batches of 80 documents per bulk request.
        for bulk in self.chunks(doc_list, 80):
            return_values = destination_db.update(bulk)
            for success, docid, rev_or_exc in return_values:
                self.logger.debug("Write return values:{},{},{}".format(success, docid, rev_or_exc))
                if not success:
                    self.logger.critical("Document write failure! id:{} Reason:'{}'".format(docid, rev_or_exc))
                    raise SystemExit(1)

If more info is needed, please ask.
Thanks for the support.

More generally, I found that when I bulk update docs the returned "success" value is always True, but sometimes the doc is not actually updated.
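One way to narrow this down is to read each document back after the bulk write and compare it with what update() reported. The following is only a sketch: `unverified_writes` is a hypothetical helper name, and it assumes the destination object offers a dict-style `.get(docid)` that returns the stored document or None. A plain dict stands in for the database here so the example runs without a server.

```python
def unverified_writes(results, db):
    """Return ids that were reported as written but cannot be read back.

    `results` is a list of (success, docid, rev_or_exc) tuples, the shape
    discussed in this thread. `db` only needs a dict-style .get(docid)
    that returns the stored document or None (an assumption of this sketch).
    """
    suspicious = []
    for success, docid, rev_or_exc in results:
        if not success:
            continue  # reported failures are already visible to the caller
        stored = db.get(docid)
        # Flag docs that are missing or whose revision differs from the reported one.
        if stored is None or stored.get('_rev') != rev_or_exc:
            suspicious.append(docid)
    return suspicious


# Stand-in for the destination db: only IDKEY-1 was really stored.
fake_db = {'IDKEY-1': {'_id': 'IDKEY-1', '_rev': '2-abc'}}
results = [(True, 'IDKEY-1', '2-abc'), (True, 'IDKEY-2', '1-def')]
print(unverified_writes(results, fake_db))  # ['IDKEY-2']
```

Any id this returns was reported as successful but is either absent or at a different revision, which is the mismatch described in this thread.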

Sorry about the slow response, I've been very busy.

I looked at your code and don't see any obvious problems. However, given your explanation, I suspect the problem is somewhere else. One obvious question: what does do_something() return when there is no source_doc? Your code always gets a destination_document back, but in that case the source_doc passed to do_something() will be None, so do_something() has no access to the key for that document. If it returns a document that does not have _id set, the document will be saved under a different id in the destination database (and saving it will still be reported as successful).
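A guard along these lines would rule that case out. It is a minimal sketch, not part of the library: `prepare_docs`, `fetch` and `transform` are hypothetical names, where `fetch(key)` stands in for something like source_db.get and `transform` for do_something().

```python
def prepare_docs(keys, fetch, transform):
    """Build the list for a bulk update, skipping missing source docs.

    fetch(key) returns a dict or None; transform(doc) returns a dict.
    Both are hypothetical callables supplied by the caller.
    """
    prepared, missing = [], []
    for key in keys:
        source_doc = fetch(key)
        if source_doc is None:
            missing.append(key)  # nothing to copy, report it instead
            continue
        doc = transform(source_doc)
        # Make sure the doc is stored under the expected key.
        doc.setdefault('_id', key)
        prepared.append(doc)
    return prepared, missing


# Stand-in source db containing only IDKEY-1.
source = {'IDKEY-1': {'_id': 'IDKEY-1', 'value': 1}}
drop_id = lambda d: {'value': d['value']}  # a transform that loses _id
prepared, missing = prepare_docs(['IDKEY-1', 'IDKEY-2'], source.get, drop_id)
print(prepared)  # [{'value': 1, '_id': 'IDKEY-1'}]
print(missing)   # ['IDKEY-2']
```

Logging the `missing` list before the bulk write makes it clear whether a document was never copied or was written under another id.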

Hello, thanks for the support. After a chat with a CouchDB support technician on IRC, I found out that there is a known bug in the version of CouchDB I was using. The problem is not with the Python library.