Bug when used with multiprocessing
kevinjqiu opened this issue
There appears to be a race condition. The bug exists in both Python 2.x and Python 3.x, although it manifests differently in each.
Minimal code to reproduce the bug:
import couchdb
import multiprocessing
import multiprocessing.pool

server = couchdb.Server('http://COUCHDB_HOST:5984/')
try:
    database = server.create('test')
except:
    server.delete('test')
    database = server.create('test')

database.save({'_id': '1', 'type': 'dog', 'name': 'chase'})
database.save({'_id': '2', 'type': 'dog', 'name': 'rubble'})
database.save({'_id': '3', 'type': 'cat', 'name': 'kali'})

def query_id(id):
    return dict(database[id])

def main():
    pool = multiprocessing.pool.Pool(3)
    docs = pool.map(query_id, ['1', '2', '3'])
    print(docs)

if __name__ == '__main__':
    main()
Observation 1:
When run on Python 2.x, the following error is encountered:
$ python bug.py
Traceback (most recent call last):
  File "bug.py", line 54, in <module>
    main()
  File "bug.py", line 46, in main
    docs = pool.map(query_id, ['1', '2', '3'])
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
TypeError: 'ResponseBody' object is not iterable
Observation 2:
When run on Python 3.x, the execution hangs. Pressing Ctrl+C to terminate the program prints the following stack trace (output from several processes is interleaved):
[ ... ]
    headers=headers, **params)
  File "/usr/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/home/kevin/src/couchdb-python/couchdb/http.py", line 593, in _request
    credentials=self.credentials)
  File "/usr/lib/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/kevin/src/couchdb-python/couchdb/http.py", line 402, in request
    data = resp.read()
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.6/http/client.py", line 462, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python3.6/http/client.py", line 612, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt
KeyboardInterrupt
Observation 3:
If I change the pool size to 1 (essentially serializing the GET operations), the bug does not occur. The same is true when I debug the script with Visual Studio Code, whose debugger practically blocks the execution of the other processes: the code runs without issue.
Observation 4:
If I run a proxy server in front of CouchDB (e.g., HAProxy), the code runs without issue.
Have you thought about the possibility that there is a CouchDB bug, rather than a bug in CouchDB-Python? In particular, I think observation 4 (thanks for the detailed report!) suggests that the bug might not be in CouchDB-Python.
My other thought is that this might have to do with the connection pooling we're doing in couchdb.http.
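One way the pooling hypothesis could be checked, as a minimal sketch rather than a confirmed diagnosis: the reproduction script creates `database` at module level, so where `multiprocessing` forks, every worker starts with a copy of that object and may inherit whatever connection state it holds. Giving each worker its own handle through the pool's `initializer` removes that sharing. The names `init_worker`, `fake_connect`, and `fetch_all` are hypothetical, and `fake_connect` is a stand-in so the sketch runs without a server.

```python
import multiprocessing

# Per-process handle, filled in by init_worker() inside each worker.
_handle = None


def init_worker(connect):
    """Runs once in every worker process and opens a fresh handle there."""
    global _handle
    _handle = connect()


def query_id(doc_id):
    # Uses the worker's own handle, not one inherited from the parent.
    return dict(_handle[doc_id])


def fake_connect():
    """Stand-in connection factory (hypothetical, for illustration only).

    With CouchDB-Python the body would be something like:
        return couchdb.Server('http://COUCHDB_HOST:5984/')['test']
    """
    return {
        '1': {'type': 'dog', 'name': 'chase'},
        '2': {'type': 'dog', 'name': 'rubble'},
        '3': {'type': 'cat', 'name': 'kali'},
    }


def fetch_all(connect, doc_ids, processes=3):
    """Fetch documents in parallel, one handle per worker process."""
    pool = multiprocessing.Pool(processes, initializer=init_worker,
                                initargs=(connect,))
    try:
        return pool.map(query_id, doc_ids)
    finally:
        pool.close()
        pool.join()
```

Calling `fetch_all(fake_connect, ['1', '2', '3'])` from under an `if __name__ == '__main__':` guard exercises the pattern. If the failure disappears once the factory returns a real `couchdb` database, that would point toward state shared across processes, though it would not prove it.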
One question I have is, when you run this test case 100 times (or 10), does it fail every time? I would expect the failure to be intermittent.
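A small harness along these lines could answer that question. It is a sketch that assumes Python 3.5+ for `subprocess.run`; the helper name `run_repeatedly` and the 30-second default timeout are arbitrary choices, not part of the report. A run that exceeds the timeout is counted as a hang, which matches the Python 3 symptom.

```python
import subprocess
import sys


def run_repeatedly(cmd, runs=10, timeout=30):
    """Run cmd `runs` times and tally clean exits, failures, and hangs."""
    tally = {'ok': 0, 'failed': 0, 'hung': 0}
    for _ in range(runs):
        try:
            result = subprocess.run(cmd,
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL,
                                    timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising this.
            tally['hung'] += 1
        else:
            # A non-zero exit status covers the uncaught TypeError case.
            tally['ok' if result.returncode == 0 else 'failed'] += 1
    return tally
```

For example, `run_repeatedly([sys.executable, 'bug.py'], runs=100)` would run the reproduction script 100 times. Only the top-level process is killed on timeout, so pool workers from a hung run may linger and need to be cleaned up separately.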
Hi @djc
> Have you thought about the possibility that there is a CouchDB bug
On CouchDB's end, the requests were carried out successfully. I can see in the CouchDB logs that there were three concurrent GET requests, all answered with 200 OK. Also, I can use the requests library to call the endpoints concurrently without issue. Those observations lead me to think it's some sort of race condition inside CouchDB-Python.
> One question I have is, when you run this test case 100 times (or 10), does it fail every time?
Yes, it fails every single time.
I might be able to reduce the sample code even further, so that it uses only couchdb.http methods to reproduce the issue. Out of curiosity, why didn't CouchDB-Python use the stock httplib? Sorry, I'm not too familiar with the genesis of this project.
EDIT: I see you built ConnectionPool on top of httplib.