tsindex.Client crashes with a QueuePool limit overflow
Geenimetsuri opened this issue
Avoid duplicates
- I searched existing issues
Bug Summary
For some reason, the obspy.clients.filesystem.tsindex client crashes with a SQLAlchemy QueuePool limit overflow when searching for waveforms: ValueError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r).
I have no idea what the underlying issue is. My suspicion is that either wildcard queries spawn too many connections(?) and/or that the connections are not released when a search fails...? FWIW, I tried several versions of SQLAlchemy (currently 1.4.52, also tested 2.0.29) and mseedindex (currently 3.0.5, also tested 2.7.1). All were installed with pip install <package>==<version>.
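To make the suspected failure mode concrete, here is a toy model of a bounded pool with pool_size=5 and max_overflow=10. It is plain Python, not SQLAlchemy's actual QueuePool. With those limits, at most 15 connections can be checked out at once. The sketch assumes, hypothetically, that each failed search checks out a connection and never returns it. Under that assumption the 16th checkout times out, matching the error above. A search that returns its connection in a finally block never exhausts the pool. ToyPool, leaky_search, safe_search and calls_until_timeout are all hypothetical names.

```python
import threading


class ToyPool:
    """Toy stand-in for a bounded connection pool (NOT SQLAlchemy's QueuePool)."""

    def __init__(self, pool_size=5, max_overflow=10, timeout=0.01):
        # At most pool_size + max_overflow connections may be checked out at once.
        self._slots = threading.BoundedSemaphore(pool_size + max_overflow)
        self._timeout = timeout

    def checkout(self):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("pool limit reached, connection timed out")
        return object()  # placeholder "connection"

    def checkin(self, conn):
        self._slots.release()


def leaky_search(pool):
    # Hypothetical search that checks out a connection and then fails
    # before returning it: the slot is never released.
    pool.checkout()
    raise ValueError("search failed")


def safe_search(pool):
    # Same failure, but the connection is returned in `finally`.
    conn = pool.checkout()
    try:
        raise ValueError("search failed")
    finally:
        pool.checkin(conn)


def calls_until_timeout(search, pool, max_calls=100):
    """Return the 1-based call number that hit TimeoutError, or None."""
    for i in range(1, max_calls + 1):
        try:
            search(pool)
        except ValueError:
            pass  # the search itself failed; keep going
        except TimeoutError:
            return i
    return None


print(calls_until_timeout(leaky_search, ToyPool()))  # 16
print(calls_until_timeout(safe_search, ToyPool()))   # None
```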
Something that might be related: before I added the data_client.has_data() check, the code below also emitted a ton of these warnings:
SAWarning: SELECT statement has a cartesian product between FROM element(s) "flattened_request_cte", "tsindex" and FROM element "raw_request_cte". Apply join condition(s) between each element to resolve.
The code still emits this warning every now and then. However, adding the has_data check cut the occurrences down to something like a few percent.
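The SAWarning means that a SELECT lists several tables in its FROM clause without a condition linking them. The database therefore pairs every row of one table with every row of the other. Below is a minimal sqlite3 illustration. The tables request and tsindex and their columns are made up for the example; they are not ObsPy's actual schema or query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE request (station TEXT);
    CREATE TABLE tsindex (station TEXT, filename TEXT);
    INSERT INTO request VALUES ('STA1'), ('STA2');
    INSERT INTO tsindex VALUES ('STA1', 'a.mseed'), ('STA2', 'b.mseed'), ('STA3', 'c.mseed');
    """
)

# No join condition: every request row is paired with every tsindex row (2 x 3).
cartesian = conn.execute("SELECT * FROM request, tsindex").fetchall()

# With a join condition, only rows with matching stations are paired.
joined = conn.execute(
    "SELECT * FROM request JOIN tsindex ON request.station = tsindex.station"
).fetchall()

print(len(cartesian), len(joined))  # 6 2
conn.close()
```

On its own, the warning points to an over-broad query rather than a connection leak. Whether it is related to the pool overflow is not established here.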
As background for the code, this is how I create the index:
from obspy.clients.filesystem.tsindex import Indexer
indexer = Indexer(mseed_archive_location, mseed_archive_SQL_index, leap_seconds_file='DOWNLOAD', parallel=1)
indexer.run()
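To check what the indexer wrote, you can inspect the SQLite file with Python's built-in sqlite3 module. This is roughly the check that has_tsindex_summary() performs through SQLAlchemy in the traceback. The table name "tsindex" comes from the warnings in this issue. "tsindex_summary" is an assumption about the summary table's name, and list_tables is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path):
    """Return sorted table names in an existing SQLite file (hypothetical helper)."""
    # sqlite3.connect() silently creates a missing file, so check first.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()  # release the connection even if the query fails
    return [name for (name,) in rows]


# Demo on a throwaway database. With a real index, pass the database path
# that was given to Indexer (mseed_archive_SQL_index in the snippet above).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.sqlite")
    demo = sqlite3.connect(demo_path)
    demo.execute("CREATE TABLE tsindex (network TEXT)")
    demo.commit()
    demo.close()
    demo_tables = list_tables(demo_path)

print(demo_tables)                       # ['tsindex']
print("tsindex_summary" in demo_tables)  # False: the demo has no summary table
```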
Finally, the "solution"...
I circumvented the issue by adding pool_size and max_overflow to the create_engine call on line 1379 of ./obspy/clients/filesystem/tsindex.py:
self.engine = sa.create_engine(db_path, poolclass=QueuePool, pool_size=100, max_overflow=1000)
However, I have no idea whether this only postpones the issue or sorts it out completely.
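If connections really do leak, raising the limits only moves the failure point. Assume, as a toy model, that every call leaks exactly one connection. Then the pool fails after pool_size + max_overflow calls. The sketch below counts that for the default limits in the error message and for the workaround's values. It is a model, not SQLAlchemy, and leaking_calls_before_failure is a hypothetical name. If the overflow instead came from many genuinely concurrent queries, larger limits would be a real fix; this model cannot tell the two cases apart.

```python
import threading


def leaking_calls_before_failure(pool_size, max_overflow):
    """Toy model (not SQLAlchemy): count calls that each leak one connection
    before a checkout finds no free slot."""
    slots = threading.BoundedSemaphore(pool_size + max_overflow)
    calls = 0
    # acquire(blocking=False) returns False instead of waiting once all slots are taken.
    while slots.acquire(blocking=False):
        calls += 1  # the slot is never released: this call "leaked" it
    return calls


print(leaking_calls_before_failure(5, 10))      # 15
print(leaking_calls_before_failure(100, 1000))  # 1100
```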
Code to Reproduce
# Basically this code block. data_client can be a tsindex or an FDSN client, but only the tsindex client crashes.
from obspy.clients.fdsn.header import FDSNForbiddenException, FDSNNoDataException

try:
    hasdata = data_client.has_data(network="*", station=station, location="*", channel="*", starttime=trace_start, endtime=trace_start + trace_length)
except AttributeError:
    hasdata = True  # We have an FDSN client instead (it has no has_data method)
if hasdata:
    try:
        st = data_client.get_waveforms(network="*", station=station, location="*", channel="*", starttime=trace_start, endtime=trace_start + trace_length)
    except (FDSNForbiddenException, FDSNNoDataException):
        st = False
Error Traceback
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 1569, in _fetch_index_rows
    summary_present = self.has_tsindex_summary()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 1491, in has_tsindex_summary
    table_names = sa.inspect(self.engine).get_table_names()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/inspection.py", line 64, in inspect
    ret = reg(subject)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 182, in _engine_insp
    return Inspector._construct(Inspector._init_engine, bind)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 117, in _construct
    init(self, bind)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 128, in _init_engine
    engine.connect().close()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3325, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3404, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3371, in _wrap_pool_connect
    return fn()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 327, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 894, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 493, in checkout
    rec = pool._do_get()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 134, in _do_get
    raise exc.TimeoutError(
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/projektit/CORRSCAN/EQ_corrscan/make_template.py", line 292, in <module>
    fully_picked_data = dt.get_event_data( fully_picked_events, data_lead = ORIGIN_LEAD, data_tail = ORIGIN_TAIL, allow_imprecise_time = ALLOW_IMPRECISE_TIME, stations = include_stations, use_origin = True, cache_missing = True, raw_data_dir = raw_data_dir, data_clients = [ data_client, fdsn_client ], mute_errors = True )
  File "/home/user/projektit/GIT/ISUH-seismic-tools/WIP/EQ_corrscan/libs/data_tool.py", line 688, in get_event_data
    hasdata = data_client.has_data( network = "*", station = station, location = "*", channel = "*", starttime = trace_start, endtime = trace_start + trace_length )
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 558, in has_data
    avail_percentage = self.get_availability_percentage(network,
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 494, in get_availability_percentage
    avail = self.get_availability(network, station,
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 415, in get_availability
    tsindex_rows = self._get_tsindex_rows(network, station,
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 679, in _get_tsindex_rows
    return self.request_handler._fetch_index_rows(query_rows)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 1668, in _fetch_index_rows
    raise ValueError(str(err))
ValueError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)
# The code block also occasionally emits this warning:
/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py:1672: SAWarning: SELECT statement has a cartesian product between FROM element(s) "flattened_request_cte", "tsindex" and FROM element "raw_request_cte". Apply join condition(s) between each element to resolve.
# Edit: Oh, yeah, one more thing. Indexing alone also emits a single, very similar warning after finding no new files:
mseedindex version: 3.0.5
/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py:1672: SAWarning: SELECT statement has a cartesian product between FROM element(s) "tsindex", "flattened_request_cte" and FROM element "raw_request_cte". Apply join condition(s) between each element to resolve.
ObsPy Version?
1.4.0
Operating System?
Mint 21.0
Python Version?
3.10.6
Installation Method?
pip