obspy / obspy

ObsPy: A Python Toolbox for seismology/seismological observatories.

Home Page: https://www.obspy.org


tsindex.Client crashes with a QueuePool limit overflow

Geenimetsuri opened this issue

Avoid duplicates

  • I searched existing issues

Bug Summary

The obspy.clients.filesystem.tsindex client crashes with a SQLAlchemy QueuePool limit overflow when searching for waveforms: ValueError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r).

I have no idea what the underlying issue is. My suspicion is that wildcard queries open too many pooled connections(?) and/or that the connections are not released when the search fails. FWIW, I tried several versions of SQLAlchemy (currently 1.4.52, also tested 2.0.29) and mseedindex (currently 3.0.5, also tested 2.7.1). All were installed with pip install <package>==<version>.
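To make the "not released" suspicion concrete, here is a minimal sketch of how unreturned connections exhaust a SQLAlchemy QueuePool. It does not use ObsPy at all: the in-memory SQLite engine, the tiny pool limits, and the helper names (make_engine, use_and_release, hold_connections) are assumptions chosen for illustration, and it is not a claim about what tsindex.py actually does.

```python
import sqlalchemy as sa
from sqlalchemy.exc import TimeoutError as PoolTimeoutError
from sqlalchemy.pool import QueuePool


def make_engine():
    # Tiny limits and a short timeout so exhaustion shows up immediately.
    # (The defaults in the error message are pool_size=5, max_overflow=10,
    # pool_timeout=30.)
    return sa.create_engine(
        "sqlite://",
        poolclass=QueuePool,
        pool_size=1,
        max_overflow=1,
        pool_timeout=0.2,
    )


def use_and_release(engine, n):
    """Run n queries, returning each connection to the pool afterwards."""
    for _ in range(n):
        with engine.connect() as conn:
            conn.execute(sa.text("SELECT 1"))


def hold_connections(engine, n):
    """Check out n connections and keep them, i.e. never release them."""
    return [engine.connect() for _ in range(n)]


engine = make_engine()

# Many sequential queries are fine as long as every connection is released.
use_and_release(engine, 10)

# Holding pool_size + max_overflow connections fills the pool completely.
held = hold_connections(engine, 2)
try:
    engine.connect()
    exhausted = False
except PoolTimeoutError:
    # Same error class as in the traceback of this report.
    exhausted = True
finally:
    for conn in held:
        conn.close()

checked_out_after_cleanup = engine.pool.checkedout()
```

If this pattern is what happens inside the client, the number of queries alone would not matter; only connections that are checked out and never closed would count against the limit.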

Something that might be related: before I added the data_client.has_data() check, the code also emitted a ton of SAWarning: SELECT statement has a cartesian product between FROM element(s) "flattened_request_cte", "tsindex" and FROM element "raw_request_cte". Apply join condition(s) between each element to resolve. warnings. The warning still shows up every now and then, but adding the has_data check cut the number down to a few percent of what it was.
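For context on that warning, the following sketch reproduces the same kind of SAWarning with two toy tables. It assumes SQLAlchemy 1.4 or newer; the table names (requests_demo, tsindex_demo) and the helper cartesian_warnings are made up for the example and have nothing to do with the real tsindex schema or the CTEs that ObsPy builds.

```python
import warnings

import sqlalchemy as sa
from sqlalchemy.exc import SAWarning

metadata = sa.MetaData()
# Hypothetical stand-ins for a request table and an index table.
requests_demo = sa.Table(
    "requests_demo", metadata, sa.Column("station", sa.String)
)
tsindex_demo = sa.Table(
    "tsindex_demo", metadata, sa.Column("station", sa.String)
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)


def cartesian_warnings(stmt):
    """Execute stmt and return the cartesian-product warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        with engine.connect() as conn:
            conn.execute(stmt).fetchall()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, SAWarning)
        and "cartesian product" in str(w.message)
    ]


# Two FROM elements with nothing linking them: the linter complains.
unjoined = sa.select(requests_demo.c.station, tsindex_demo.c.station)

# The same selection with a join condition: no warning.
joined = sa.select(requests_demo.c.station, tsindex_demo.c.station).where(
    requests_demo.c.station == tsindex_demo.c.station
)

unjoined_warnings = cartesian_warnings(unjoined)
joined_warnings = cartesian_warnings(joined)
```

The warning comes from SQLAlchemy's FROM linter and says that some FROM element is not connected to the others by a join condition. Whether it is related to the pool overflow is not something this sketch can show.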

As background for the code, the way I create the index is:
indexer = Indexer(mseed_archive_location, mseed_archive_SQL_index, leap_seconds_file='DOWNLOAD', parallel=1)
indexer.run()

Finally, the "solution"...

The way I circumvented the issue was adding pool_size and max_overflow to line 1379 of ./obspy/clients/filesystem/tsindex.py:
self.engine = sa.create_engine(db_path, poolclass=QueuePool, pool_size = 100, max_overflow = 1000 )

However, I have no idea whether this only postpones the issue or sorts it out completely.
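One way to tell the two cases apart is to watch the pool counters while the queries run. The sketch below uses QueuePool's own checkedout(), checkedin(), overflow() and size() methods on a stand-in SQLite engine. The helper names (pool_snapshot, connections_left_checked_out) are hypothetical, and reaching the real engine through data_client.request_handler.engine is an assumption based on the traceback, not something I have verified.

```python
import sqlalchemy as sa
from sqlalchemy.pool import QueuePool


def pool_snapshot(engine):
    """Return the counters that SQLAlchemy's QueuePool exposes."""
    pool = engine.pool
    return {
        "size": pool.size(),
        "checked_in": pool.checkedin(),
        "checked_out": pool.checkedout(),
        "overflow": pool.overflow(),
    }


def connections_left_checked_out(engine, action, repeats):
    """Call action() repeatedly; return how many more connections are
    checked out afterwards than before."""
    before = engine.pool.checkedout()
    for _ in range(repeats):
        action()
    return engine.pool.checkedout() - before


# Stand-in engine with the limits from the error message (5 + 10).
# For the real client this would presumably be
# data_client.request_handler.engine (assumption).
engine = sa.create_engine(
    "sqlite://", poolclass=QueuePool, pool_size=5, max_overflow=10
)


def well_behaved_query():
    # The connection goes back to the pool when the block exits.
    with engine.connect() as conn:
        conn.execute(sa.text("SELECT 1"))


held = []  # keeps references alive so the connections stay checked out


def leaky_query():
    # Deliberately never closed, to imitate a leak.
    conn = engine.connect()
    conn.execute(sa.text("SELECT 1"))
    held.append(conn)


clean_growth = connections_left_checked_out(engine, well_behaved_query, 20)
leaky_growth = connections_left_checked_out(engine, leaky_query, 3)
snapshot = pool_snapshot(engine)

for conn in held:
    conn.close()
```

If the checked-out count keeps growing with every has_data() or get_waveforms() call, then raising pool_size and max_overflow most likely only postpones the crash. If it returns to zero between calls, the larger pool may be enough.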

Code to Reproduce

# Basically this code block. The client can be a tsindex or an FDSN client, but only the tsindex client crashes.
from obspy.clients.fdsn.header import FDSNForbiddenException, FDSNNoDataException

st = False  # stays False when there is no data or the request fails
try:
    hasdata = data_client.has_data(network="*", station=station, location="*", channel="*", starttime=trace_start, endtime=trace_start + trace_length)
except AttributeError:
    hasdata = True  # We have an FDSN client instead, which has no has_data()
if hasdata:
    try:
        st = data_client.get_waveforms(network="*", station=station, location="*", channel="*", starttime=trace_start, endtime=trace_start + trace_length)
    except (FDSNForbiddenException, FDSNNoDataException):
        st = False

Error Traceback

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 1569, in _fetch_index_rows
    summary_present = self.has_tsindex_summary()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 1491, in has_tsindex_summary
    table_names = sa.inspect(self.engine).get_table_names()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/inspection.py", line 64, in inspect
    ret = reg(subject)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 182, in _engine_insp
    return Inspector._construct(Inspector._init_engine, bind)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 117, in _construct
    init(self, bind)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 128, in _init_engine
    engine.connect().close()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3325, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3404, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3371, in _wrap_pool_connect
    return fn()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 327, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 894, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 493, in checkout
    rec = pool._do_get()
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 134, in _do_get
    raise exc.TimeoutError(
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/projektit/CORRSCAN/EQ_corrscan/make_template.py", line 292, in <module>
    fully_picked_data           =   dt.get_event_data( fully_picked_events, data_lead = ORIGIN_LEAD, data_tail = ORIGIN_TAIL, allow_imprecise_time = ALLOW_IMPRECISE_TIME, stations = include_stations, use_origin = True, cache_missing = True, raw_data_dir = raw_data_dir, data_clients = [ data_client, fdsn_client ], mute_errors = True )
  File "/home/user/projektit/GIT/ISUH-seismic-tools/WIP/EQ_corrscan/libs/data_tool.py", line 688, in get_event_data
    hasdata =   data_client.has_data( network = "*", station = station, location = "*", channel = "*", starttime = trace_start, endtime = trace_start + trace_length )
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 558, in has_data
    avail_percentage = self.get_availability_percentage(network,
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 494, in get_availability_percentage
    avail = self.get_availability(network, station,
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 415, in get_availability
    tsindex_rows = self._get_tsindex_rows(network, station,
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 679, in _get_tsindex_rows
    return self.request_handler._fetch_index_rows(query_rows)
  File "/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py", line 1668, in _fetch_index_rows
    raise ValueError(str(err))
ValueError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/14/3o7r)

# The code block also occasionally throws this warning:
/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py:1672: SAWarning: SELECT statement has a cartesian product between FROM element(s) "flattened_request_cte", "tsindex" and FROM element "raw_request_cte".  Apply join condition(s) between each element to resolve.

# edit: Oh, yeah, one more thing. Indexing alone also emits this very similar warning once, after finding no new files:
mseedindex version: 3.0.5
/home/user/.pyenv/versions/3.10.6/lib/python3.10/site-packages/obspy/clients/filesystem/tsindex.py:1672: SAWarning: SELECT statement has a cartesian product between FROM element(s) "tsindex", "flattened_request_cte" and FROM element "raw_request_cte".  Apply join condition(s) between each element to resolve.

ObsPy Version?

1.4.0

Operating System?

Mint 21.0

Python Version?

3.10.6

Installation Method?

pip