Failed to establish a new connection after a while with BigQuery
yu-iskw opened this issue
If I apply dbt-osmosis yaml refactor
to a lot of dbt models, I start getting the error below repeatedly after a while.
I assume the issue is caused by an authentication timeout, though I haven't looked into the implementation.
How can I solve the issue?
- dbt-core: 1.6.2
- dbt-osmosis: 0.12.4
- dbt BigQuery setup
  - Using an impersonated service account
  - No `job_execution_timeout_seconds` set
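For reference, a `profiles.yml` like the following reproduces that setup. This is a hypothetical sketch: the profile name, project, dataset, and service account are placeholders, not values from this issue.

```yaml
# profiles.yml (sketch; all names are placeholders)
my_profile:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: analytics
      threads: 8
      impersonate_service_account: dbt-runner@my-gcp-project.iam.gserviceaccount.com
      # job_execution_timeout_seconds is intentionally not set,
      # matching the reporter's environment
```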
```
ERROR Error occurred while processing model  osmosis.py:931
model.xxx.xxxxxxxxxxxxxxxxxxxx: Deadline of 600.0s exceeded while calling target function, last exception: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Max retries exceeded with url: /bigquery/v2/projects/xxxxxx/datasets/if_xxxx/tables/xxxxxxxxxxxxxxxxxxxxxx?prettyPrint=false (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x14198e950>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
```
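The "Deadline of 600.0s exceeded ... last exception" shape suggests a retry wrapper that keeps retrying a target function until a total deadline elapses and then re-raises with the last underlying exception. A minimal pure-Python sketch of that behavior (an illustration, not the actual google-api-core implementation):

```python
import time


def retry_with_deadline(target, deadline=600.0, delay=1.0):
    """Retry `target` until it succeeds or `deadline` seconds elapse.

    On timeout, raise an error carrying the last underlying exception,
    mirroring the shape of the message in the log above.
    """
    start = time.monotonic()
    last_exc = None
    while True:
        try:
            return target()
        except Exception as exc:
            last_exc = exc
            if time.monotonic() - start >= deadline:
                raise TimeoutError(
                    f"Deadline of {deadline}s exceeded while calling target "
                    f"function, last exception: {last_exc}"
                ) from last_exc
            time.sleep(delay)
```

Under this pattern, the deadline error is only a symptom: the root cause is the repeated `NewConnectionError` underneath it.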
I encountered a similar error on a very large dbt project (1000+ models). When I want to run dbt-osmosis yaml refactor
in such cases, I do the following.
- 1: Limit the number of target models by passing positional args.
- 2: Create a catalog file in advance and specify the `--catalog-file` option when running `dbt-osmosis yaml refactor`.
  - This makes trial and error easier if you have a large number of model files.
  - With a large number of model files, generating catalog files takes time, so I created a script that generates catalog files only for the targeted models.
  - For more details, see https://www.yasuhisay.info/entry/2023/08/13/215127
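The two steps above might look like this on the command line. This is a sketch: the model selector is a placeholder, and the catalog path assumes dbt's default `target/` directory.

```shell
# Build the catalog once (hits BigQuery metadata a single time)
dbt docs generate

# Reuse it so the refactor does not query BigQuery for metadata,
# and narrow the run to specific models via positional args
dbt-osmosis yaml refactor --catalog-file target/catalog.json staging.my_model
```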
Thank you for the comment. I already tried that approach, so it would be better to resolve this on the dbt-osmosis side rather than relying on the workaround.
I'm looking into googleapis/google-auth-library-python#1356, as it looks similar to this issue. I suspect there is a conflict between the multithreading in dbt-osmosis and the Google Cloud packages in Python.
The upstream issue is closed. I also added a small change here (a1c2109); while I am not 100% sure it will solve this, I think it could be a conflating factor. We have an adapter connection invalidation/refresh process because DbtCoreInterface (our thin interface layer that keeps us one layer abstracted from dbt-core) was designed to be used in a long-running service like a proxy server or a custom LSP. But dbt-osmosis is a typical short-lived process that saturates the connection pool and then spins down.
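The invalidation/refresh idea mentioned above can be sketched generically as "retry once on a fresh connection when a call fails." All names here are placeholders (the real change wraps the dbt adapter's connection handling, which is not shown in this thread):

```python
class RefreshingConnection:
    """Wrap a connection factory; rebuild the connection if a call fails.

    `connect` is a placeholder factory. On a ConnectionError, the stale
    connection is discarded and the call is retried once on a new one.
    """

    def __init__(self, connect):
        self._connect = connect
        self._conn = connect()

    def call(self, fn):
        try:
            return fn(self._conn)
        except ConnectionError:
            # Invalidate the stale connection and retry once on a fresh one.
            self._conn = self._connect()
            return fn(self._conn)
```

This keeps long-lived callers working across pool resets, though for a short-lived process the simpler fix may be to open fewer connections in the first place.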