z3z1ma / dbt-osmosis

Provides automated YAML management, a dbt server, streamlit workbench, and git-integrated dbt model output diff tools

Home Page: https://z3z1ma.github.io/dbt-osmosis/

Failed to establish a new connection after a while with BigQuery

yu-iskw opened this issue · comments

If I apply dbt-osmosis yaml refactor to a lot of dbt models, I consistently get the following error after a while.
I assume the issue is caused by an authentication timeout, though I haven't looked into the implementation.
How can I solve the issue?

  • dbt-core: 1.6.2
  • dbt-osmosis: 0.12.4
  • dbt BigQuery setup
    • Using service account impersonation
    • No job_execution_timeout_seconds set
ERROR    Error occurred while processing model                                                                                    osmosis.py:931
         model.xxx.xxxxxxxxxxxxxxxxxxxx: Deadline of 600.0s exceeded
         while calling target function, last exception: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Max
         retries exceeded with url:
         /bigquery/v2/projects/xxxxxx/datasets/if_xxxx/tables/xxxxxxxxxxxxxxxxxxxxxx?pret
         tyPrint=false (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x14198e950>: Failed to
         establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
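For reference, both settings mentioned above (impersonation and the unset job timeout) live in the BigQuery profile. A hedged profiles.yml sketch; the profile, project, dataset, and service-account names are placeholders, but `impersonate_service_account` and `job_execution_timeout_seconds` are documented dbt-bigquery profile fields:

```yaml
# Hypothetical profiles.yml fragment; names are placeholders.
my_profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: analytics
      threads: 4
      # Run queries as this service account via impersonation
      impersonate_service_account: dbt-runner@my-gcp-project.iam.gserviceaccount.com
      # Not set in the report above; caps how long a single job may run
      job_execution_timeout_seconds: 300
```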

I encountered a similar error on a very large dbt project (1000+ models). When I want to run dbt-osmosis yaml refactor in such cases, I do the following:

  • 1: Limit the number of target models by passing positional selector arguments.
  • 2: Create a catalog file in advance and pass the --catalog-file option when running dbt-osmosis yaml refactor.
    • This makes trial and error easier if you have a large number of model files.
    • Generating the catalog takes a long time for large projects, so I wrote a script that generates a catalog file covering only the targeted models.
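The two steps above can be sketched as shell commands; the model path is an example, and `dbt docs generate` writing `target/catalog.json` is standard dbt behavior:

```shell
# Workaround sketch: adjust paths and selectors to your project.

# 1. Generate the catalog once up front (dbt writes target/catalog.json).
dbt docs generate

# 2. Refactor only a subset of models, reusing the precomputed catalog
#    instead of querying BigQuery for every model.
dbt-osmosis yaml refactor models/staging \
    --catalog-file target/catalog.json
```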

Thank you for the comment. I have already tried that approach, so it would be good to resolve this on the dbt-osmosis side rather than rely on the workaround.

I'm looking into googleapis/google-auth-library-python#1356, as it looks similar to this issue. I also suspect a conflict between the multithreading in dbt-osmosis and the Google Cloud packages in Python.
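To illustrate the kind of conflict I mean (a hypothetical sketch in plain Python, not dbt-osmosis or google-auth code): if many worker threads share one credentials object, a concurrent "token expired" stampede must be serialized behind a lock so that only one thread actually performs the refresh:

```python
import threading
import time


class SharedCredentials:
    """Hypothetical stand-in for a credentials object shared across threads."""

    def __init__(self):
        self.refresh_count = 0
        self.valid = False
        self._lock = threading.Lock()

    def ensure_valid(self):
        # Double-checked locking: cheap test first, then serialize the real
        # refresh so concurrently expired threads trigger exactly one refresh.
        if not self.valid:
            with self._lock:
                if not self.valid:
                    time.sleep(0.01)  # simulate the token endpoint round-trip
                    self.refresh_count += 1
                    self.valid = True


def stampede(creds, workers=32):
    """Release all workers at once against the same expired credentials."""
    barrier = threading.Barrier(workers)

    def worker():
        barrier.wait()        # all threads see "expired" simultaneously
        creds.ensure_valid()  # only one should perform the refresh

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return creds.refresh_count
```

Without the lock, every expired thread would hammer the token endpoint at once, which is the failure mode I suspect when dbt-osmosis fans out across many models.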

The upstream issue is closed. I also pushed a small change here, a1c2109, that, while I am not 100% sure it solves this, addresses what I think could be a conflating factor. We had an adapter connection invalidation/refresh process because DbtCoreInterface (our thin interface layer that keeps us one layer abstracted from dbt core) was designed to be used in a long-running service such as a proxy server or a custom LSP. But dbt-osmosis is just a typical short-lived process, which saturates the connection pool and then spins down.
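A hypothetical sketch of that distinction (the names are mine, not the actual dbt adapter API): a long-running service needs a way to invalidate stale connections so later borrowers get fresh ones, while a one-shot batch process can simply scope each connection to a unit of work:

```python
import threading
from contextlib import contextmanager


class ToyPool:
    """Hypothetical connection pool; real dbt adapters differ in detail."""

    def __init__(self, max_size=4):
        self._free = []
        self._sema = threading.Semaphore(max_size)
        self._lock = threading.Lock()
        self.created = 0

    def _connect(self):
        with self._lock:
            self.created += 1
            return {"id": self.created, "stale": False}

    @contextmanager
    def connection(self):
        # Block when the pool is saturated rather than failing outright.
        self._sema.acquire()
        try:
            with self._lock:
                conn = self._free.pop() if self._free else None
            if conn is None or conn["stale"]:
                conn = self._connect()  # refresh path: replace stale handles
            yield conn
            with self._lock:
                self._free.append(conn)
        finally:
            self._sema.release()

    def invalidate_all(self):
        # A long-running service calls this periodically (e.g. after an auth
        # expiry); a short-lived batch run like dbt-osmosis never needs to.
        with self._lock:
            for conn in self._free:
                conn["stale"] = True
```

Here each model's work would sit inside `with pool.connection(): ...`, so connections return to the pool immediately instead of being held for the life of the process.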