z3z1ma / dbt-osmosis

Provides automated YAML management, a dbt server, streamlit workbench, and git-integrated dbt model output diff tools

Home Page: https://z3z1ma.github.io/dbt-osmosis/

Failed to establish a new connection after a while with BigQuery

yu-iskw opened this issue · comments

If I apply dbt-osmosis yaml refactor to a lot of dbt models, I consistently get the following error after a while.
I assume the issue is caused by an authentication timeout, though I haven't looked into the implementation.
How can I solve the issue?

  • dbt-core: 1.6.2
  • dbt-osmosis: 0.12.4
  • dbt BigQuery setup
    • Using service account impersonation
    • No job_execution_timeout_seconds set
ERROR    Error occurred while processing model                                                                                    osmosis.py:931
         model.xxx.xxxxxxxxxxxxxxxxxxxx: Deadline of 600.0s exceeded
         while calling target function, last exception: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Max
         retries exceeded with url:
         /bigquery/v2/projects/xxxxxx/datasets/if_xxxx/tables/xxxxxxxxxxxxxxxxxxxxxx?pret
         tyPrint=false (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x14198e950>: Failed to
         establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
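For reference, both settings mentioned above (impersonation and the unset job timeout) live in the BigQuery profile. A hedged profiles.yml sketch; the profile, project, dataset, and service-account names are placeholders, but `impersonate_service_account` and `job_execution_timeout_seconds` are documented dbt-bigquery profile fields:

```yaml
# Hypothetical profiles.yml fragment; names are placeholders.
my_profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: analytics
      threads: 4
      # Run queries as this service account via impersonation
      impersonate_service_account: dbt-runner@my-gcp-project.iam.gserviceaccount.com
      # Not set in the report above; caps how long a single job may run
      job_execution_timeout_seconds: 300
```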

I encountered a similar error on a very large dbt project (1000+ models). When I want to run dbt-osmosis yaml refactor in such cases, I do the following:

  • 1: Limit the number of target models by passing positional selector arguments.
  • 2: Create a catalog file in advance and pass the --catalog-file option when running dbt-osmosis yaml refactor.
    • This makes trial and error easier if you have a large number of model files.
    • Generating the catalog takes a long time for large projects, so I wrote a script that generates a catalog file covering only the targeted models.
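The two steps above can be sketched as shell commands; the model path is an example, and `dbt docs generate` writing `target/catalog.json` is standard dbt behavior:

```shell
# Workaround sketch: adjust paths and selectors to your project.

# 1. Generate the catalog once up front (dbt writes target/catalog.json).
dbt docs generate

# 2. Refactor only a subset of models, reusing the precomputed catalog
#    instead of querying BigQuery for every model.
dbt-osmosis yaml refactor models/staging \
    --catalog-file target/catalog.json
```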

Thank you for the comment. I have already tried that approach, so it would be good to resolve this on the dbt-osmosis side rather than rely on the workaround.

I'm looking into googleapis/google-auth-library-python#1356, as it looks similar to this issue. I also suspect a conflict between the multithreading in dbt-osmosis and the Google Cloud packages in Python.
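To illustrate the kind of conflict I mean (a hypothetical sketch in plain Python, not dbt-osmosis or google-auth code): if many worker threads share one credentials object, a concurrent "token expired" stampede must be serialized behind a lock so that only one thread actually performs the refresh:

```python
import threading
import time


class SharedCredentials:
    """Hypothetical stand-in for a credentials object shared across threads."""

    def __init__(self):
        self.refresh_count = 0
        self.valid = False
        self._lock = threading.Lock()

    def ensure_valid(self):
        # Double-checked locking: cheap test first, then serialize the real
        # refresh so concurrently expired threads trigger exactly one refresh.
        if not self.valid:
            with self._lock:
                if not self.valid:
                    time.sleep(0.01)  # simulate the token endpoint round-trip
                    self.refresh_count += 1
                    self.valid = True


def stampede(creds, workers=32):
    """Release all workers at once against the same expired credentials."""
    barrier = threading.Barrier(workers)

    def worker():
        barrier.wait()        # all threads see "expired" simultaneously
        creds.ensure_valid()  # only one should perform the refresh

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return creds.refresh_count
```

Without the lock, every expired thread would hammer the token endpoint at once, which is the failure mode I suspect when dbt-osmosis fans out across many models.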

The upstream issue is closed. I also pushed a small change here, a1c2109, that, while I am not 100% sure it solves this, addresses what I think could be a conflating factor. We had an adapter connection invalidation/refresh process because DbtCoreInterface (our thin interface layer that keeps us one layer abstracted from dbt core) was designed to be used in a long-running service such as a proxy server or a custom LSP. But dbt-osmosis is just a typical short-lived process, which saturates the connection pool and then spins down.
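A hypothetical sketch of that distinction (the names are mine, not the actual dbt adapter API): a long-running service needs a way to invalidate stale connections so later borrowers get fresh ones, while a one-shot batch process can simply scope each connection to a unit of work:

```python
import threading
from contextlib import contextmanager


class ToyPool:
    """Hypothetical connection pool; real dbt adapters differ in detail."""

    def __init__(self, max_size=4):
        self._free = []
        self._sema = threading.Semaphore(max_size)
        self._lock = threading.Lock()
        self.created = 0

    def _connect(self):
        with self._lock:
            self.created += 1
            return {"id": self.created, "stale": False}

    @contextmanager
    def connection(self):
        # Block when the pool is saturated rather than failing outright.
        self._sema.acquire()
        try:
            with self._lock:
                conn = self._free.pop() if self._free else None
            if conn is None or conn["stale"]:
                conn = self._connect()  # refresh path: replace stale handles
            yield conn
            with self._lock:
                self._free.append(conn)
        finally:
            self._sema.release()

    def invalidate_all(self):
        # A long-running service calls this periodically (e.g. after an auth
        # expiry); a short-lived batch run like dbt-osmosis never needs to.
        with self._lock:
            for conn in self._free:
                conn["stale"] = True
```

Here each model's work would sit inside `with pool.connection(): ...`, so connections return to the pool immediately instead of being held for the life of the process.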