[Regression] 1.8.2 slower to build than 1.5.9 when tag+ includes many nodes

cajubelt opened this issue 18 days ago

Charlie Andrews-Jubelt commented 18 days ago

Is this a regression in a recent version of dbt-core?

I believe this is a regression in dbt-core functionality
I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

dbt build -s tag:my_tag+ takes about 20 minutes longer to start on dbt 1.8.2 than it does on 1.5.9 with the same tag. The tag used has a lot of downstream nodes in our project, about 11k. Generally we’re seeing better performance on 1.8 so we were surprised to see this big regression in performance.

Expected/Previous Behavior

Previously, building everything downstream of a tag with lots of nodes took a couple of minutes of startup time before dbt began running queries against our database. Now it takes 20+ minutes.

Steps To Reproduce

1. Set up a dbt project with a tag that has about 11k downstream nodes
2. Install dbt 1.8.2
3. Run dbt build -s tag:my_tag+

Relevant log output

No response

Environment

- OS: MacOS 14.5 and Ubuntu 22.04
- Python: 3.9.12
- dbt (working version): 1.5.9
- dbt (regression version): 1.8.2

Which database adapter are you using with dbt?

bigquery

Additional Context

We need this because our CI/CD pipeline uses a selector that excludes everything downstream of a tag that is upstream of many nodes. The example above is a simplified version of the original issue we found with that selector. (We tested the simplified version and found that it has the same performance issue.) The selector was something like:

```yaml
selectors:
  - name: my_selector
    definition:
      union:
        - state:modified+
        - exclude:
            - tag:my_tag+
```
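As a rough mental model (a simplification, not dbt's actual selector code), the selector resolves to everything downstream of modified nodes minus everything downstream of my_tag. Computing the excluded set alone means walking all ~11k descendants of the tag. The dict-of-children graph and the function names below are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical DAG representation: node -> list of direct children.
Graph = Dict[str, List[str]]


def with_descendants(graph: Graph, roots: Set[str]) -> Set[str]:
    """Roots plus everything downstream of them (what a trailing `+` selects)."""
    selected = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


def resolve_my_selector(graph: Graph, modified: Set[str], tagged: Set[str]) -> Set[str]:
    """Simplified reading of my_selector: state:modified+ minus tag:my_tag+."""
    included = with_descendants(graph, modified)
    excluded = with_descendants(graph, tagged)  # walks every descendant of the tag
    return included - excluded


# Toy example: "t" carries the tag and "a" is modified.
toy_graph = {"a": ["b"], "b": ["c"], "t": ["c", "d"]}
print(sorted(resolve_my_selector(toy_graph, {"a"}, {"t"})))  # ['a', 'b']
```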

Jeremy Cohen · Answer 1 · Fri Jul 12 2024 20:44:26 GMT+0800 (China Standard Time)

Thanks @cajubelt! Is this only for the build command (i.e. not run)?

I suspect this might be due to the large number of tests in your project (per our conversation), and the additional time that dbt spends "linking" the DAG: adding edges from tests on upstream models to downstream models, so that the downstream models are skipped when a test fails. While it's not immediately clear to me what we changed in that logic between v1.5 and v1.8, seeing this slowdown only on build would be a strong indication that the problem lies there.
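To make the "linking" step concrete, here is a minimal sketch of the idea, under simplifying assumptions: the DAG is a plain dict of direct parents, tests are a map from test ID to the nodes they depend on, and the function names are hypothetical. It is not dbt-core's implementation. A test is linked to a node when every node the test depends on is upstream of that node, so a failing test causes that node to be skipped. Note that it runs one upstream traversal per node.

```python
from collections import deque
from typing import Dict, Set, Tuple

# Hypothetical inputs:
#   parents: every non-test node -> its direct parents (roots map to an empty set)
#   tests:   test id -> the nodes that test depends on
Parents = Dict[str, Set[str]]
Tests = Dict[str, Set[str]]


def upstream_of(parents: Parents, node: str) -> Set[str]:
    """All ancestors of `node` (excluding the node itself), found by BFS."""
    seen: Set[str] = set()
    queue = deque(parents.get(node, ()))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, ()))
    return seen


def link_tests(parents: Parents, tests: Tests) -> Set[Tuple[str, str]]:
    """Return (test, node) edges so `node` can be skipped if `test` fails."""
    edges: Set[Tuple[str, str]] = set()
    for node in parents:
        upstream = upstream_of(parents, node)  # one traversal per node
        for test_id, depends_on in tests.items():
            # Link the test only if everything it tests is upstream of `node`.
            if depends_on and depends_on <= upstream:
                edges.add((test_id, node))
    return edges


parents = {"model1": set(), "model2": {"model1"}, "model3": {"model2"}}
tests = {"test1": {"model1"}, "test2": {"model2"}}
print(sorted(link_tests(parents, tests)))
# [('test1', 'model2'), ('test1', 'model3'), ('test2', 'model3')]
```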

Charlie Andrews-Jubelt · Answer 2 · Fri Jul 12 2024 22:01:06 GMT+0800 (China Standard Time)

@jtcohen6 Yes, I just confirmed that dbt run shows no noticeable slowdown on 1.8 with the same selector.

Jeremy Cohen · Answer 3 · Fri Jul 12 2024 22:52:21 GMT+0800 (China Standard Time)

@cajubelt Okay! So I think the hypothesis is:

- A project with a lot of tests (~5k models and ~15k data tests) is meaningfully slower at startup for dbt build in v1.8 than in v1.5.
- My suspicion is that this slowdown happens within (or closely adjacent to) the add_test_edges method.

Jeremy Cohen · Answer 4 · Fri Jul 12 2024 23:05:20 GMT+0800 (China Standard Time)

If you're up for it, there are two ways to try confirming that hypothesis by profiling dbt's performance:

- Using snakeviz, as documented here: https://docs.getdbt.com/reference/global-configs/record-timing-info (this way is a bit older, but more familiar to me)
- Using py-spy (snazzier), in the way @peterallenwebb recommends here:

```shell
pip install py-spy
sudo py-spy record -s -f speedscope -- dbt build
```
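If you take the snakeviz route, the timing file can also be inspected without a browser. The sketch below assumes the file written by dbt's record-timing-info option is a standard cProfile dump (the format snakeviz reads); the path perf_info.txt is hypothetical. It uses only the standard-library pstats module to list the functions with the highest cumulative time, which is where something like add_test_edges would be expected to show up.

```python
import io
import pstats


def top_functions(profile_path: str, limit: int = 15) -> str:
    """Return a text report of the `limit` functions with the most cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(profile_path, stream=buffer)
    # Shorten file paths, sort by time spent including callees, keep the top rows.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


# Usage (hypothetical path): print(top_functions("perf_info.txt"))
```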

Gerda Shank · Answer 5 · Sat Jul 13 2024 01:41:30 GMT+0800 (China Standard Time)

We have the ability to exclude tests from the build resource types, but that doesn't affect whether or not we pass "add_test_edges" to compile. It would be possible to check the resource types and, if tests are excluded, pass add_test_edges=False.

Of course, if you do want tests to run, that doesn't help.
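Gerda's suggestion could look roughly like the sketch below. The function name, the resource-type strings, and how the include/exclude sets are obtained are all hypothetical; it only illustrates the decision of skipping the expensive linking step when no tests are part of the selection.

```python
from typing import AbstractSet, FrozenSet, Optional

# Simplified, non-exhaustive set of resource types that `build` runs (assumption).
BUILD_RESOURCE_TYPES: FrozenSet[str] = frozenset({"model", "seed", "snapshot", "test"})


def should_add_test_edges(
    include: Optional[AbstractSet[str]], exclude: AbstractSet[str]
) -> bool:
    """Hypothetical check: only link tests into the DAG if tests will run.

    include: resource types explicitly requested, or None for "everything".
    exclude: resource types explicitly excluded.
    """
    selected = set(BUILD_RESOURCE_TYPES if include is None else include) - set(exclude)
    # Test -> downstream edges exist only to skip nodes after a test failure,
    # so they are pointless when no tests are selected.
    return "test" in selected


print(should_add_test_edges(None, set()))     # True
print(should_add_test_edges(None, {"test"}))  # False
```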

Charlie Andrews-Jubelt · Answer 6 · Mon Jul 15 2024 08:46:47 GMT+0800 (China Standard Time)

We do want to run tests if possible.

@jtcohen6 Here's a screenshot of a search through py-spy's output for the longest segments of the flame graph. It looks like generic_bfs_edges accounts for 12:41, though I'm not sure whether that's particularly revealing.
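That finding fits the hypothesis in Answer 3. generic_bfs_edges is NetworkX's generic breadth-first-search helper, so if the linking step runs a separate upstream traversal for every node, total work grows roughly with the number of nodes times the size of their ancestor sets. The toy sketch below uses a hypothetical chain-shaped DAG (a worst case, not the reporter's project) and simply counts visits; it does not show where dbt's time actually goes.

```python
from collections import deque
from typing import Dict, List


def count_bfs_visits(parents: Dict[str, List[str]]) -> int:
    """Total nodes visited when running one upstream BFS per node."""
    total = 0
    for start in parents:
        seen = set()
        queue = deque(parents[start])
        while queue:
            node = queue.popleft()
            if node not in seen:
                seen.add(node)
                queue.extend(parents[node])
        total += len(seen)
    return total


def chain(n: int) -> Dict[str, List[str]]:
    """A linear DAG m0 -> m1 -> ... -> m{n-1}, given as node -> direct parents."""
    return {f"m{i}": ([f"m{i - 1}"] if i else []) for i in range(n)}


for n in (500, 1000):
    print(n, count_bfs_visits(chain(n)))
```

On this chain each node has i ancestors, so the total is n(n-1)/2: doubling n from 500 to 1,000 raises the count from 124,750 to 499,500.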