dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

Home Page: https://getdbt.com

[Regression] 1.8.2 slower to build than 1.5.9 when tag+ includes many nodes

cajubelt opened this issue

Is this a regression in a recent version of dbt-core?

  • I believe this is a regression in dbt-core functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

dbt build -s tag:my_tag+ takes about 20 minutes longer to start on dbt 1.8.2 than it does on 1.5.9 with the same tag. The tag used has a lot of downstream nodes in our project, about 11k. Generally we’re seeing better performance on 1.8 so we were surprised to see this big regression in performance.

Expected/Previous Behavior

Previously building everything downstream of a tag with lots of nodes would take a couple minutes of startup time and then begin running queries against our db. Now it takes 20+ minutes.

Steps To Reproduce

  1. Set up dbt project with a tag that has about 11k downstream nodes
  2. Install dbt 1.8.2
  3. Run dbt build -s tag:my_tag+

Relevant log output

No response

Environment

- OS: MacOS 14.5 and Ubuntu 22.04
- Python: 3.9.12
- dbt (working version): 1.5.9
- dbt (regression version): 1.8.2

Which database adapter are you using with dbt?

bigquery

Additional Context

We need this because we have a selector used in CI/CD that excludes everything downstream of a tag that is upstream of many nodes. The example given is a simpler version of the original issue we found with that selector. (We tested the simpler version and found that it has the same performance issue.) The selector was something like

- name: my_selector
  definition: 
    union:
      - state:modified+
      - exclude:
        - tag:my_tag+

Thanks @cajubelt! Is this only for the build command (i.e. not run)?

I suspect this might be due to the large number of tests in your project (per our conversation), and the additional time that dbt spends "linking" the DAG (adding edges between test on upstream model -> downstream models, so that they skip on test failure). While it's not immediately clear to me what change we made between v1.5 -> v1.8 to that logic, if you only see this slowdown on build, it would be a strong indication that therein lies the problem.
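To make the "linking" step concrete, here is a minimal standard-library sketch of the idea described above. It is an illustration under stated assumptions, not dbt's actual implementation: the function names, the dict-based graph, and the rule "a test gates a node when everything the test depends on is upstream of that node" are all hypothetical choices made for this example.

```python
from collections import deque


def upstream_of(parents, node):
    """Return every ancestor of `node`, found by breadth-first search."""
    seen = set()
    queue = deque(parents.get(node, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, ()))
    return seen


def link_tests_to_downstream(parents, tests):
    """Return extra (test, node) edges so a node waits on upstream tests.

    parents: dict mapping each model to the set of its direct parent models
    tests:   dict mapping each test to the set of models it depends on
    Both structures are hypothetical stand-ins for a real manifest.
    """
    extra_edges = set()
    for node in parents:
        # One upstream traversal per node: this is the expensive part.
        upstream = upstream_of(parents, node)
        for test, depends_on in tests.items():
            # Gate the node only if the test's inputs are all upstream of it.
            if depends_on and depends_on <= upstream:
                extra_edges.add((test, node))
    return extra_edges


# model1 -> model2 -> model3, with one test on model1 and one on model2
example_parents = {"model1": set(), "model2": {"model1"}, "model3": {"model2"}}
example_tests = {"test1": {"model1"}, "test2": {"model2"}}
example_edges = link_tests_to_downstream(example_parents, example_tests)
```

In this toy graph, model2 ends up waiting on test1, and model3 waits on both test1 and test2. Because the sketch runs one upstream traversal per node, the work grows quickly with both the node count and the test count, which would be consistent with a slowdown that shows up on build but not on run.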

@jtcohen6 yes I just confirmed that dbt run doesn't have any noticeable slowdown on 1.8 with the same selector.

@cajubelt Okay! So I think the hypothesis is:

  • A project with a lot of tests (~5k models and ~15k data tests) is meaningfully slower at startup for dbt build in v1.8 compared to v1.5
  • My suspicion is that this slowdown is happening within (or closely adjacent to) the add_test_edges method

If you're up for it, there are two ways to try confirming that hypothesis by profiling dbt's performance:

pip install py-spy
sudo py-spy record -s -f speedscope -- dbt build

We have the ability to exclude tests from the build resource types, but that doesn't affect whether or not we pass "add_test_edges" to compile. It would be possible to check the resource types and if tests are excluded, pass "add_test_edges" = False.
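The check suggested above could look roughly like the following sketch. The helper name and the set of test-like resource type names are assumptions for illustration; they are not taken from dbt's code and should be checked against the dbt version in use.

```python
# Hypothetical names for test-like resource types; verify against your dbt version.
TEST_RESOURCE_TYPES = frozenset({"test", "unit_test"})


def should_add_test_edges(selected_resource_types):
    """Return True only if at least one test-like resource type will be built."""
    return not TEST_RESOURCE_TYPES.isdisjoint(selected_resource_types)


# Tests excluded from the build: the linking step could be skipped.
skip_case = should_add_test_edges({"model", "seed", "snapshot"})
# Tests included: the linking step is still needed.
keep_case = should_add_test_edges({"model", "test"})
```

The result would then be passed as the "add_test_edges" value at compile time, so the extra edges are built only when they can affect the run.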

Of course if you do want tests to run, that doesn't help.

We do want to run tests if possible.

@jtcohen6 here's a screenshot of a search through py-spy's output for the longest segments of the flame graph. It looks like generic_bfs_edges accounts for 12:41 of the runtime, though I'm not sure that's particularly revealing.

Screenshot 2024-07-14 at 8 43 37 PM
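generic_bfs_edges matches the name of networkx's breadth-first traversal helper, so the profile points at graph traversal. The sketch below shows one way such time can add up. It assumes that a breadth-first search is started once per node, which the profile does not confirm, and the chain-shaped graph and function names are made up for the example.

```python
from collections import deque


def count_bfs_visits(children, start):
    """Count the nodes reached by one breadth-first search from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return len(seen) - 1  # do not count the start node itself


def chain(n):
    """Build a straight-line graph 0 -> 1 -> ... -> n-1 as a child map."""
    return {i: [i + 1] for i in range(n - 1)}


def total_visits(n):
    """Total nodes visited when a BFS is started from every node of the chain."""
    children = chain(n)
    return sum(count_bfs_visits(children, node) for node in range(n))


visits_100 = total_visits(100)    # 100 * 99 / 2 = 4950
visits_1000 = total_visits(1000)  # 1000 * 999 / 2 = 499500
```

With ten times as many nodes, the total number of visits grows by roughly a hundred times. If the traversal behind the profile follows a similar per-node pattern, a selection of about 11k downstream nodes would spend most of its startup time in BFS.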