dbt-labs / dbt-event-logging

A dbt package to make auditing dbt runs easy.

Home Page: https://hub.getdbt.com/dbt-labs/logging/latest/

INSERT Performance

kingfink opened this issue

I don't have any quantitative evidence (yet), but anecdotally it seems the INSERT operations run at each model's start and end may be slowing down our dbt runs. Also, from some research, the AWS docs state:

These are leader node–based operations, and can create significant performance bottlenecks by maxing out the leader node network as data is distributed by the leader to the compute nodes.

Has anyone else run into this and do you have any advice on alternatives or remedies?

Hey @kingfink, yeah, definitely: inserting individual records is probably not a best practice on analytical databases. I'm increasingly of the mind that using hooks for audit logging is a... suboptimal... decision, and I'd love to develop a better approach. I think whatever we come up with will probably look like a single batch load rather than a series of streamed inserts.
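
A rough sketch of the "single batch load" idea, assuming audit events can be buffered in memory for the duration of a run and written once at the end. The `AuditBuffer` class, the `dbt_audit_log` table, and its three columns are hypothetical and not part of this package. The example runs against an in-memory SQLite database only so that it is self-contained; on Redshift you would send the same multi-row `INSERT ... VALUES (...), (...)` through your own driver (which may use a different placeholder style than `?`), or stage the rows and load them with `COPY`.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class AuditEvent:
    event_name: str  # hypothetical values, e.g. "model started"
    model_name: str
    event_time: str  # ISO-8601 UTC timestamp


class AuditBuffer:
    """Hypothetical helper: buffers audit events and flushes them with ONE multi-row INSERT."""

    def __init__(self, table: str = "dbt_audit_log") -> None:
        # Table names cannot be bound as parameters, so this must be a trusted constant.
        self.table = table
        self.events: list[AuditEvent] = []

    def record(self, event_name: str, model_name: str) -> None:
        # Appending to a Python list costs nothing on the warehouse side.
        ts = datetime.now(timezone.utc).isoformat()
        self.events.append(AuditEvent(event_name, model_name, ts))

    def build_insert(self) -> tuple[str, list[str]]:
        # One "(?, ?, ?)" group per buffered event, all in a single statement.
        values_sql = ", ".join(["(?, ?, ?)"] * len(self.events))
        sql = (
            f"INSERT INTO {self.table} (event_name, model_name, event_time) "
            f"VALUES {values_sql}"
        )
        params = [v for e in self.events for v in (e.event_name, e.model_name, e.event_time)]
        return sql, params

    def flush(self, conn: sqlite3.Connection) -> int:
        # Writes every buffered event in one round trip; very large runs may need chunking
        # to stay under the driver's or database's parameter limit.
        if not self.events:
            return 0
        sql, params = self.build_insert()
        conn.execute(sql, params)
        conn.commit()
        flushed = len(self.events)
        self.events.clear()
        return flushed


# Usage: record start/end events for three models, then flush once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbt_audit_log (event_name TEXT, model_name TEXT, event_time TEXT)")
buf = AuditBuffer()
for model in ["stg_orders", "stg_customers", "fct_orders"]:
    buf.record("model started", model)
    buf.record("model finished", model)
flushed = buf.flush(conn)
print(flushed)  # 6
```

The trade-off is that events only reach the table when `flush` runs, so a run that crashes before flushing loses its audit trail; in exchange, the warehouse sees one INSERT per run instead of one per model start and end.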

Will give this some thought and follow up here.

We've now added a warning to the README to let people know about this (see #15).

Closing in favor of #16.