mozilla / bigquery-etl

Bigquery ETL

Home Page:https://mozilla.github.io/bigquery-etl

Reduce noise in PR diffs

scholtzan opened this issue

The diff that gets automatically posted to PRs has become noisier since we stopped requiring PRs to be up to date with main. It tends to show changes that weren't made in the PR but were pushed to main after the PR was last synced with main. To reduce the noise, we could do a pseudo merge for the diff: combine the main branch with the PR branch and generate the diff from the result, so it only reflects the PR's own changes. However, this would not work when the PR has merge conflicts with main.
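A rough sketch of what the pseudo merge could look like, assuming it runs in a throwaway checkout of the PR branch before the SQL is generated. The function name `pseudo_merge`, the default ref `origin/main`, and the injectable `run` parameter are made up for illustration and are not part of bigquery-etl; only the `git merge --no-commit --no-ff` and `git merge --abort` commands are real.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes git arguments and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def _run_git(args: Sequence[str]) -> int:
    """Run git in the current working directory and return its exit code."""
    return subprocess.run(["git", *args], check=False).returncode


def pseudo_merge(base_ref: str = "origin/main", run: Runner = _run_git) -> bool:
    """Try to merge base_ref into the checked-out PR branch without committing.

    Returns True if the merge applied cleanly, so the working tree now holds
    "PR + latest main". Returns False if git reported a failure (for example
    a merge conflict); in that case the merge is aborted and the caller has
    to fall back to diffing the unmerged PR branch.
    """
    if run(["merge", "--no-commit", "--no-ff", base_ref]) == 0:
        return True
    # Clean up the half-finished merge; the exit code is ignored on purpose.
    run(["merge", "--abort"])
    return False
```

If this returns `True`, the diff would be generated from the merged working tree; if it returns `False`, the job would keep the current behaviour, which matches the caveat about merge conflicts.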

┆Issue is synchronized with this Jira Task

Another cause of noise is non-deterministic SQL generation and schema updates, which sometimes cause the order of things in a query or schema to change compared to the generated-sql branch. As a result, the stage deploy deploys and dry runs queries that are unrelated to the changes in the PR.
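One way to keep ordering changes out of the comparison is to compare an order-insensitive fingerprint of each schema instead of the raw file. This is only a sketch: it assumes the schema has already been loaded into a list of dicts with `name` and optional nested `fields` keys, and the helper names `canonicalize` and `fingerprint` are hypothetical. It is meant for deciding whether something changed, not for rewriting the schema files.

```python
import json
from typing import Any, Dict, List

Field = Dict[str, Any]


def canonicalize(fields: List[Field]) -> List[Field]:
    """Return a copy of the fields sorted by name, recursing into nested fields."""
    result = []
    for field in fields:
        item = dict(field)  # shallow copy so the input is not mutated
        if "fields" in item:
            item["fields"] = canonicalize(item["fields"])
        result.append(item)
    return sorted(result, key=lambda f: f["name"])


def fingerprint(fields: List[Field]) -> str:
    """Serialize the canonical form; sort_keys also fixes key order inside each field."""
    return json.dumps(canonicalize(fields), sort_keys=True)


# Same fields in a different order (assumed example data).
a = [{"name": "country", "type": "STRING"}, {"name": "clicks", "type": "INTEGER"}]
b = [{"type": "INTEGER", "name": "clicks"}, {"name": "country", "type": "STRING"}]
print(fingerprint(a) == fingerprint(b))
```

Two schemas that differ only in field or key order get the same fingerprint, so they could be skipped by the stage deploy.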

The diff here is one example: #5790 (comment)

The most common ones I've seen are contextual_services/event_aggregates/schema.yaml and the generated event_monitoring_live_v1 materialized views. event_monitoring_live_v1 should be improved by #5799, but it still creates some superfluous diffs because its template uses the current date.
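For the current-date case, one option would be to mask dates before comparing generated queries. The sketch below assumes the date ends up in the rendered SQL as a `YYYY-MM-DD` literal, which I have not verified against the template; `mask_dates` and `differs_ignoring_dates` are hypothetical helpers.

```python
import re

# Matches ISO-style dates such as 2024-06-01 (assumed format of the rendered date).
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def mask_dates(sql: str) -> str:
    """Replace every ISO-style date with a fixed placeholder."""
    return ISO_DATE.sub("<DATE>", sql)


def differs_ignoring_dates(old_sql: str, new_sql: str) -> bool:
    """True only if the queries differ in something other than date literals."""
    return mask_dates(old_sql) != mask_dates(new_sql)
```

The trade-off is that a PR that intentionally changes a hard-coded date would also be ignored, so passing a fixed date into the template during diff generation might be the safer fix.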

#5592 may have also reduced some noise by ordering the SQL generators.