buildbuddy-io / buildbuddy

BuildBuddy is an open source Bazel build event viewer, result store, remote cache, and remote build execution platform.

Home Page: https://buildbuddy.io

Support recording TargetStatus info for more/all invocations

minor-fixes opened this issue · comments

I'm trying to gather data to see what the heaviest targets are in our repository (by execution count and time) to assist in figuring out to what extent a (rather complex) set of targets is well-cached. The SQL database backing BuildBuddy seems to have all the pieces in place for recording what I need (runtime per target per invocation, by joining invocations <-> TargetStatuses <-> Targets), but the resulting aggregation only contains results for builds tagged role=CI. In my case, this is set on automated postsubmit builds only (always from master, reflects merged code health). As a result, presubmit builds and one-off interactive builds are not reflected in this data.

The code appears to make this choice explicitly. I could set role=CI on all builds, but that would reduce the usefulness of the test grid page.
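To make the shape of that aggregation concrete, here's a minimal sketch using an in-memory SQLite database. The table and column names (`Invocations`, `TargetStatuses`, `Targets`, `role`, `target_duration_usec`) are illustrative stand-ins for the real schema, not BuildBuddy's actual table definitions; the point is just how the `role = 'CI'` filter changes what gets counted.

```python
# Hypothetical, simplified schema for the join described above.
# Names are illustrative, not BuildBuddy's actual schema.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Invocations (invocation_id TEXT PRIMARY KEY, role TEXT);
CREATE TABLE Targets (target_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE TargetStatuses (
    target_id INTEGER,
    invocation_id TEXT,
    target_duration_usec INTEGER
);
""")

# Sample data: one CI (postsubmit) invocation and one untagged
# interactive invocation of the same target.
cur.executemany("INSERT INTO Invocations VALUES (?, ?)",
                [("inv-ci", "CI"), ("inv-dev", "")])
cur.execute("INSERT INTO Targets VALUES (?, ?)", (1, "//server:server_test"))
cur.executemany("INSERT INTO TargetStatuses VALUES (?, ?, ?)",
                [(1, "inv-ci", 5_000_000), (1, "inv-dev", 7_000_000)])

def heaviest_targets(ci_only):
    # Total runtime and execution count per target label, heaviest first.
    query = """
        SELECT t.label, COUNT(*), SUM(ts.target_duration_usec)
        FROM TargetStatuses ts
        JOIN Invocations i ON i.invocation_id = ts.invocation_id
        JOIN Targets t ON t.target_id = ts.target_id
        {}
        GROUP BY t.label
        ORDER BY SUM(ts.target_duration_usec) DESC
    """.format("WHERE i.role = 'CI'" if ci_only else "")
    return cur.execute(query).fetchall()

print(heaviest_targets(ci_only=True))   # only the CI invocation counted
print(heaviest_targets(ci_only=False))  # all invocations counted
```

With the filter in place, only the postsubmit build contributes one execution (5s of runtime); dropping it pulls in the interactive build too (two executions, 12s total), which is exactly the behavior the check in the code gates.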

Some questions:

  • What problems should I expect if I remove this check? (I already expect ~10x DB data usage and the need to beef up the instance so queries can get through the data.) Do semantics break elsewhere? (Maybe some queries assume they don't need a WHERE role = "CI" clause when they now would?)
  • The BuildBuddy UI still seems to know target times for non-CI builds - how does it know this if it's not stored in the DB? Is it re-fetching and parsing a BES stream object?
  • Would you all be amenable to making this a configurable setting? If so, what kind of setting would you prefer? (I may be able to make that contribution)

Hi @minor-fixes, thanks for reaching out! It seems like you have a pretty good grasp of what's going on here and why. To answer your questions specifically:

  • Yeah, you should expect a lot more data to be dumped into those tables (potentially slowing down invocation upload, depending on volume). I don't think this breaks semantics anywhere except the test grid.
  • Target times are pulled from the BES stream, which we parse and re-serve to the frontend to extract data like you mention on a per-invocation level.
  • I'm not sure this makes sense as a general config setting at the moment, primarily because we don't think storing this type of data for all builds in MySQL is a great idea generally: there's so much of it, and you often want to run long aggregation queries over it.

To that end, we're working on storing more target-level data in ClickHouse to support the kind of analysis it sounds like you want. Surfacing a target's performance, cache hit rate, etc. over time is a big part of that. I don't have a firm launch date for you, but we're actively working on it and testing it (you can probably see it in the code).

Feel free to reach out on our Slack if you want to talk more about any of this; we're there and happy to help!