Version: 5.10.x

Workflows Metrics

info

Gathering for some of the following metrics is still under development.

CR: Cost Information

Data showing the overall cost over time for Workflows infrastructure.

TTA: Time to first action

The goal is to prevent analysis cache discards, slow invalidated or eagerly fetched repository rules, and similar work that sits on the critical loading-phase path. Because Bazel does this work before computing Action Cache keys, no amount of remote caching or remote execution improves this indicator.

  • T0: Scheduler dispatch to warm runner
  • T1: git clone is up-to-date
  • T2: First action spawn according to Bazel profile
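As a rough illustration of how the three timestamps above could be combined, the sketch below computes the overall interval and its two sub-intervals. It assumes that TTA is measured as T2 minus T0 and that timestamps are seconds since the epoch. The `TtaSample` class and its field names are hypothetical and not part of Workflows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtaSample:
    """Hypothetical record of the three TTA timestamps, in epoch seconds."""

    t0_dispatch: float      # T0: scheduler dispatches the job to a warm runner
    t1_clone_ready: float   # T1: git clone is up-to-date
    t2_first_action: float  # T2: first action spawn according to the Bazel profile

    def checkout_seconds(self) -> float:
        # Time spent bringing the clone up to date (T0 to T1).
        return self.t1_clone_ready - self.t0_dispatch

    def pre_execution_seconds(self) -> float:
        # Time Bazel spends before the first action spawns (T1 to T2).
        return self.t2_first_action - self.t1_clone_ready

    def time_to_first_action(self) -> float:
        # Assumed definition of TTA: the whole span from T0 to T2.
        return self.t2_first_action - self.t0_dispatch


sample = TtaSample(t0_dispatch=100.0, t1_clone_ready=112.0, t2_first_action=147.0)
print(sample.time_to_first_action())
```

Splitting the interval at T1 separates time spent on the checkout from time spent inside Bazel before execution starts, which is the part affected by analysis cache discards and repository rule fetches.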

QT: Developer-perceived Queue time

Developers shouldn't be blocked by a lack of CI resources.

The amount of time that each pull request build remains in the queue.

GR: Main branch greenness ratio

The main branch should be green most of the time.

Ratio of the time in a given period where main was green to the total time of that period.
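The ratio described above could be computed from a log of status changes on main. The following is a minimal sketch under assumed inputs: the function name, the `(timestamp, is_green)` transition format, and the `initial_green` flag are all hypothetical and do not describe how Workflows stores or reports this data.

```python
from typing import Iterable, Tuple


def greenness_ratio(
    initial_green: bool,
    transitions: Iterable[Tuple[float, bool]],
    period_start: float,
    period_end: float,
) -> float:
    """Fraction of [period_start, period_end] during which main was green.

    initial_green is the state of main at period_start; each transition is a
    (timestamp, is_green) pair marking a change of state.
    """
    if period_end <= period_start:
        raise ValueError("period must have positive length")

    green_time = 0.0
    state, since = initial_green, period_start
    for timestamp, is_green in sorted(transitions):
        # Clamp transitions that fall outside the period to its boundaries.
        timestamp = min(max(timestamp, period_start), period_end)
        if state:
            green_time += timestamp - since
        state, since = is_green, timestamp

    # Account for the final stretch up to the end of the period.
    if state:
        green_time += period_end - since
    return green_time / (period_end - period_start)


# main starts green, breaks at t=30, and is fixed at t=50 in a 100-second period.
ratio = greenness_ratio(True, [(30.0, False), (50.0, True)], 0.0, 100.0)
print(ratio)  # 0.8
```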

note

While Aspect can alert the BuildCop, the customer is responsible for the response, such as reverting a commit.

TTF: Developer-perceived Time to Failure

When the developer needs to fix their pull request, they are notified before they change context or leave their desk.

  • T0: Scheduler dispatch to runner
  • T1: Failure status reported back to developer

note

If the CI platform doesn't allow a "failing but not yet finished" status, Workflows reports the failure as a comment on the pull request.

LTA: Land-to-artifact

A commit that's needed in production quickly can go through the same process as less urgent ones. A developer might say "I need to ship to production more than once during an outage."

Time from when a commit is merged to main until all release artifacts are delivered for deployment. The customer controls which actions must run, including long tests or big uploads, so Aspect can only control the parts outside the critical path reported by Bazel.

IR: Invalidations rate

Bazel's expensive computations, such as rebuilding the analysis cache or repopulating external/ folders, should rarely occur on a user's critical path. (This is included in Time to first action, above.)

Number of times we saw each kind of invalidation per number of builds.
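As a small sketch of that calculation, the function below counts each kind of invalidation across a set of builds and divides by the number of builds. The input shape (one list of observed invalidation kinds per build) and the kind names `analysis_cache` and `external_repo` are assumptions made for illustration, not identifiers used by Workflows.

```python
from collections import Counter
from typing import Dict, List, Sequence


def invalidation_rates(builds: Sequence[List[str]]) -> Dict[str, float]:
    """Return invalidations per build for each kind.

    builds holds one entry per build: the list of invalidation kinds
    observed during that build (empty if none occurred).
    """
    if not builds:
        return {}
    counts = Counter(kind for kinds in builds for kind in kinds)
    return {kind: count / len(builds) for kind, count in counts.items()}


# Four builds; two saw an analysis cache invalidation, one also refetched a repository.
observed = [
    ["analysis_cache"],
    [],
    ["analysis_cache", "external_repo"],
    [],
]
rates = invalidation_rates(observed)
print(rates)
```

Here the analysis cache was invalidated in half of the builds, so its rate is 0.5, and the external repository rate is 0.25.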

note

A PR build can invalidate the caches on a runner, and the next request that runner takes may invalidate them back again. In the future we plan to lame-duck a runner whose caches have been invalidated, so that it isn't used by anyone else.

We recommend enabling the rebase feature in Workflows so that PRs which cleanly rebase against the target branch will do so.

FPR: False positive breakage rate

We shouldn't bother a human unless the CI system requires manual repair. The BuildCop reports a false positive through interaction with Aspect's system, typically via the Slack thread.