inception-health / otel-export-trace-action

Possible race condition?

bixu opened this issue · comments

I've recently noticed that the Action runs but fails to upload any traces to Honeycomb (my Otel backend).

Debug log snippet:

Job 6637068577 is not completed yet
##[debug]Trace Job 6637068643
##[debug]Job Span<098384859e697312>: Started<2022-05-28T13:25:42Z>
##[debug]Trace 0 Steps
##[debug]Job Span<098384859e697312>: Ended<2022-05-28T13:25:42Z>
##[debug]Trace Job 6637068684
##[debug]Job Span<3eaff15280df1c40>: Started<2022-05-28T13:25:42Z>
##[debug]Trace 0 Steps
##[debug]Job Span<3eaff15280df1c40>: Ended<2022-05-28T13:25:42Z>

This happens even though the previous job in the workflow has completed and the tracing job depends on it through a needs statement.

I'm not confident that using this action in the same workflow is stable or recommended. I use it in a separate workflow that triggers on workflow completion. One thing we can try is to update this action to run in the post lifecycle stage.

Running in post does (to my inexpert ear) sound like the right thing to do. To be clear, I've seen this behavior even when the exporter Action was in a separate workflow.

Hmm, if you've seen it in an external workflow, then the post run is not going to solve the issue. Did you specify to trigger for completed workflows?

If there is a race condition between GitHub's API availability and its completed event, then I think the only option is to hard fail when a job is not completed and set up some retry mechanism outside of the action.
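To make the "hard fail" idea concrete, here is a minimal sketch, not the action's actual code. The names JobSummary, incompleteJobIds, and assertAllJobsCompleted are hypothetical; the only thing assumed about GitHub is that each job reports a status string that reads "completed" once it has finished.

```typescript
// Hypothetical shape: only the fields this sketch needs.
interface JobSummary {
  id: number;
  status: string; // e.g. "queued", "in_progress", "completed"
}

// Returns the IDs of jobs that have not reached the "completed" status.
function incompleteJobIds(jobs: JobSummary[]): number[] {
  return jobs
    .filter((job) => job.status !== "completed")
    .map((job) => job.id);
}

// Throws instead of silently skipping unfinished jobs, so the step fails
// visibly and a retry mechanism outside the action can run it again.
function assertAllJobsCompleted(jobs: JobSummary[]): void {
  const pending = incompleteJobIds(jobs);
  if (pending.length > 0) {
    throw new Error("Jobs not completed yet: " + pending.join(", "));
  }
}
```

With a check like this, the step would exit with an error rather than log "is not completed yet" and carry on, which is what would let an external retry take over.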

> Hmm, if you've seen it in an external workflow, then the post run is not going to solve the issue. Did you specify to trigger for completed workflows?

Yes, we are waiting for completed state.

Interestingly, even if I re-run the failed tracing job much later, it is still unable to find the original job. That makes me think we might be incorrectly identifying the original workflow or job.

Here's an example of the config we are using, with secrets redacted:

name: Trace Build

on:
  workflow_run:
    workflows:
      - "Code Style"
      - "Infrastructure Tests"
      - "Linting"
      - "Load Test"
      - "Release"
      - "Smoke Test"
      - "Test Suite"
    types: [ completed ]

jobs:
  otel-export-trace:
    name: Export Trace
    runs-on: ubuntu-latest
    steps:
      - name: Export Workflow Trace
        continue-on-error: true
        if: always()
        uses: inception-health/otel-export-trace-action@0bd92b941bbd25d6c8cdf8800df5ac19f9844271
        with:
          otlpEndpoint: grpc://api.honeycomb.io:443/
          otlpHeaders: x-honeycomb-team=${{ secrets.API_KEY }},x-honeycomb-dataset=<dataset>
          githubToken: ${{ secrets.GITHUB_TOKEN }}

Well now I'm confused :)

It looks like we are getting traces in Honeycomb despite the (misleading?) GHA log messages. Closing, and sorry for the noise.

OK, I see the problem. When you are running as a separate workflow using the workflow_run event, you need to inject the run_id for the workflow_run event; otherwise the action will use the current workflow's RUN_ID, and that run of course isn't finished and is still in progress.

name: Trace Build

on:
  workflow_run:
    workflows:
      - "Code Style"
      - "Infrastructure Tests"
      - "Linting"
      - "Load Test"
      - "Release"
      - "Smoke Test"
      - "Test Suite"
    types: [ completed ]

jobs:
  otel-export-trace:
    name: Export Trace
    runs-on: ubuntu-latest
    steps:
      - name: Export Workflow Trace
        continue-on-error: true
        if: always()
        uses: inception-health/otel-export-trace-action@0bd92b941bbd25d6c8cdf8800df5ac19f9844271
        with:
          otlpEndpoint: grpc://api.honeycomb.io:443/
          otlpHeaders: x-honeycomb-team=${{ secrets.API_KEY }},x-honeycomb-dataset=<dataset>
          githubToken: ${{ secrets.GITHUB_TOKEN }}
          runId: ${{ github.event.workflow_run.id }}
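The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the action's source: resolveRunId is a hypothetical helper, runIdInput stands for the value of the runId input (empty when the input is not set), and currentRunId stands for the ID of the run that is executing the action.

```typescript
// Hypothetical helper: decides which workflow run to trace.
function resolveRunId(runIdInput: string, currentRunId: number): number {
  const trimmed = runIdInput.trim();
  if (trimmed === "") {
    // No explicit runId: fall back to the run that is executing the action.
    // Inside a workflow_run-triggered workflow this is the tracing run
    // itself, which is still in progress.
    return currentRunId;
  }
  if (!/^\d+$/.test(trimmed)) {
    throw new Error("Invalid runId input: " + runIdInput);
  }
  return parseInt(trimmed, 10);
}
```

In the first config in this thread the input is absent, so a helper like this would return the tracing workflow's own run ID; with runId set from github.event.workflow_run.id it would return the ID of the workflow that triggered the trace.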