open-telemetry / opentelemetry-collector

OpenTelemetry Collector

Home page: https://opentelemetry.io


Log a warning when an environment variable without a value is expanded

dyladan opened this issue

Related to #5614

Is your feature request related to a problem? Please describe.

In open-telemetry/opentelemetry-collector-contrib#11846 the user was trying to use "$ConnectionString" as a Username. The collector automatically interprets $ as an environment variable expansion. The environment variable was empty, so the Username field was interpreted as empty and failed validation. There was nothing logged in the collector to alert this user that their configuration was being mangled by the failed environment expansion.

Describe the solution you'd like

When an environment variable expansion has no value, log a warning.

Another possible "fix" for this is to log the full collector config at startup. IIRC this is what Telegraf does. This would allow the user to see that their config is messed up.

One major issue with this solution is that you would need to reliably hide secrets while also making it possible to see if an expansion failed. We currently don't have any way to distinguish between expansions that are secret and should be obfuscated in logs and expansions that are not.

I think we should fail fast instead of logging a warning. This is a breaking change, but I think it's worth it in terms of improved troubleshooting and we can have a feature gate to make the transition smoother. We probably also want to prioritize implementing #5228, which will be useful in this case.

For printing the configuration, we have #5223. If you have ideas on how to reliably hide secrets please participate in the discussion there!

I also think a fail-fast strategy is a good one. I didn't suggest it initially because I didn't know how conservative the collector SIG tends to be about such breaking changes. One possible middle ground is a strict mode that toggles the behavior between warning and failure.

Edit: a strict mode could also enable similar enhancements, such as refusing to start when an unknown config key is present, which may catch typos in some cases.

> failing to start up when an unknown config key is present which may catch typos in some cases

We already do this by default, so it makes all the more sense to be consistent with environment variables :) If I find some time this week, I will try to address this.
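For readers unfamiliar with "reject unknown keys", here is a small sketch of the idea. It uses `encoding/json`'s `Decoder.DisallowUnknownFields` from the Go standard library purely as a stand-in; the collector's own config unmarshalling is not JSON-based, and the `exporterConfig` struct and `decodeStrict` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// exporterConfig is a hypothetical config struct used only for illustration.
type exporterConfig struct {
	Username string `json:"username"`
	Endpoint string `json:"endpoint"`
}

// decodeStrict decodes raw JSON into cfg and rejects keys that do not map to
// a struct field, so a typo such as "usrname" is reported instead of ignored.
func decodeStrict(raw string, cfg *exporterConfig) error {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	return dec.Decode(cfg)
}

func main() {
	var cfg exporterConfig
	err := decodeStrict(`{"usrname": "alice", "endpoint": "localhost:4317"}`, &cfg)
	fmt.Println("decode error:", err)
}
```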

I started making a PoC for raising a warning (and failing if enabled through a feature gate) on #5734. Since the logger is not available when loading the configuration, we have to signal the need to log a warning in some way. On the PR, I did this with a special error type ('NonFatalError') that needs to be handled by all configuration-related functions.

There are other alternatives, like changing the provider/converter signatures to return some sort of 'diagnostics' struct that includes any warnings, or giving up on logging a warning altogether.

@open-telemetry/collector-approvers What do you think is the best approach?

I am adding this to the confmap milestone; we need to discuss whether we want to change the behavior, or make changes to the Go API to allow for non-fatal errors, before reaching 1.0.

Not a fan of the idea of failing when environment expansion encounters an unset value. That may or may not be an error, as a user may wish to use a variable to add context to a value rather than have it be the only content of the value. Logging that a replacement failed makes sense and would be a good addition.

> [failing if environment expansion encounters an unset value] may or may not be an error as a user may wish to use that to add context to a value rather than be the only content of a value.

Would it be possible to support that through support for setting a default value if unset? (Not sure if we should do that, I am just trying to understand your point here)


Maybe, but in that case I'd argue that the default default value should be an empty string which takes us right back to where we are today.

This was discussed in the SIG meeting on 3/13 as part of the open-telemetry/opentelemetry-collector-contrib#9984 issue review.

@mx-psi, is this closed by #9837?

@TylerHelmuth No, because the logger is a no-op. We need to pass a functioning logger, which is still a work in progress.