argoproj / argo-helm

ArgoProj Helm Charts

Home Page: https://argoproj.github.io/argo-helm/

Optional REDIS_PASSWORD env variable causes race condition and Redis NOAUTH failures

michaelvl opened this issue

Describe the bug

In PR #2800 the REDIS_PASSWORD environment variable was made optional, i.e. several Deployment resources have something along these lines:

...
        env:
          - name: REDIS_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-redis
                optional: true
                key: auth

The problem is that if the secret does not exist when Pods start, these Pods will be unable to authenticate to Redis. The Redis Pod itself does not have this env variable marked as optional, i.e. Redis will not start up until the secret is available.

The chart can generate the secret through a Job or the secret can be created by external means. Both of these can cause the secret not to be available when e.g. repo-server or application-controller Pods start.
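For illustration, a secret created "by external means" might look roughly like the sketch below. The name `argocd-redis` and the key `auth` are taken from the `secretKeyRef` shown above; the namespace and the password value are placeholders and assumptions, not something the chart prescribes.

```yaml
# Hypothetical manifest for creating the Redis secret outside the chart.
# name/key match the secretKeyRef above; everything else is a placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: argocd-redis
  namespace: argocd            # assumption: namespace where Argo CD is installed
type: Opaque
stringData:
  auth: "<redis-password>"     # placeholder, replace with the real password
```

If a manifest like this is applied after the Argo CD Pods have already started, those Pods are the ones affected by the race described here.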

Related helm chart

argo-cd

Helm chart version

Any after PR #2800, e.g. 7.3.7

To Reproduce

Install ArgoCD with redisSecretInit.enabled = false and let the Pods start up. Pods such as repo-server and application-controller will start successfully, whereas redis will restart repeatedly with a 'config error' due to the missing secret. Finally, create the secret to let redis start. After this, 'NOAUTH Authentication required' logs can be observed in the repo-server and application-controller Pods.
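As a sketch, the first step of the reproduction corresponds to a values fragment along these lines (only the `redisSecretInit.enabled` key is taken from this report; all other chart values are assumed to stay at their defaults):

```yaml
# values.yaml fragment for the argo-cd chart (sketch)
redisSecretInit:
  enabled: false   # do not let the chart generate the argocd-redis secret
```

With this setting nothing creates the `argocd-redis` secret until it is added by hand, which makes the ordering problem easy to trigger.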

Expected behavior

Pods that must authenticate towards Redis and thus rely on the Redis secret do not start until the secret is available - like the redis Pod itself does now.
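A minimal sketch of what this could look like in the affected Deployments, assuming the same secret name and key as in the snippet above. This is only an illustration of the expected behavior, not necessarily how the chart should implement the fix:

```yaml
        env:
          - name: REDIS_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-redis
                key: auth
                # optional defaults to false in Kubernetes; with a required
                # reference the container is not started until the secret
                # and key exist (the Pod reports a config error meanwhile).
                optional: false
```

Omitting the `optional` field entirely has the same effect, since a `secretKeyRef` is required unless marked otherwise.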

Screenshots

No response

Additional context

No response

@shlomitubul
Could you please provide some details on why you made this PR (it lacks a motivation or a linked issue with context)?

Also, it would make sense to involve @mbevc1 and @yu-croco, as you approved the implementation.
When I reviewed this PR after coming back from my vacation (after the PR was merged), I suspected that exactly this would happen in the use case described by the issue reporter.

"as you approved the implementation."

Oh, I just thought the PR was trying to fix a small mistake (I was the mistake after all 🙃) since I didn't realize the case in this issue. 🫠

While waiting for @shlomitubul's reply, I prepared a draft PR for this issue. 🙏

I just came across this after upgrading to v2.11.6 and trying to figure out if I have the same issue you do.

Were you able to verify that this was indeed the issue? I checked the environment of each of the pods, and they had REDIS_PASSWORD. I even recycled the pods and I still have the issue. Does recycling the pods work for you?

My gut tells me this is related to argoproj/argo-cd#15912 - something makes configuration inconsistent between components.

Trying to get this set up in my local cluster so I can iterate more.

EDIT: narrowed it to v2.11.1: argoproj/argo-cd@v2.11.0...v2.11.1
argoproj/argo-cd@f1a449e

Can this be re-opened? #2839 did not fix this, as suspected.

argocd: v2.12.0+ec30a48
  BuildDate: 2024-08-05T13:46:43Z
  GitCommit: ec30a48bce7a60046836e481cd2160e28c59231d
  GitTreeState: clean
  GoVersion: go1.22.5
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.12.0+ec30a48
  BuildDate: 2024-08-05T13:46:43Z
  GitCommit: ec30a48bce7a60046836e481cd2160e28c59231d
  GitTreeState: clean
  GoVersion: go1.22.5
  Compiler: gc
  Platform: darwin/arm64

@purajit What is your issue? Your Argo CD components cannot connect to Redis?

@mkilchhofer Yup, any path like argocd app diff that relies on the cache fails with the NOAUTH error above; I've tried recycling the pods, I've checked their environments, and I don't see anything obviously wrong.

In particular, while basic Argo sync still works, argocd app diff is one functionality that depends on the cache and fails.

Ah, so we are only talking about the CLI functionality? You didn't see any suspicious log line(s) inside the running Argo CD pods?

Sorry for the misunderstanding. No, since this is an issue with the overall setup, there are
logs internally in Argo as well. I was just giving an example of a functionality that
obviously surfaces the issue and completely stops working - argocd app diff is completely
non-functional right now without trying to turn off Redis auth.

Here are examples of internal logs:

1:S 14 Aug 2024 17:49:49.910 # MASTER aborted replication with an error: NOAUTH Authentication required.
1:S 14 Aug 2024 17:49:49.910 # Unexpected reply to PSYNC from master: -NOAUTH Authentication required.
1:S 14 Aug 2024 17:49:49.906 # MASTER aborted replication with an error: NOAUTH Authentication required.
1:S 14 Aug 2024 17:49:49.906 # Unexpected reply to PSYNC from master: -NOAUTH Authentication required.

Hi @purajit

You shared logs of the redis-ha container or sentinel container, right?

I think the issue is related but a bit different than the one described in the initial post of this issue.

Nevertheless, could you share the values.yaml you use for your Argo CD setup, so we can try to reproduce your issue?

I haven't included applicationSet, notifications, repoServer, etc.; the rest is here
(I explicitly redacted values instead of removing them so you know all the contents):

server:
  extraArgs:
  - --insecure
  - --enable-gzip
  replicas: <...>
  resources: <...>
  serviceAccount: <...>
redis-ha:
  enabled: true
  exporter:
    enabled: false
  extraLabels:
    app.kubernetes.io/name: argocd-redis-ha-haproxy
  haproxy:
    image:
      repository: public.ecr.aws/docker/library/haproxy
  image:
    repository: public.ecr.aws/docker/library/redis
configs:
  clusterCredentials: <...>
  cm: 
    dex.config: <...>
    statusbadge.enabled: <...>
    url: <...>
  credentialTemplates: <...>
  params:
    redis.compression: none
  rbac: <...>
  repositories: <...>
  secret:
    githubSecret: <...>