nats-io / nack

NATS Controllers for Kubernetes (NACK)

NACK fails to create a stream with kustomize and namespacing

manifest opened this issue

I've installed the CRDs, installed NACK using the Helm chart, and created a stream in the same namespace as NACK and the NATS server.

In the logs, I see that the NACK client opened a connection to the NATS server.
Then nothing happens: I don't see any stream in NATS when checking with the nats CLI.

# NACK log
I1222 17:45:47.333603       1 main.go:117] Starting /jetstream-controller v...

# NATS log
[6] 2021/12/22 17:45:46.411918 [DBG] 10.233.122.229:39318 - cid:24 - "v1.12.1:go:jetstream-controller" - Client connection closed: Client Closed
[6] 2021/12/22 17:45:52.336792 [DBG] 10.233.68.1:56700 - cid:25 - Client connection created
[6] 2021/12/22 17:45:54.605762 [DBG] 10.233.68.1:56700 - cid:25 - "v1.12.1:go:jetstream-controller" - Client Ping Timer
# stream state is empty
kubectl get streams -n testing01
NAME                            STATE   STREAM NAME   SUBJECTS
nats-back-back-reliable-strem           reliable      ["scope.*.*"]

# port forwarding
nats str list -s localhost:59286
No Streams defined
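(The localhost ports used with nats str list in this thread come from port-forwarding the NATS client port; a rough sketch, with the service name assumed from the URL used in the NACK config:)

# forward local port 4222 to the NATS service, then query it
kubectl port-forward -n testing01 svc/nats-back-back-internal-lb 4222:4222
nats str list -s localhost:4222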
apiVersion: jetstream.nats.io/v1beta2
kind: Stream
metadata:
  name: nats-back-back-reliable-strem
spec:
  name: reliable
  subjects:
  - "scope.*.*"
  storage: memory
  retention: limits
  discard: old
  maxAge: 1h
  maxBytes: 1048576
  replicas: 1

Definition of the stream.
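For comparison, the spec above maps roughly onto a manual nats CLI invocation like the following (a sketch; the CLI will prompt interactively for any settings not covered by these flags):

nats stream add reliable \
  --subjects "scope.*.*" \
  --storage memory \
  --retention limits \
  --discard old \
  --max-age 1h \
  --max-bytes 1048576 \
  --replicas 1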

jetstream:
  enabled: true
  nats:
    url: nats://nats-back-back-internal-lb:4222

namespaced: true

NACK config.

nats:
  image: nats:2.6.5-alpine3.14
  resources:
    limits:
      cpu: 0.1
      memory: 300Mi
    requests:
      cpu: 0.1
      memory: 300Mi
  logging:
    debug: true

  jetstream:
    enabled: true

    memStorage:
      enabled: true
      size: 1Gi

exporter:
  enabled: true
  image: natsio/prometheus-nats-exporter:0.9.0
  serviceMonitor:
    enabled: true

cluster:
  enabled: false

natsbox:
  enabled: false

auth:
  enabled: false

NATS server config.

@variadico here is a full example.
You should be able to reproduce the case just by running:

kustomize build --enable-helm . | kubectl apply -f -
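(For readers without the linked repo at hand, the kustomization.yaml driving that command would look roughly like this; the repo URL is the public NATS Helm repo, while the release names and values file names are assumptions:)

# kustomization.yaml (sketch)
namespace: testing01

helmCharts:
  - name: nats
    repo: https://nats-io.github.io/k8s/helm/charts/
    releaseName: nats-back-back
    valuesFile: nats-values.yaml
  - name: nack
    repo: https://nats-io.github.io/k8s/helm/charts/
    releaseName: nack-back-back
    valuesFile: nack-values.yaml

resources:
  - stream.yaml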

Ah, ok! Thanks! I'll check this out.

I was able to successfully create a stream based on the provided YAMLs above. So, I need to dig a little more. I'm using minikube with --kubernetes-version=v1.22.2, not sure if that affects things.

Here are the commands I ran.

# I needed to install this so serviceMonitor.enabled=true would work.
helm install myprom prometheus-community/kube-prometheus-stack

# nats.yaml is the YAML in the description above, unchanged.
helm install -f nats.yaml nats nats/nats

# Install CRDs
kubectl apply -f https://raw.githubusercontent.com/nats-io/nack/v0.6.0/deploy/crds.yml

# nack.yaml is the YAML in the description above; I changed jetstream.nats.url to nats://nats:4222
helm install -f nack.yaml jsc nats/nack

# stream.yaml is the YAML in the description above, unchanged.
kubectl apply -f stream.yaml

I can confirm that the stream has been created on Kubernetes v1.22.2.

Is there anything I can do to make it work with v1.19.2? The issue arises on that version.

Ooooh, ok. We generally test on actively supported Kubernetes versions. 1.19 reached its end of life back in October 2021.
https://kubernetes.io/releases/patch-releases/#non-active-branch-history

I can look into 1.19 and see if we can make it work.

That would be great, thanks!
We still use v1.19.2 in production.

Hmm... I'm running the example in your repo on Kubernetes 1.19.2.

When you tail the logs for jetstream-controller, do you see any log lines with Failed to watch *v1beta2.Stream?

I was able to run into a similar situation where the streams don't get created, but it looks like the reason they don't get created is that the jetstream-controller can't find the CRDs. (Maybe because the CRDs aren't available in namespace testing01?)

I tried removing all of the namespace testing01 YAML and creating everything in the default namespace, and that seems to work. However, you probably want to namespace the install. I wonder if this is a config issue. 🤔
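(A quick way to check whether the controller's service account is allowed to list the CRD-backed resources in that namespace; the service-account name here is inferred from the pod's mounted token and may differ:)

# should answer "yes" if RBAC is in place for the namespaced install
kubectl auth can-i list streams.jetstream.nats.io \
  --as=system:serviceaccount:testing01:jetstream-controller -n testing01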

There is just a single line in the logs of the jetstream-controller:

k logs po/nack-5f5fbd7c66-227q4 -n testing01
I0113 10:41:02.139603       1 main.go:117] Starting /jetstream-controller v...

I wonder if this is a config issue.

I can try anything that would help investigate the issue.

If you're able to use the latest commit in the nats-io/k8s repo, you might get some more logs by setting klogLevel in your values.yaml file. That commit hasn't made it into a release yet.

jetstream:
  enabled: true
  klogLevel: 10  # <-- increases log verbosity to 10
  nats:
    url: nats://nats-back-back-internal-lb:4222

namespaced: true
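(To confirm the extra verbosity actually reaches the container, you can inspect the rendered command on the deployment; a generic kubectl sketch, with the deployment name as a placeholder:)

kubectl get deploy nack -n testing01 \
  -o jsonpath='{.spec.template.spec.containers[0].command}'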

With the latest commit it works :-) I was able to create a stream. Thank you.
When can we expect a release with these changes?

k get stream -n testing01

NAMESPACE   NAME                             STATE     STREAM NAME   SUBJECTS
testing01   nats-back-back-reliable-stream   Created   reliable      ["scope.*.*"]
nats str list -s localhost:54600
╭───────────────────────────────────────────────────────────────────────────────╮
│                                    Streams                                    │
├──────────┬─────────────┬─────────────────────┬──────────┬──────┬──────────────┤
│ Name     │ Description │ Created             │ Messages │ Size │ Last Message │
├──────────┼─────────────┼─────────────────────┼──────────┼──────┼──────────────┤
│ reliable │             │ 2022-01-18 16:01:31 │ 0        │ 0 B  │ never        │
╰──────────┴─────────────┴─────────────────────┴──────────┴──────┴──────────────╯
k logs -f po/nack-8885cfddc-jkprl -n testing01

I0118 16:16:11.945031       1 main.go:117] Starting /jetstream-controller v...
I0118 16:16:31.239923       1 event.go:285] Event(v1.ObjectReference{Kind:"Stream", Namespace:"testing01", Name:"nats-back-back-reliable-stream", UID:"08c4c36c-9f93-41cf-9a09-b228025eb947", APIVersion:"jetstream.nats.io/v1beta2", ResourceVersion:"151937610", FieldPath:""}): type: 'Normal' reason: 'Creating' Creating stream "reliable"
I0118 16:16:31.270959       1 event.go:285] Event(v1.ObjectReference{Kind:"Stream", Namespace:"testing01", Name:"nats-back-back-reliable-stream", UID:"08c4c36c-9f93-41cf-9a09-b228025eb947", APIVersion:"jetstream.nats.io/v1beta2", ResourceVersion:"151937610", FieldPath:""}): type: 'Normal' reason: 'Created' Created stream "reliable"

Just to confirm, does everything work with natsio/jetstream-controller:0.6.1?

Just to confirm, does everything work with natsio/jetstream-controller:0.6.1?

It doesn't work :-(
I've tried NACK chart version "0.11.2" with "natsio/jetstream-controller:0.6.1".

k get stream -n testing01
NAME                             STATE   STREAM NAME   SUBJECTS
nats-back-back-reliable-stream           reliable      ["scope.*.*"]
nats str list -s localhost:60604
No Streams defined

I also don't see any logs, even though klogLevel: 10 is set in values.yaml for NACK.

k logs po/nack-back-back-9b47cdd69-84p7p -n testing01
I0128 11:04:30.243344       1 main.go:120] Starting /jetstream-controller v0.6.1...
jetstream:
  enabled: true
  image: natsio/jetstream-controller:0.6.1
  klogLevel: 10
  nats:
    url: nats://nats-back-back-internal-lb:4222

namespaced: true

With the latest commit it works

What commit was that?

Also, can you post the output of kubectl describe pod your-nack-pod-abc123?

The commit that works should be this one.

That's the describe output for the NACK pod installed from chart version "0.11.2".

k describe po/nack-back-back-9b47cdd69-dcpgc -n testing01

Name:         nack-back-back-9b47cdd69-dcpgc
Namespace:    testing01
Priority:     0
Node:         worker2/10.0.13.101
Start Time:   Tue, 01 Feb 2022 11:16:10 +0300
Labels:       app=nack-back-back
              chart=nack-0.9.2
              pod-template-hash=9b47cdd69
Annotations:  cni.projectcalico.org/podIP: 10.233.68.231/32
              cni.projectcalico.org/podIPs: 10.233.68.231/32
Status:       Running
IP:           10.233.68.231
IPs:
  IP:           10.233.68.231
Controlled By:  ReplicaSet/nack-back-back-9b47cdd69
Containers:
  jsc:
    Container ID:  docker://6200da25018b3f73c56e292c944469cdd297a2df367c3ad17d9a2905b5efc6e7
    Image:         natsio/jetstream-controller:0.6.1
    Image ID:      docker-pullable://natsio/jetstream-controller@sha256:adb0d6f247d08056ac7e47a589fe96c4a7da0c8626f2e626242f1dba18376362
    Port:          <none>
    Host Port:     <none>
    Command:
      /jetstream-controller
      -s=nats://nats-back-back-internal-lb:4222
      --namespace=default
    State:          Running
      Started:      Tue, 01 Feb 2022 11:16:14 +0300
    Ready:          True
    Restart Count:  0
    Environment:
      POD_NAME:       nack-back-back-9b47cdd69-dcpgc (v1:metadata.name)
      POD_NAMESPACE:  testing01 (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from jetstream-controller-token-s7lbk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  jetstream-controller-token-s7lbk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jetstream-controller-token-s7lbk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  60s   default-scheduler  Successfully assigned testing01/nack-back-back-9b47cdd69-dcpgc to worker2
  Normal  Pulling    59s   kubelet            Pulling image "natsio/jetstream-controller:0.6.1"
  Normal  Pulled     56s   kubelet            Successfully pulled image "natsio/jetstream-controller:0.6.1" in 3.31776091s
  Normal  Created    56s   kubelet            Created container jsc
  Normal  Started    56s   kubelet            Started container jsc

Oook. I think we need this PR: https://github.com/manifest/nack-issue-57/pull/1

And then everything takes a while to start up and get created.

It seems that even with namespaceOverride: testing01 set, the controller still starts with --namespace=default.
Is that something we should look at?

  Command:
    /jetstream-controller
    -s=nats://nats-back-back-internal-lb:4222
    --namespace=default

That's the full describe output.

k describe po/nack-back-back-9b47cdd69-tzdkm -n testing01

Name:         nack-back-back-9b47cdd69-tzdkm
Namespace:    testing01
Priority:     0
Node:         ulms.testing.staging.worker2.infra.ng/10.0.13.101
Start Time:   Tue, 01 Feb 2022 22:29:12 +0300
Labels:       app=nack-back-back
              chart=nack-0.9.2
              pod-template-hash=9b47cdd69
Annotations:  cni.projectcalico.org/podIP: 10.233.68.139/32
              cni.projectcalico.org/podIPs: 10.233.68.139/32
Status:       Running
IP:           10.233.68.139
IPs:
  IP:           10.233.68.139
Controlled By:  ReplicaSet/nack-back-back-9b47cdd69
Containers:
  jsc:
    Container ID:  docker://d8c67adb310c5d5c2b991b9e7a2cad1ff9f8036514fa771f76e33489fbd2cefd
    Image:         natsio/jetstream-controller:0.6.1
    Image ID:      docker-pullable://natsio/jetstream-controller@sha256:adb0d6f247d08056ac7e47a589fe96c4a7da0c8626f2e626242f1dba18376362
    Port:          <none>
    Host Port:     <none>
    Command:
      /jetstream-controller
      -s=nats://nats-back-back-internal-lb:4222
      --namespace=default
    State:          Running
      Started:      Tue, 01 Feb 2022 22:29:13 +0300
    Ready:          True
    Restart Count:  0
    Environment:
      POD_NAME:       nack-back-back-9b47cdd69-tzdkm (v1:metadata.name)
      POD_NAMESPACE:  testing01 (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from jetstream-controller-token-lvjfn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  jetstream-controller-token-lvjfn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jetstream-controller-token-lvjfn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  30s   default-scheduler  Successfully assigned testing01/nack-back-back-9b47cdd69-tzdkm to ulms.testing.staging.worker2.infra.ng
  Normal  Pulled     29s   kubelet            Container image "natsio/jetstream-controller:0.6.1" already present on machine
  Normal  Created    29s   kubelet            Created container jsc
  Normal  Started    29s   kubelet            Started container jsc

For the NACK 0.11.2 config

jetstream:
  enabled: true
  image: natsio/jetstream-controller:0.6.1
  klogLevel: 10
  nats:
    url: nats://nats-back-back-internal-lb:4222

namespaced: true
namespaceOverride: testing01
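(One way to see which --namespace argument the chart will render before installing is to template it locally; a sketch, with the release name and values file as placeholders:)

helm template jsc nats/nack -f nack.yaml -n testing01 | grep -e '--namespace'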

It seems like you're still running the old version. Here's what my kubectl describe pod output looks like. Notice I have chart=nack-0.11.2, while you have chart=nack-0.9.2. Can you try using this branch: https://github.com/manifest/nack-issue-57/pull/1 ?

$ kubectl describe pod nack-7854647c5d-zmhq9 -n testing01
Name:         nack-7854647c5d-zmhq9
Namespace:    testing01
Priority:     0
Node:         minikube/192.168.49.2
Start Time:   Tue, 01 Feb 2022 11:55:36 -0800
Labels:       app=nack
              chart=nack-0.11.2
              pod-template-hash=7854647c5d
Annotations:  <none>
Status:       Running
IP:           172.17.0.4
IPs:
  IP:           172.17.0.4
Controlled By:  ReplicaSet/nack-7854647c5d
Containers:
  jsc:
    Container ID:  docker://947ded804ad6dda17bdec41841e8cc6a1ab634956639e7a20fb8d711e0c4c990
    Image:         natsio/jetstream-controller:0.6.1
    Image ID:      docker-pullable://natsio/jetstream-controller@sha256:adb0d6f247d08056ac7e47a589fe96c4a7da0c8626f2e626242f1dba18376362
    Port:          <none>
    Host Port:     <none>
    Command:
      /jetstream-controller
      -s=nats://nats-back-back-internal-lb:4222
      --namespace=testing01
    State:          Running
      Started:      Tue, 01 Feb 2022 11:57:15 -0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 01 Feb 2022 11:56:25 -0800
      Finished:     Tue, 01 Feb 2022 11:56:25 -0800
    Ready:          True
    Restart Count:  4
    Environment:
      POD_NAME:       nack-7854647c5d-zmhq9 (v1:metadata.name)
      POD_NAMESPACE:  testing01 (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from jetstream-controller-token-8sgzv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  jetstream-controller-token-8sgzv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jetstream-controller-token-8sgzv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  5m38s                  default-scheduler  Successfully assigned testing01/nack-7854647c5d-zmhq9 to minikube
  Warning  BackOff    4m12s (x8 over 5m35s)  kubelet            Back-off restarting failed container
  Normal   Pulled     3m59s (x5 over 5m37s)  kubelet            Container image "natsio/jetstream-controller:0.6.1" already present on machine
  Normal   Created    3m59s (x5 over 5m37s)  kubelet            Created container jsc
  Normal   Started    3m59s (x5 over 5m37s)  kubelet            Started container jsc

Yes, I see it now. I had missed that Kustomize caches the previously pulled Helm chart in the "charts" directory. After removing the cache directory, the new version of NACK got installed and the stream was created. Thanks.
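(For reference, the fix described above would look roughly like this:)

# remove the chart cached by a previous `kustomize build --enable-helm` run
rm -rf charts/
kustomize build --enable-helm . | kubectl apply -f -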