Fauxton shows “This database failed to load” after pod restarts
DB185344 opened this issue
Describe the bug
After restarting a pod, the node fails to join the cluster properly, and we're getting an error on Fauxton, that displays 'this database failed to load' on some databases. when refreshing the browser, a different db comes online and a different db displays 'this database failed to load'. only after running a curl request with 'finish_cluster' the error stops.
Version of Helm and Kubernetes: Helm: 3.5.4, Kubernetes: 1.19
What happened: After restarting a pod, the node fails to join the cluster properly, and only after running

```sh
curl -X POST http://$adminUser:$adminPassword@<couchdb_pod>:5984/_cluster_setup -H "Accept: application/json" -H "Content-Type: application/json" -d '{"action": "finish_cluster"}'
```

does the pod join the cluster again.
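To confirm whether the restarted node actually rejoined, a minimal check (reusing the same credentials and `<couchdb_pod>` placeholder as the command above) against CouchDB's `_membership` and `_cluster_setup` endpoints looks like this:

```sh
HOST="http://$adminUser:$adminPassword@<couchdb_pod>:5984"

# _membership shows the nodes the Erlang VM can see (all_nodes) versus
# the nodes registered in the cluster (cluster_nodes); on a healthy
# 3-node cluster both lists contain the same three entries.
curl -s "$HOST/_membership"

# GET /_cluster_setup reports the setup state; a fully joined cluster
# returns {"state":"cluster_finished"}.
curl -s "$HOST/_cluster_setup"
```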
What you expected to happen: After a pod restart, the node automatically rejoins the cluster.
How to reproduce it (as minimally and precisely as possible): restart 1 pod in the cluster.
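For example, a minimal sketch (the pod name assumes a release called `couchdb` and the chart's default `<release>-couchdb-N` naming; adjust for your release):

```sh
# Delete one pod; the StatefulSet controller recreates it, which is
# enough to trigger the failed-rejoin behaviour described above.
kubectl delete pod couchdb-couchdb-0
```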
Anything else we need to know:
Adding an image from Fauxton showing 'This database failed to load':
Also attaching the values.yaml:
```yaml
clusterSize: 3
allowAdminParty: false
createAdminSecret: false
adminUsername: admin
networkPolicy:
  enabled: true
serviceAccount:
  enabled: true
  create: true
persistentVolume:
  enabled: true
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  storageClass: "ssd-couchdb"
image:
  repository:
  tag: latest
  pullPolicy: Always
searchImage:
  repository: kocolosk/couchdb-search
  tag: 0.2.0
  pullPolicy: IfNotPresent
enableSearch: false
initImage:
  repository: busybox
  tag: latest
  pullPolicy: Always
podManagementPolicy: Parallel
affinity: {}
annotations: {}
tolerations: []
service:
  annotations:
  enabled: true
  type: LoadBalancer
  externalPort: 5984
  sidecarsPort: 8080
  LoadBalancerIP:
ingress:
  enabled: false
  hosts:
    - chart-example.local
  path: /
  annotations: []
  tls:
resources:
  {}
erlangFlags:
  name: couchdb
  setcookie: monster
couchdbConfig:
  chttpd:
    bind_address: any
    require_valid_user: false
dns:
  clusterDomainSuffix: cluster.local
livenessProbe:
  enabled: true
  failureThreshold: 3
  initialDelaySeconds: 0
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
readinessProbe:
  enabled: true
  failureThreshold: 3
  initialDelaySeconds: 0
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
sidecars:
  image: "<sidecar_image>"
  imagePullPolicy: Always
```
Did you ever find a fix for the pod not rejoining the cluster properly? I'm encountering that now.
Hi @willholley. It might be #7, but it doesn't happen on pod restart; it only happens when there's a new pod after a helm upgrade. It seems that whenever the helm chart is run, it generates new credentials. (I noticed that the auto-generated admin password changes every time I install or update the helm deployment.) New pods pick up the new credentials, but old ones don't. So the workaround I found was to kill all the existing pods after scaling, as sketched below. (Obviously not ideal, but I don't have to do that very often.)
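A sketch of that workaround (the label selector assumes the chart's default `app=couchdb` pod label; verify with `kubectl get pods --show-labels`):

```sh
# Delete every CouchDB pod so the StatefulSet recreates them all with
# the current (regenerated) credentials.
kubectl delete pod -l app=couchdb
```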
Perhaps #89 will fix it?
Alternatively, I could just define my own admin credentials manually and not have a problem anymore.
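For anyone going that route, a minimal sketch (the secret name `<release>-couchdb` and the key names follow the couchdb-helm README; double-check them against your chart version): create the secret before installing with `createAdminSecret: false`, as in the values.yaml above.

```sh
# Secret name assumes a release called "couchdb"; the keys are the ones
# the chart's README documents for a pre-created admin secret.
kubectl create secret generic couchdb-couchdb \
  --from-literal=adminUsername=admin \
  --from-literal=adminPassword='choose-a-stable-password' \
  --from-literal=cookieAuthSecret='another-stable-secret'
```

With a fixed secret in place, helm upgrades no longer rotate the admin password, so old and new pods keep authenticating with the same credentials.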