Fauxton shows “This database failed to load” after pod restarts
DB185344 opened this issue
Describe the bug
After restarting a pod, the node fails to join the cluster properly, and we're getting an error on Fauxton, that displays 'this database failed to load' on some databases. when refreshing the browser, a different db comes online and a different db displays 'this database failed to load'. only after running a curl request with 'finish_cluster' the error stops.
Version of Helm and Kubernetes: Helm: 3.5.4, Kubernetes: 1.19
What happened: After restarting a pod, the node fails to join the cluster properly, and only after running

```sh
curl -X POST http://$adminUser:$adminPassword@<couchdb_pod>:5984/_cluster_setup -H "Accept: application/json" -H "Content-Type: application/json" -d '{"action": "finish_cluster"}'
```

does the pod join the cluster again.
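To confirm whether the restarted node actually rejoined, a minimal check (reusing the same credentials and `<couchdb_pod>` placeholder as the command above) against CouchDB's `_membership` and `_cluster_setup` endpoints looks like this:

```sh
HOST="http://$adminUser:$adminPassword@<couchdb_pod>:5984"

# _membership shows the nodes the Erlang VM can see (all_nodes) versus
# the nodes registered in the cluster (cluster_nodes); on a healthy
# 3-node cluster both lists contain the same three entries.
curl -s "$HOST/_membership"

# GET /_cluster_setup reports the setup state; a fully joined cluster
# returns {"state":"cluster_finished"}.
curl -s "$HOST/_cluster_setup"
```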
What you expected to happen: After a pod restart, the node automatically rejoins the cluster.
How to reproduce it (as minimally and precisely as possible): restart 1 pod in the cluster.
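For example, a minimal sketch (the pod name assumes a release called `couchdb` and the chart's default `<release>-couchdb-N` naming; adjust for your release):

```sh
# Delete one pod; the StatefulSet controller recreates it, which is
# enough to trigger the failed-rejoin behaviour described above.
kubectl delete pod couchdb-couchdb-0
```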
Anything else we need to know:
Adding an image from Fauxton showing 'This database failed to load':
Also attaching the values.yaml:
```yaml
clusterSize: 3
allowAdminParty: false
createAdminSecret: false
adminUsername: admin
networkPolicy:
  enabled: true
serviceAccount:
  enabled: true
  create: true
persistentVolume:
  enabled: true
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  storageClass: "ssd-couchdb"
image:
  repository:
  tag: latest
  pullPolicy: Always
searchImage:
  repository: kocolosk/couchdb-search
  tag: 0.2.0
  pullPolicy: IfNotPresent
enableSearch: false
initImage:
  repository: busybox
  tag: latest
  pullPolicy: Always
podManagementPolicy: Parallel
affinity: {}
annotations: {}
tolerations: []
service:
  annotations:
  enabled: true
  type: LoadBalancer
  externalPort: 5984
  sidecarsPort: 8080
  LoadBalancerIP:
ingress:
  enabled: false
  hosts:
    - chart-example.local
  path: /
  annotations: []
  tls:
resources:
  {}
erlangFlags:
  name: couchdb
  setcookie: monster
couchdbConfig:
  chttpd:
    bind_address: any
    require_valid_user: false
dns:
  clusterDomainSuffix: cluster.local
livenessProbe:
  enabled: true
  failureThreshold: 3
  initialDelaySeconds: 0
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
readinessProbe:
  enabled: true
  failureThreshold: 3
  initialDelaySeconds: 0
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
sidecars:
  image: "<sidecar_image>"
  imagePullPolicy: Always
```
Did you ever find a fix for the pod not rejoining the cluster properly? I'm encountering that now.
Hi @willholley. It might be #7, but it doesn't happen on pod restart; it only happens when there's a new pod after a helm upgrade. It seems that whenever the helm chart is run, it generates new credentials. (I noticed that the auto-generated admin password changes every time I install or update the helm deployment.) New pods pick up the new credentials, but old ones don't. So the workaround I found was to kill all the existing pods after scaling, as sketched below. (Obviously not ideal, but I don't have to do that very often.)
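A sketch of that workaround (the label selector assumes the chart's default `app=couchdb` pod label; verify with `kubectl get pods --show-labels`):

```sh
# Delete every CouchDB pod so the StatefulSet recreates them all with
# the current (regenerated) credentials.
kubectl delete pod -l app=couchdb
```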
Perhaps #89 will fix it?
Alternatively, I could just define my own admin credentials manually and not have a problem anymore.
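For anyone going that route, a minimal sketch (the secret name `<release>-couchdb` and the key names follow the couchdb-helm README; double-check them against your chart version): create the secret before installing with `createAdminSecret: false`, as in the values.yaml above.

```sh
# Secret name assumes a release called "couchdb"; the keys are the ones
# the chart's README documents for a pre-created admin secret.
kubectl create secret generic couchdb-couchdb \
  --from-literal=adminUsername=admin \
  --from-literal=adminPassword='choose-a-stable-password' \
  --from-literal=cookieAuthSecret='another-stable-secret'
```

With a fixed secret in place, helm upgrades no longer rotate the admin password, so old and new pods keep authenticating with the same credentials.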