Receiving "Deployment is not ready" error while the deployment is actually ready
MurzNN opened this issue · comments
Describe the bug
When I start pv-migrate, it creates the deployment, but in the debug log I see errors like:
🚀 Attempting strategy: lbsvc
🔑 Generating SSH key pair
creating 4 resource(s)
beginning wait for 4 resources with timeout of 1m0s
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
But at the same time, via kubectl I see that the deployment is ready:
$ kubectl -n korepov get deployment pv-migrate-dbabc-src-sshd
NAME READY UP-TO-DATE AVAILABLE AGE
pv-migrate-dbabc-src-sshd 1/1 1 1 43s
The log level is debug, and no additional messages were displayed.
So, any ideas on what can cause this problem?
How can I enable more verbose logging to understand what's happening and why it is not detecting the ready status?
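For context on what that wait actually checks: pv-migrate delegates the wait to Helm, and Helm decides Deployment readiness from status fields reported by the controller, not from the READY column that kubectl prints. Below is a minimal, hedged sketch of that predicate using simplified stand-in structs (hypothetical, not the real client-go types); the real implementation additionally resolves the Deployment's newest ReplicaSet through a cached lister, so a stale cache or a mismatched ReplicaSet lookup could explain a "not ready" report for a Deployment that kubectl shows as 1/1.

```go
package main

import "fmt"

// DeploymentStatus is a simplified stand-in for the Kubernetes Deployment
// status fields a readiness check consults (hypothetical, not client-go types).
type DeploymentStatus struct {
	ObservedGeneration int64
	ReadyReplicas      int32
}

// Deployment is a simplified stand-in for the spec/status pair.
type Deployment struct {
	Generation     int64 // metadata.generation (bumped on every spec change)
	Replicas       int32 // desired replicas from spec
	MaxUnavailable int32 // resolved from the rolling-update strategy
	Status         DeploymentStatus
}

// deploymentReady roughly mirrors Helm's check: the controller must have
// observed the latest spec generation, and enough replicas must be ready.
func deploymentReady(d Deployment) bool {
	if d.Status.ObservedGeneration < d.Generation {
		return false // controller hasn't processed the latest spec yet
	}
	return d.Status.ReadyReplicas >= d.Replicas-d.MaxUnavailable
}

func main() {
	d := Deployment{Generation: 1, Replicas: 1, MaxUnavailable: 0,
		Status: DeploymentStatus{ObservedGeneration: 1, ReadyReplicas: 1}}
	fmt.Println(deploymentReady(d)) // true

	d.Status.ReadyReplicas = 0
	fmt.Println(deploymentReady(d)) // false: "0 out of 1 expected pods are ready"
}
```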
Console output
🚀 Attempting strategy: lbsvc
🔑 Generating SSH key pair
creating 4 resource(s)
beginning wait for 4 resources with timeout of 1m0s
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
🧹 Cleaning up
uninstall: Deleting pv-migrate-dbabc-src
uninstall: given cascade value: , defaulting to delete propagation background
Starting delete for "pv-migrate-dbabc-src-sshd" Service
Starting delete for "pv-migrate-dbabc-src-sshd" Deployment
Starting delete for "pv-migrate-dbabc-src-sshd" Secret
Starting delete for "pv-migrate-dbabc-src-sshd" ServiceAccount
beginning wait for 4 resources to be deleted with timeout of 1m0s
purge requested for pv-migrate-dbabc-src
✨ Cleanup done
🔶 Migration failed with this strategy, will try with the remaining strategies
Error: migration failed: all strategies failed for this migration
**Version**
- Source and destination Kubernetes versions: source - `v1.25.6`, destination - `v1.27.7`
- Source and destination container runtimes: source - `containerd://1.6.15`, destination - `containerd://1.7.5`
- pv-migrate version 1.7.1 (commit: 1affa11b175d20969b9d6f2879c09dc94f0b4a0f) (build date: 2023-10-09T21:56:55Z)
- Installation method: krew
- Source and destination PVC type, size and accessModes: `ReadWriteMany, csi-cephfs-sc, 2G -> ReadWriteMany, local-path, 2G`
And here is the output of all resources related to the process while I see the "Deployment is not ready" error:
$ kubectl -n korepov get all | grep pv-migrate
pod/pv-migrate-dbddb-src-sshd-cf79c787-d2nph 1/1 Running 0 18s
service/pv-migrate-dbddb-src-sshd NodePort 10.233.18.8 <none> 22:32148/TCP 20s
deployment.apps/pv-migrate-dbddb-src-sshd 1/1 1 1 19s
replicaset.apps/pv-migrate-dbddb-src-sshd-cf79c787 1 1 1 19s
This looks like a bug, I'll have a look. You can get more info with `--log-level=debug --log-format=json`, but not sure if it's gonna help here.
Thanks! I already have `--log-level=debug`, and `--log-format=json` just adds more garbage to the output, but no new useful information ;)
Maybe you can explain how to debug this on my side? And I will share more debugging information for you.
I had a look and noticed that this error comes from Helm's wait logic, not from our code. So I would give passing `--skip-cleanup` a try and troubleshoot it using the `helm` CLI, trying to find out why it does not report as ready.
You can give these a try:
helm ls -a
helm status <name-of-the-release>
Also, note that for `lbsvc`, Helm would wait for the created `Service` to actually get an external IP (not pending). This could be the problem.
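To illustrate that point, here is a hedged sketch of a Service readiness check along the lines of Helm's (hypothetical simplified structs, not the real API types): only a `LoadBalancer` Service blocks on external ingress; a `NodePort` Service counts as ready as soon as it has a cluster IP. If that holds, the wait in this issue is more likely stuck on the Deployment check than on the Service.

```go
package main

import "fmt"

// Service is a hypothetical simplified view of a Kubernetes Service,
// covering only the fields a readiness check consults.
type Service struct {
	Type       string   // "ClusterIP", "NodePort", "LoadBalancer", "ExternalName"
	ClusterIP  string   // spec.clusterIP once assigned
	IngressIPs []string // status.loadBalancer.ingress addresses
}

// serviceReady approximates the check: ExternalName is always ready,
// everything else needs a cluster IP, and LoadBalancer additionally needs
// at least one ingress address (i.e. not <pending>).
func serviceReady(s Service) bool {
	if s.Type == "ExternalName" {
		return true
	}
	if s.ClusterIP == "" {
		return false
	}
	if s.Type == "LoadBalancer" {
		return len(s.IngressIPs) > 0
	}
	return true
}

func main() {
	// The NodePort service from this issue: ready despite no external IP.
	fmt.Println(serviceReady(Service{Type: "NodePort", ClusterIP: "10.233.18.8"})) // true
	// A LoadBalancer stuck in <pending>: not ready.
	fmt.Println(serviceReady(Service{Type: "LoadBalancer", ClusterIP: "10.0.0.1"})) // false
}
```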
Tested; even without `--skip-cleanup` it shows as deployed, while in the terminal these lines keep coming:
Deployment is not ready: korepov/pv-migrate-dcada-src-sshd. 0 out of 1 expected pods are ready
Here is the output of helm:
$ helm status pv-migrate-dcada-src
NAME: pv-migrate-dcada-src
LAST DEPLOYED: Wed Dec 13 15:12:42 2023
NAMESPACE: korepov
STATUS: deployed
REVISION: 1
TEST SUITE: None
Seems this problem is related to the `NodePort` service type mode. I can't test it with the `LoadBalancer` type because no free IPs are available for it on the source cluster. But I tested on the destination cluster (just testing the copy back): with `LoadBalancer` it works well, but with `NodePort` I'm receiving the same error.
While the pv-migrate waits for readiness, I see the Service in the active state, here are the details:
$ kubectl describe service pv-migrate-bdaea-src-sshd
Name: pv-migrate-bdaea-src-sshd
Namespace: korepov-pro-dev
Labels: app.kubernetes.io/component=sshd
app.kubernetes.io/instance=pv-migrate-bdaea-src
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=pv-migrate
app.kubernetes.io/version=0.5.0
helm.sh/chart=pv-migrate-0.5.0
Annotations: meta.helm.sh/release-name: pv-migrate-bdaea-src
meta.helm.sh/release-namespace: korepov-pro-dev
Selector: app.kubernetes.io/component=sshd,app.kubernetes.io/instance=pv-migrate-bdaea-src,app.kubernetes.io/name=pv-migrate
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.233.53.90
IPs: 10.233.53.90
Port: ssh 22/TCP
TargetPort: 22/TCP
NodePort: ssh 31784/TCP
Endpoints: 10.233.74.26:22
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
And I can connect to this node port on the source cluster from the destination cluster (using the external IP of any node) via telnet:
# telnet 1.2.3.4 31784
Trying 1.2.3.4...
Connected to 1.2.3.4.
Escape character is '^]'.
SSH-2.0-OpenSSH_9.3
So, the network connection is not a problem.
So, could you please describe what exactly it tries to wait for? And maybe add more verbose debug logging to catch it?
Also, specifying the source node IP address explicitly using `--dest-host-override 1.2.3.4` doesn't help either.
And it would be good to add the Helm chart deployment status to the debug logs: at least `helm status`, but ideally also the pod and service status.