CI Failure (pod not found) in `RollingRestartTest.test_rolling_restart`
vbotbuildovich opened this issue
https://buildkite.com/redpanda/vtools/builds/14516
Module: rptest.redpanda_cloud_tests.rolling_restart_test
Class: RollingRestartTest
Method: test_rolling_restart
test_id: RollingRestartTest.test_rolling_restart
status: FAIL
run time: 273.274 seconds
CalledProcessError(1, ['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cph6m7n0qhlllpgc8450-agent', 'kubectl', 'get', 'pod', 'rp-cph6m7n0qhlllpgc8450-1', '-n=redpanda', "-o=jsonpath='{.status.containerStatuses[0].ready}'"], '', 'Error from server (NotFound): pods "rp-cph6m7n0qhlllpgc8450-1" not found\n\x1b[31mERROR: \x1b[0mProcess exited with status 1\n\n')
Traceback (most recent call last):
File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
data = self.run_test()
File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
return self.test_context.function(self.test)
File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 105, in wrapped
r = f(self, *args, **kwargs)
File "/home/ubuntu/redpanda/tests/rptest/redpanda_cloud_tests/rolling_restart_test.py", line 35, in test_rolling_restart
self.redpanda.rolling_restart_pods()
File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1863, in rolling_restart_pods
self.restart_pod(pod_name, pod_timeout)
File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1839, in restart_pod
wait_until(lambda: pod_container_ready(pod_name) == 'true',
File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/utils/util.py", line 53, in wait_until
raise e
File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/utils/util.py", line 44, in wait_until
if condition():
File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1839, in <lambda>
wait_until(lambda: pod_container_ready(pod_name) == 'true',
File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1825, in pod_container_ready
return self.kubectl.cmd([
File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 259, in cmd
return self._ssh_cmd(cmd, capture=capture)
File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 235, in _ssh_cmd
return self._local_cmd(local_cmd)
File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 215, in _local_cmd
raise subprocess.CalledProcessError(process.returncode, cmd, s_out,
subprocess.CalledProcessError: Command '['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cph6m7n0qhlllpgc8450-agent', 'kubectl', 'get', 'pod', 'rp-cph6m7n0qhlllpgc8450-1', '-n=redpanda', "-o=jsonpath='{.status.containerStatuses[0].ready}'"]' returned non-zero exit status 1.
JIRA Link: CORE-4149
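The important bit is: Error from server (NotFound): pods "rp-cph6m7n0qhlllpgc8450-1" not found, i.e., pod-1 does not exist. Maybe this is expected because we are in the middle of deleting pods. During the rolling restart the pod may be briefly absent before it is recreated. The traceback shows wait_until re-raising the exception from pod_container_ready, so the first NotFound ended the wait instead of being treated as "not ready yet". If that is the case, the test just needs to keep waiting a bit longer.
Setting RCA to test, as I think that is the most likely cause at this point.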
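A minimal sketch of the "just wait a bit more" idea follows. It assumes the kubectl wrapper raises subprocess.CalledProcessError with the server message in stderr, as in the error above. pod_ready_tolerant, wait_for, and make_fake_kubectl are hypothetical helpers, not the existing rptest code. A NotFound response counts as "not ready yet", so polling continues until the timeout. Any other kubectl failure is still raised immediately.

```python
import subprocess
import time
from typing import Callable


def is_not_found(err: subprocess.CalledProcessError) -> bool:
    """Return True if kubectl reported that the requested pod does not exist."""
    stderr = err.stderr or ""
    if isinstance(stderr, bytes):
        stderr = stderr.decode(errors="replace")
    return "(NotFound)" in stderr


def pod_ready_tolerant(get_ready: Callable[[], str]) -> bool:
    """Hypothetical wrapper around a kubectl readiness query.

    Returns True only when the container reports "true". A NotFound error is
    treated as "not ready yet" (assumption: the pod is being recreated);
    any other kubectl failure is re-raised so real problems still fail fast.
    """
    try:
        return get_ready().strip() == "true"
    except subprocess.CalledProcessError as err:
        if is_not_found(err):
            return False
        raise


def wait_for(condition: Callable[[], bool], timeout_sec: float,
             backoff_sec: float = 0.5) -> None:
    """Poll `condition` until it returns True or `timeout_sec` elapses."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(f"condition not met within {timeout_sec} seconds")


def make_fake_kubectl(missing_polls: int) -> Callable[[], str]:
    """Hypothetical kubectl stand-in: NotFound for `missing_polls` calls, then ready."""
    calls = 0

    def fake() -> str:
        nonlocal calls
        calls += 1
        if calls <= missing_polls:
            raise subprocess.CalledProcessError(
                1, ["kubectl", "get", "pod", "rp-example-1"], "",
                'Error from server (NotFound): pods "rp-example-1" not found\n')
        return "true"

    return fake


# Demo: the pod is missing for two polls, then reports ready; the wait succeeds.
fake_kubectl = make_fake_kubectl(missing_polls=2)
wait_for(lambda: pod_ready_tolerant(fake_kubectl), timeout_sec=5, backoff_sec=0.01)
```

The sketch swallows only NotFound. Recent ducktape versions also accept a retry_on_exc flag on wait_until, but that retries on every exception, including failures unrelated to the restart.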