redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!

Home Page: https://redpanda.com

CI Failure (pod not found) in `RollingRestartTest.test_rolling_restart`

vbotbuildovich opened this issue

https://buildkite.com/redpanda/vtools/builds/14516

Module: rptest.redpanda_cloud_tests.rolling_restart_test
Class: RollingRestartTest
Method: test_rolling_restart
test_id:    RollingRestartTest.test_rolling_restart
status:     FAIL
run time:   273.274 seconds

CalledProcessError(1, ['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cph6m7n0qhlllpgc8450-agent', 'kubectl', 'get', 'pod', 'rp-cph6m7n0qhlllpgc8450-1', '-n=redpanda', "-o=jsonpath='{.status.containerStatuses[0].ready}'"], '', 'Error from server (NotFound): pods "rp-cph6m7n0qhlllpgc8450-1" not found\n\x1b[31mERROR: \x1b[0mProcess exited with status 1\n\n')
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 105, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/redpanda_cloud_tests/rolling_restart_test.py", line 35, in test_rolling_restart
    self.redpanda.rolling_restart_pods()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1863, in rolling_restart_pods
    self.restart_pod(pod_name, pod_timeout)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1839, in restart_pod
    wait_until(lambda: pod_container_ready(pod_name) == 'true',
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/utils/util.py", line 53, in wait_until
    raise e
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/utils/util.py", line 44, in wait_until
    if condition():
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1839, in <lambda>
    wait_until(lambda: pod_container_ready(pod_name) == 'true',
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1825, in pod_container_ready
    return self.kubectl.cmd([
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 259, in cmd
    return self._ssh_cmd(cmd, capture=capture)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 235, in _ssh_cmd
    return self._local_cmd(local_cmd)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 215, in _local_cmd
    raise subprocess.CalledProcessError(process.returncode, cmd, s_out,
subprocess.CalledProcessError: Command '['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cph6m7n0qhlllpgc8450-agent', 'kubectl', 'get', 'pod', 'rp-cph6m7n0qhlllpgc8450-1', '-n=redpanda', "-o=jsonpath='{.status.containerStatuses[0].ready}'"]' returned non-zero exit status 1.

JIRA Link: CORE-4149

Error output:

CalledProcessError(1, ['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cph6m7n0qhlllpgc8450-agent', 'kubectl', 'get', 'pod', 'rp-cph6m7n0qhlllpgc8450-1', '-n=redpanda', "-o=jsonpath='{.status.containerStatuses[0].ready}'"], '', 'Error from server (NotFound): pods "rp-cph6m7n0qhlllpgc8450-1" not found\n\x1b[31mERROR: \x1b[0mProcess exited with status 1\n\n')

The important part is `Error from server (NotFound): pods "rp-cph6m7n0qhlllpgc8450-1" not found`, i.e., pod 1 did not exist when the test polled its readiness. This may be expected: the test is in the middle of a rolling restart that deletes pods, so the pod may be briefly absent. If so, the readiness check probably just needs to keep waiting instead of failing on the first NotFound.

Setting the RCA to "test", as a test issue seems the most likely cause at this point.
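
A minimal sketch of what "wait a bit more" could look like, assuming the NotFound during a restart really is transient. This is not the actual `redpanda.py` code: `get_ready` is a hypothetical callable standing in for the `kubectl get pod ... -o=jsonpath=...` wrapper, and `pod_ready_or_absent` and `wait_for_pod` are illustrative names. The idea is to treat NotFound as "not ready yet" while still surfacing any other kubectl failure.

```python
import subprocess
import time


def pod_ready_or_absent(get_ready, pod_name):
    """Return True only if the pod exists and its container reports ready.

    `get_ready` (hypothetical) returns the jsonpath output for the pod, or
    raises subprocess.CalledProcessError the way the traceback above shows.
    """
    try:
        # The jsonpath output may come back wrapped in single quotes.
        return get_ready(pod_name).strip().strip("'") == "true"
    except subprocess.CalledProcessError as e:
        stderr = e.stderr or ""
        if isinstance(stderr, bytes):
            stderr = stderr.decode(errors="replace")
        # Assumption: a missing pod mid-restart is transient, so report
        # "not ready" and let the caller keep polling.
        if "NotFound" in stderr:
            return False
        # Anything else (auth, proxy, ...) is still a real failure.
        raise


def wait_for_pod(get_ready, pod_name, timeout_sec=300, backoff_sec=5,
                 sleep=time.sleep):
    """Poll until the pod is ready; raise TimeoutError after timeout_sec."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if pod_ready_or_absent(get_ready, pod_name):
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pod {pod_name} not ready after {timeout_sec}s")
        sleep(backoff_sec)
```

With this shape, a pod that is still missing when the timeout expires fails with a timeout rather than with the raw `CalledProcessError`, which would also make this kind of CI failure easier to tell apart from a broken `tsh`/`kubectl` connection.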