Unable to get the Pods running

Question

kartraj opened this issue 9 months ago · comments

Karthik Rajashekaran commented 9 months ago

We are trying to bring up this setup on an AWS VM. Please advise!

After installing the Kubernetes cluster, we see that the pods are not coming up properly.
ubuntu@ip-172-31-92-122:~$ kubectl get pod
NAME                                                            READY   STATUS    RESTARTS   AGE
etcd-operator-etcd-operator-etcd-operator-5c8d485cfb-k76d2      1/1     Running   0          76s
initial-config-nginx-ingress-controller-x2j4f                   0/1     Pending   0          84s
initial-config-nginx-ingress-default-backend-5df46cc4d9-dwjd7   1/1     Running   0          84s
ngingress-nginx-ingress-controller-5rd5t                        1/1     Running   0          104s
ngingress-nginx-ingress-default-backend-7cf5b8d7f4-jktgn        1/1     Running   0          104s
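
In the listing above, only initial-config-nginx-ingress-controller-x2j4f is not ready (0/1, Pending). A minimal sketch, assuming the plain `kubectl get pod` column layout shown here, that picks such pods out of the pasted output; the function name `pods_not_ready` is made up for illustration. For any pod it reports, `kubectl describe pod <name>` shows the scheduling events that explain a Pending status.

```python
from typing import List, Tuple


def pods_not_ready(kubectl_output: str) -> List[Tuple[str, str]]:
    """Return (name, status) for pods that are not Running or not fully ready."""
    problems = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the NAME/READY/STATUS header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_now, _, ready_total = ready.partition("/")
        if status != "Running" or ready_now != ready_total:
            problems.append((name, status))
    return problems


# Output copied from the post above.
SAMPLE = """\
NAME READY STATUS RESTARTS AGE
etcd-operator-etcd-operator-etcd-operator-5c8d485cfb-k76d2 1/1 Running 0 76s
initial-config-nginx-ingress-controller-x2j4f 0/1 Pending 0 84s
initial-config-nginx-ingress-default-backend-5df46cc4d9-dwjd7 1/1 Running 0 84s
ngingress-nginx-ingress-controller-5rd5t 1/1 Running 0 104s
ngingress-nginx-ingress-default-backend-7cf5b8d7f4-jktgn 1/1 Running 0 104s
"""

if __name__ == "__main__":
    for name, status in pods_not_ready(SAMPLE):
        print(f"{name}: {status}")
```

Note that this simple check would also flag pods of finished jobs (status Completed), which are not necessarily a problem.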

When we run the cluster setup, we get the error below.
Please confirm whether the deployment name and version are acceptable:
runkube.py --with-elk -o overlay/user_template.yml -- dzopscale 3.4.3-0

ubuntu@ip-172-31-92-122:~/optscale/optscale-deploy$ python3 runkube.py --with-elk -o overlay/user_template.yml -- dzopscale 3.4.3-0
18:19:48.680: Pulling images for 172.31.92.122
18:19:48.694: Pulling image index.docker.io/hystax/arcee with tag 3.4.3-0
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/docker/api/client.py", line 268, in _raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://172.31.92.122:2376/v1.40/images/create?tag=3.4.3-0&fromImage=index.docker.io%2Fhystax%2Farcee

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "runkube.py", line 430, in <module>
    acr.start(args.check, args.update_only)
  File "runkube.py", line 307, in start
    self.pull_images(node)
  File "runkube.py", line 153, in pull_images
    self._pull_image(
  File "runkube.py", line 128, in _pull_image
    image = docker_cl.images.pull(**params)
  File "/usr/local/lib/python3.8/dist-packages/docker/models/images.py", line 465, in pull
    pull_log = self.client.api.pull(
  File "/usr/local/lib/python3.8/dist-packages/docker/api/image.py", line 429, in pull
    self._raise_for_status(response)
  File "/usr/local/lib/python3.8/dist-packages/docker/api/client.py", line 270, in _raise_for_status
    raise create_api_error_from_http_exception(e) from e
  File "/usr/local/lib/python3.8/dist-packages/docker/errors.py", line 39, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation) from e
docker.errors.NotFound: 404 Client Error for http://172.31.92.122:2376/v1.40/images/create?tag=3.4.3-0&fromImage=index.docker.io%2Fhystax%2Farcee: Not Found ("manifest for hystax/arcee:3.4.3-0 not found: manifest unknown: manifest unknown")
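
The last line of the traceback carries the useful information: the registry reports no manifest for hystax/arcee:3.4.3-0, meaning that tag is not available for that image. A small sketch that extracts the image and tag from such a message, which can help when many images are pulled and only some tags are missing. The helper name `missing_manifest` and the regular expression are illustrative assumptions, not part of runkube.py or the Docker SDK.

```python
import re
from typing import Optional, Tuple

# Matches the "manifest for <image>:<tag> not found" fragment of the error text.
# Assumption: the image reference has no registry port (no extra ':' before the tag).
_MANIFEST_RE = re.compile(r"manifest for (?P<image>[^\s:]+):(?P<tag>\S+) not found")


def missing_manifest(error_text: str) -> Optional[Tuple[str, str]]:
    """Return (image, tag) if the text reports a missing manifest, else None."""
    match = _MANIFEST_RE.search(error_text)
    if match is None:
        return None
    return match.group("image"), match.group("tag")


# Error text copied from the traceback above.
ERROR = (
    'Not Found ("manifest for hystax/arcee:3.4.3-0 not found: '
    'manifest unknown: manifest unknown")'
)

if __name__ == "__main__":
    print(missing_manifest(ERROR))
```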

Karthik Rajashekaran · Answer 1 · Thu Sep 14 2023 02:28:19 GMT+0800 (China Standard Time)

ubuntu@ip-172-31-92-122:~/optscale/optscale-deploy$ kubectl get services
NAME                                           TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
etcd-restore-operator                          ClusterIP      10.99.125.21                   19999/TCP                    13m
initial-config-nginx-ingress-controller        LoadBalancer   10.106.221.133                 443:19048/TCP                13m
initial-config-nginx-ingress-default-backend   ClusterIP      10.96.181.22                   80/TCP                       13m
kubernetes                                     ClusterIP      10.96.0.1                      443/TCP                      14m
ngingress-nginx-ingress-controller             LoadBalancer   10.106.75.177                  80:26445/TCP,443:27427/TCP   13m
ngingress-nginx-ingress-default-backend        ClusterIP      10.105.201.132                 80/TCP                       13m

I think the pod has a port conflict. Please help us identify the issue.
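
One way to test the port-conflict suspicion against the listing above is to compare the node ports in the PORT(S) column. The sketch below does that for the values pasted in this post and finds no duplicates (19048, 26445 and 27427 are distinct). It only looks at service node ports; it says nothing about host ports claimed by the pods themselves, so the events of the pending pod remain the authoritative source. The function names are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def node_ports(ports_column: str) -> List[int]:
    """Extract node ports from a PORT(S) value such as '80:26445/TCP,443:27427/TCP'."""
    result = []
    for entry in ports_column.split(","):
        port_part = entry.split("/")[0]  # drop the protocol suffix
        if ":" in port_part:  # 'port:nodePort' appears only when a node port is allocated
            result.append(int(port_part.split(":")[1]))
    return result


def duplicate_node_ports(services: Dict[str, str]) -> List[int]:
    """Return node ports that are used by more than one entry."""
    counts = Counter(p for column in services.values() for p in node_ports(column))
    return sorted(port for port, n in counts.items() if n > 1)


# Service name -> PORT(S) value, copied from the listing above.
SERVICES = {
    "etcd-restore-operator": "19999/TCP",
    "initial-config-nginx-ingress-controller": "443:19048/TCP",
    "initial-config-nginx-ingress-default-backend": "80/TCP",
    "kubernetes": "443/TCP",
    "ngingress-nginx-ingress-controller": "80:26445/TCP,443:27427/TCP",
    "ngingress-nginx-ingress-default-backend": "80/TCP",
}

if __name__ == "__main__":
    print(duplicate_node_ports(SERVICES))
```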

Vladislav A · Answer 2 · Thu Sep 14 2023 14:46:25 GMT+0800 (China Standard Time)

Please use https://github.com/hystax/optscale/releases/tag/2023091300-public-pre-release for deployment.

Karthik Rajashekaran · Answer 3 · Thu Sep 14 2023 20:09:46 GMT+0800 (China Standard Time)

When we use 2023091300-public-pre-release, we still get the "manifest not found" error.

The 2023091300-public tag worked and the images are being downloaded, but the process then fails with a permission issue on /tmp.

Can you please guide us?

maxb-hystax · Answer 4 · Thu Sep 14 2023 20:50:51 GMT+0800 (China Standard Time)

Please check out https://github.com/hystax/optscale/releases/tag/2023091300-public-pre-release and use the deployment from that commit.

Karthik Rajashekaran · Answer 5 · Thu Sep 14 2023 20:53:20 GMT+0800 (China Standard Time)

When we check out the public-pre-release tag, runkube.py fails to run:
(.venv) ubuntu@ip-172-31-32-229:~/optscale-new/optscale-2023091300-public-pre-release/optscale-deploy$ ./runkube.py --with-elk -o overlay/user_template.yml -- dzoptscale 2023091300-public-pre-release
08:10:17.983: Pulling images for 172.31.32.229
Traceback (most recent call last):
  File "./runkube.py", line 402, in <module>
    acr.start(args.check, args.update_only)
  File "./runkube.py", line 284, in start
    self.pull_images(node)
  File "./runkube.py", line 132, in pull_images
    for image, tag in self.versions_info['images'].items():
  File "./runkube.py", line 108, in versions_info
    with open(self.component_versions) as f_ver:
FileNotFoundError: [Errno 2] No such file or directory: '2023091300-public-pre-release'
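
Judging from the traceback alone, this revision of runkube.py passes its last positional argument to open() (as self.component_versions), so it appears to expect the path to a component versions file rather than a release tag. The sketch below is a hypothetical pre-flight check, not code from the repository: it only verifies that the value points at an existing file before the deployment script is started.

```python
import tempfile
from pathlib import Path
from typing import Optional


def versions_file_problem(value: str) -> Optional[str]:
    """Return a description of the problem, or None if `value` is an existing file."""
    path = Path(value)
    if path.is_file():
        return None
    return (
        f"{value!r} is not an existing file; this revision of the script "
        "seems to open the last argument as a component versions file"
    )


if __name__ == "__main__":
    # The value from the failing command is a tag name, not a file path.
    print(versions_file_problem("2023091300-public-pre-release"))

    # Any existing file passes the check; a temporary file stands in here.
    with tempfile.NamedTemporaryFile() as handle:
        print(versions_file_problem(handle.name))
```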

The same happens with 2023091300-public.

So I switched to the integration branch, ran runkube.py with release 2023091300-public, and was able to get the images pulled, but when we tried to connect it said

How do we access the log files of the NGINX pod?
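
Pod logs are normally read with `kubectl logs <pod-name>`. A minimal sketch that builds and runs that command from Python, assuming kubectl is on the PATH and already configured for the cluster; the pod name is taken from the `kubectl get pod` output earlier in this thread and will differ after a redeploy, and the helper names are illustrative.

```python
import subprocess
from typing import List, Optional


def kubectl_logs_command(pod: str, namespace: Optional[str] = None, tail: int = 200) -> List[str]:
    """Build the argument list for `kubectl logs`, limited to the last `tail` lines."""
    command = ["kubectl", "logs", pod, f"--tail={tail}"]
    if namespace:
        command += ["--namespace", namespace]
    return command


def fetch_logs(pod: str, namespace: Optional[str] = None, tail: int = 200) -> str:
    """Run kubectl and return the log text; raises CalledProcessError on failure."""
    result = subprocess.run(
        kubectl_logs_command(pod, namespace, tail),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Pod name as it appeared earlier in this thread; replace with the current one.
    print(" ".join(kubectl_logs_command("ngingress-nginx-ingress-controller-5rd5t")))
```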

nk-hystax · Answer 6 · Thu Sep 14 2023 22:12:50 GMT+0800 (China Standard Time)

@kartraj have you set the value for react_app_google_oauth_client_id in overlay/user_template.yml?
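
A quick way to check this is to look for the key in the overlay file and see whether it has a non-empty value. The sketch below scans the text line by line instead of parsing YAML, because the layout of overlay/user_template.yml is not shown in this thread; the sample snippets and the client ID in them are invented placeholders.

```python
import re
from typing import Optional

KEY = "react_app_google_oauth_client_id"


def configured_value(yaml_text: str, key: str = KEY) -> Optional[str]:
    """Return the value on the first uncommented `key:` line, or None if missing or empty."""
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*:\s*(.*)$")
    for line in yaml_text.splitlines():
        match = pattern.match(line)
        if match:
            # Drop a trailing comment, surrounding whitespace and quotes.
            value = match.group(1).split(" #")[0].strip().strip("'\"")
            return value or None
    return None


# Hypothetical snippets; the real file structure may differ.
EMPTY = "ui:\n  react_app_google_oauth_client_id: ''\n"
FILLED = "ui:\n  react_app_google_oauth_client_id: example-id.apps.googleusercontent.com\n"

if __name__ == "__main__":
    print(configured_value(EMPTY))
    print(configured_value(FILLED))
```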

Karthik Rajashekaran · Answer 7 · Fri Sep 15 2023 22:53:23 GMT+0800 (China Standard Time)

@nk-hystax thank you for that nudge, it really helped. Now I think we have hit another error while connecting our cloud accounts. Any suggestions on this?

Our AWS accounts report costs in $, while Azure and GCP report them in INR.
Do we need a separate org for each cloud? What is your advice?

@maxb-hystax @nexusriot

maxb-hystax · Answer 8 · Mon Sep 18 2023 13:53:12 GMT+0800 (China Standard Time)

Hi @kartraj!

At the moment, you need to create a separate organization for each currency you use. An organization's currency can be changed before the first cloud account is connected.
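
Applied to the accounts described above, that means one organization per currency rather than one per cloud. A tiny sketch of the grouping, using ISO currency codes (USD for $) and a made-up function name:

```python
from collections import defaultdict
from typing import Dict, List


def organizations_by_currency(accounts: Dict[str, str]) -> Dict[str, List[str]]:
    """Group cloud accounts by billing currency; each group maps to one organization."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for cloud, currency in accounts.items():
        groups[currency].append(cloud)
    return dict(groups)


# Clouds and currencies as described in the question above.
ACCOUNTS = {"AWS": "USD", "Azure": "INR", "GCP": "INR"}

if __name__ == "__main__":
    for currency, clouds in organizations_by_currency(ACCOUNTS).items():
        print(f"{currency}: {', '.join(clouds)}")
```

With these inputs, two organizations would be enough: one in USD for AWS and one in INR shared by Azure and GCP.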

kiranRadhh · Answer 9 · Thu Sep 28 2023 21:36:29 GMT+0800 (China Standard Time)

Hi @maxb-hystax

We are still facing an issue with Google OAuth login. Please find the screenshot below.
What should the redirect URL be set to in the Google OAuth settings?