krkn-chaos / krkn

Chaos and resiliency testing tool for Kubernetes with a focus on improving performance under failure conditions. A CNCF sandbox project.


Node scenarios failing on AWS, NoRegionError()

yocum137 opened this issue

[root@lppecput0000241 containers]# docker logs --follow f10995aa0e8f
2022-02-21 16:53:49,980 [INFO] Starting kraken
2022-02-21 16:53:49,996 [INFO] Initializing client to talk to the Kubernetes cluster
2022-02-21 16:53:53,482 [INFO] Publishing kraken status at http://0.0.0.0:8081/
2022-02-21 16:53:53,483 [INFO] Starting http server at http://0.0.0.0:8081/
2022-02-21 16:53:53,483 [INFO] Fetching cluster info
I0221 16:53:54.960909 15 request.go:665] Waited for 1.159585876s due to client-side throttling, not priority and fairness, request: GET:https://api.eng-paas-d-ausw2-1.aws.example.com:6443//apis/security.istio.io/v1beta1?timeout=32s
2022-02-21 16:53:56,703 [INFO]
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.2     True        False         37d     Cluster version is 4.8.2
Kubernetes control plane is running at https://api.eng-paas-d-ausw2-1.aws.example.com:6443
2022-02-21 16:53:56,703 [INFO] Generated a uuid for the run: fcd78e4d-2444-40c4-a2d0-e90c79ceb08f
2022-02-21 16:53:56,703 [INFO] Daemon mode not enabled, will run through 1 iterations
2022-02-21 16:53:56,703 [INFO] Executing scenarios for iteration 0
2022-02-21 16:53:56,704 [INFO] connection set up
127.0.0.1 - - [21/Feb/2022 16:53:56] "GET / HTTP/1.1" 200 -
2022-02-21 16:53:56,705 [INFO] response RUN
2022-02-21 16:53:56,705 [INFO] Running container scenarios
2022-02-21 16:53:57,850 [INFO] Killing container etcd in pod etcd-ip-10-59-130-251.us-west-2.compute.internal (ns openshift-etcd)
2022-02-21 16:53:58,160 [INFO] Scenario kill etcd container successfully injected
2022-02-21 16:53:58,593 [INFO] Waiting for the specified duration: 60
2022-02-21 16:54:58,654 [INFO]
2022-02-21 16:54:58,654 [INFO] connection set up
127.0.0.1 - - [21/Feb/2022 16:54:58] "GET / HTTP/1.1" 200 -
2022-02-21 16:54:58,656 [INFO] response RUN
2022-02-21 16:54:58,656 [INFO] Running pod scenarios
2022-02-21 16:55:00 INFO main verbosity: None; log level: INFO; handler level: INFO
2022-02-21 16:55:00 INFO main Creating kubernetes client with config /root/.kube/config from --kubeconfig flag
2022-02-21 16:55:00 INFO k8s_client Initializing with config: /root/.kube/config
2022-02-21 16:55:00 INFO main No cloud driver - some functionality disabled
2022-02-21 16:55:00 INFO main Using stdout metrics collector
2022-02-21 16:55:00 INFO main NOT starting the UI server
2022-02-21 16:55:00 INFO main STARTING AUTONOMOUS MODE
2022-02-21 16:55:04 INFO scenario.delete etcd pod Starting scenario 'delete etcd pods' (2 steps)
2022-02-21 16:55:04 INFO action_nodes_pods.delete etcd pod Matching 'labels' {'labels': {'namespace': 'openshift-etcd', 'selector': 'k8s-app=etcd'}}
2022-02-21 16:55:05 INFO action_nodes_pods.delete etcd pod Matched 3 pods for selector k8s-app=etcd in namespace openshift-etcd
2022-02-21 16:55:05 INFO action_nodes_pods.delete etcd pod Initial set length: 3
2022-02-21 16:55:05 INFO action_nodes_pods.delete etcd pod Filtered set length: 1
2022-02-21 16:55:05 INFO action_nodes_pods.delete etcd pod Pod killed: [pod #0 name=etcd-ip-10-59-129-241.us-west-2.compute.internal namespace=openshift-etcd containers=4 ip=10.59.129.241 host_ip=10.59.129.241 state=Running labels:app=etcd,etcd=true,k8s-app=etcd,revision=3 annotations:kubernetes.io/config.hash=2453b138-b846-4a30-a494-f0877af1d16c,kubernetes.io/config.mirror=2453b138-b846-4a30-a494-f0877af1d16c,kubernetes.io/config.seen=2022-02-17T18:52:50.296509696Z,kubernetes.io/config.source=file,target.workload.openshift.io/management={"effect": "PreferredDuringScheduling"}]
2022-02-21 16:55:05 INFO action_nodes_pods.delete etcd pod Matching 'labels' {'labels': {'namespace': 'openshift-etcd', 'selector': 'k8s-app=etcd'}}
2022-02-21 16:55:05 INFO action_nodes_pods.delete etcd pod Matched 2 pods for selector k8s-app=etcd in namespace openshift-etcd
2022-02-21 16:55:05 INFO action_nodes_pods.delete etcd pod Initial set length: 2
2022-02-21 16:55:05 INFO action_nodes_pods.delete etcd pod Filtered set length: 2
2022-02-21 16:55:05 ERROR action_nodes_pods.delete etcd pod Expected 3 pods, got 2
2022-02-21 16:55:05 WARNING scenario.delete etcd pod Failure in action. Sleeping 30 and retrying
2022-02-21 16:55:35 INFO action_nodes_pods.delete etcd pod Matching 'labels' {'labels': {'namespace': 'openshift-etcd', 'selector': 'k8s-app=etcd'}}
2022-02-21 16:55:35 INFO action_nodes_pods.delete etcd pod Matched 2 pods for selector k8s-app=etcd in namespace openshift-etcd
2022-02-21 16:55:35 INFO action_nodes_pods.delete etcd pod Initial set length: 2
2022-02-21 16:55:35 INFO action_nodes_pods.delete etcd pod Filtered set length: 2
2022-02-21 16:55:35 ERROR action_nodes_pods.delete etcd pod Expected 3 pods, got 2
2022-02-21 16:55:35 WARNING scenario.delete etcd pod Failure in action. Sleeping 30 and retrying
2022-02-21 16:56:05 INFO action_nodes_pods.delete etcd pod Matching 'labels' {'labels': {'namespace': 'openshift-etcd', 'selector': 'k8s-app=etcd'}}
2022-02-21 16:56:06 INFO action_nodes_pods.delete etcd pod Matched 2 pods for selector k8s-app=etcd in namespace openshift-etcd
2022-02-21 16:56:06 INFO action_nodes_pods.delete etcd pod Initial set length: 2
2022-02-21 16:56:06 INFO action_nodes_pods.delete etcd pod Filtered set length: 2
2022-02-21 16:56:06 ERROR action_nodes_pods.delete etcd pod Expected 3 pods, got 2
2022-02-21 16:56:06 WARNING scenario.delete etcd pod Failure in action. Sleeping 30 and retrying
2022-02-21 16:56:36 INFO action_nodes_pods.delete etcd pod Matching 'labels' {'labels': {'namespace': 'openshift-etcd', 'selector': 'k8s-app=etcd'}}
2022-02-21 16:56:36 INFO action_nodes_pods.delete etcd pod Matched 3 pods for selector k8s-app=etcd in namespace openshift-etcd
2022-02-21 16:56:36 INFO action_nodes_pods.delete etcd pod Initial set length: 3
2022-02-21 16:56:36 INFO action_nodes_pods.delete etcd pod Filtered set length: 3
2022-02-21 16:56:36 INFO scenario.delete etcd pod Scenario finished
2022-02-21 16:56:36 INFO policy_runner All done here!
2022-02-21 16:56:36,986 [INFO] Scenario: scenarios/etcd.yml has been successfully injected!
2022-02-21 16:56:36,987 [INFO] Waiting for the specified duration: 60
2022-02-21 16:57:54 INFO main verbosity: None; log level: INFO; handler level: INFO
2022-02-21 16:57:54 INFO main Creating kubernetes client with config /root/.kube/config from --kubeconfig flag
2022-02-21 16:57:54 INFO k8s_client Initializing with config: /root/.kube/config
2022-02-21 16:57:54 INFO main No cloud driver - some functionality disabled
2022-02-21 16:57:55 INFO main Using stdout metrics collector
2022-02-21 16:57:55 INFO main NOT starting the UI server
2022-02-21 16:57:55 INFO main STARTING AUTONOMOUS MODE
2022-02-21 16:57:59 INFO scenario.kill up to 3 po Starting scenario 'kill up to 3 pods in any openshift namespace' (1 steps)
2022-02-21 16:57:59 INFO action_nodes_pods.kill up to 3 po Matching 'namespace' {'namespace': 'openshift-.*'}
2022-02-21 16:58:05 INFO action_nodes_pods.kill up to 3 po Matched 344 pods in namespace openshift-.*
2022-02-21 16:58:05 INFO action_nodes_pods.kill up to 3 po Initial set length: 344
2022-02-21 16:58:05 INFO action_nodes_pods.kill up to 3 po Filtered set length: 3
2022-02-21 16:58:05 INFO action_nodes_pods.kill up to 3 po Pod got lucky - not killing
2022-02-21 16:58:05 INFO action_nodes_pods.kill up to 3 po Pod got lucky - not killing
2022-02-21 16:58:05 INFO action_nodes_pods.kill up to 3 po Pod killed: [pod #132 name=ingress-canary-7l4xj namespace=openshift-ingress-canary containers=1 ip=192.168.12.6 host_ip=10.59.131.78 state=Running labels:controller-revision-hash=699677dbc,ingresscanary.operator.openshift.io/daemonset-ingresscanary=canary_controller,pod-template-generation=1 annotations:k8s.ovn.org/pod-networks={"default":{"ip_addresses":["192.168.12.6/25"],"mac_address":"0a:58:c0:a8:0c:06","gateway_ips":["192.168.12.1"],"ip_address":"192.168.12.6/25","gateway_ip":"192.168.12.1"}},k8s.v1.cni.cncf.io/network-status=[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"192.168.12.6"
],
"mac": "0a:58:c0:a8:0c:06",
"default": true,
"dns": {}
}],k8s.v1.cni.cncf.io/networks-status=[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"192.168.12.6"
],
"mac": "0a:58:c0:a8:0c:06",
"default": true,
"dns": {}
}],openshift.io/scc=restricted,workload.openshift.io/warning=only single-node clusters support workload partitioning]
2022-02-21 16:58:05 INFO scenario.kill up to 3 po Scenario finished
2022-02-21 16:58:05 INFO policy_runner All done here!
2022-02-21 16:58:05,664 [INFO] Scenario: scenarios/regex_openshift_pod_kill.yml has been successfully injected!
2022-02-21 16:58:05,665 [INFO] Waiting for the specified duration: 60
2022-02-21 16:59:22,216 [INFO] scenarios/post_action_regex.py post action checks passed
2022-02-21 16:59:22,217 [INFO] connection set up
127.0.0.1 - - [21/Feb/2022 16:59:22] "GET / HTTP/1.1" 200 -
2022-02-21 16:59:22,218 [INFO] response RUN
2022-02-21 16:59:22,219 [INFO] Running node scenarios
[kraken ASCII-art banner]

Traceback (most recent call last):
  File "run_kraken.py", line 294, in <module>
    main(options.cfg)
  File "run_kraken.py", line 172, in main
    nodeaction.run(scenarios_list, config, wait_duration)
  File "/root/kraken/kraken/node_actions/run.py", line 51, in run
    node_scenario_object = get_node_scenario_object(node_scenario)
  File "/root/kraken/kraken/node_actions/run.py", line 25, in get_node_scenario_object
    return aws_node_scenarios()
  File "/root/kraken/kraken/node_actions/aws_node_scenarios.py", line 156, in __init__
    self.aws = AWS()
  File "/root/kraken/kraken/node_actions/aws_node_scenarios.py", line 12, in __init__
    self.boto_client = boto3.client("ec2")
  File "/usr/local/lib/python3.6/site-packages/boto3/__init__.py", line 93, in client
    return _get_default_session().client(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/boto3/session.py", line 275, in client
    aws_session_token=aws_session_token, config=config)
  File "/usr/local/lib/python3.6/site-packages/botocore/session.py", line 874, in create_client
    client_config=config, api_version=api_version)
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 93, in create_client
    verify, credentials, scoped_config, client_config, endpoint_bridge)
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 362, in _get_client_args
    verify, credentials, scoped_config, client_config, endpoint_bridge)
  File "/usr/local/lib/python3.6/site-packages/botocore/args.py", line 73, in get_client_args
    endpoint_url, is_secure, scoped_config)
  File "/usr/local/lib/python3.6/site-packages/botocore/args.py", line 154, in compute_client_args
    s3_config=s3_config,
  File "/usr/local/lib/python3.6/site-packages/botocore/args.py", line 234, in _compute_endpoint_config
    return self._resolve_endpoint(**resolve_endpoint_kwargs)
  File "/usr/local/lib/python3.6/site-packages/botocore/args.py", line 321, in _resolve_endpoint
    service_name, region_name, endpoint_url, is_secure)
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 444, in resolve
    use_fips_endpoint=use_fips_endpoint,
  File "/usr/local/lib/python3.6/site-packages/botocore/regions.py", line 183, in construct_endpoint
    use_fips_endpoint
  File "/usr/local/lib/python3.6/site-packages/botocore/regions.py", line 215, in _endpoint_for_partition
    raise NoRegionError()
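
The traceback shows boto3.client("ec2") failing inside kraken's aws_node_scenarios: botocore raises NoRegionError when it cannot resolve an AWS region from the client arguments, the AWS_DEFAULT_REGION environment variable, or ~/.aws/config. A minimal sketch of the quickest workaround, assuming us-west-2 is the target region (match it to where the cluster's nodes actually run):

# Make a region visible to boto3 in the environment kraken runs in
# (us-west-2 is an assumed value, not taken from this issue)
export AWS_DEFAULT_REGION=us-west-2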

Thanks for the issue. Can you please verify that you have properly configured your AWS CLI using the following doc: https://github.com/cloud-bulldozer/kraken/blob/master/docs/node_scenarios.md#aws
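
For reference, a minimal sketch of that CLI setup (all values below are placeholders):

# Persist credentials and a default region for boto3 to pick up
aws configure set aws_access_key_id <access-key-id>
aws configure set aws_secret_access_key <secret-access-key>
aws configure set region us-west-2

This writes ~/.aws/credentials and ~/.aws/config, which boto3 reads automatically, so kraken itself needs no changes.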

If you are using kraken-hub (the containerized way of running), be sure to set the following environment variables and pass them on your docker run line, as sketched below:
https://github.com/cloud-bulldozer/kraken-hub/blob/main/docs/node-scenarios.md
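
For example, a sketch of such a docker run line (the image name, region, and kubeconfig path are placeholders; the exact variable names expected are listed in the doc above):

# Pass the standard AWS variables into the container so boto3 can resolve a region
docker run \
  -e AWS_ACCESS_KEY_ID=<access-key-id> \
  -e AWS_SECRET_ACCESS_KEY=<secret-access-key> \
  -e AWS_DEFAULT_REGION=us-west-2 \
  -v <path-to-kubeconfig>:/root/.kube/config \
  <kraken-hub-node-scenarios-image>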

Closing the issue as @yocum137 confirmed that the scenario worked after setting the target region. Please feel free to reopen in case of any issues. Thanks.