Azure / KAN

KubeAI Application Nucleus for edge is a solution accelerator for creating, deploying, and operating environment-aware solutions at scale that use artificial intelligence (AI) at the edge with the control and flexibility of open-source natively on your environment.

managermodule fails to start when target-runtime-symphony-agent fails to access SYMPHONY_URL

BaoxiJia opened this issue

I created a P4E deployment on a VM that is on a different network from the P4E API. The managermodule edge module fails to start because the P4E agent running on the VM cannot reach the P4E API in K8s.

Expected result: the managermodule edge module logs the error and keeps running instead of exiting.

My VM IP address is 192.168.66.102, and it cannot access 192.168.0.4.

root@apd-00155d5a0d06 [ ~ ]# iotedge logs instance-d96af6dc-219b-44c5-939f-17c21b8ece44-managermodule

["skill-39657d9c-a1ae-46e9-b8f0-64aad53c13e1 as skill-1d63"]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/httpcore/_exceptions.py", line 8, in map_exceptions
    yield
  File "/usr/local/lib/python3.10/site-packages/httpcore/backends/sync.py", line 26, in read
    return self._sock.recv(max_bytes)
TimeoutError: timed out
	 
During handling of the above exception, another exception occurred:
	
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 218, in handle_request
    resp = self._pool.handle_request(req)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 253, in handle_request
    raise exc
  File "/usr/local/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 237, in handle_request
    response = connection.handle_request(request)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_sync/connection.py", line 90, in handle_request
    return self._connection.handle_request(request)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 105, in handle_request
    raise exc
  File "/usr/local/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 84, in handle_request
    ) = self._receive_response_headers(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 148, in _receive_response_headers
    event = self._receive_event(timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 177, in _receive_event
    data = self._network_stream.read(
  File "/usr/local/lib/python3.10/site-packages/httpcore/backends/sync.py", line 24, in read
    with map_exceptions(exc_map):
  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_exceptions.py", line 12, in map_exceptions
    raise to_exc(exc)
httpcore.ReadTimeout: timed out

 
The above exception was the direct cause of the following exception:
	 

Traceback (most recent call last):
  File "/app/main.py", line 181, in <module>
    instance = client.get_instance(instance_name)
  File "/app/common/symphony_agent_client.py", line 60, in get_instance
    return self._get('Instance', name)
  File "/app/common/symphony_agent_client.py", line 45, in _get
    r = httpx.get(self.url, params=params)
  File "/usr/local/lib/python3.10/site-packages/httpx/_api.py", line 189, in get
    return request(
  File "/usr/local/lib/python3.10/site-packages/httpx/_api.py", line 100, in request
    return client.request(
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 815, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 902, in send
    response = self._send_handling_auth(
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 930, in _send_handling_auth
    response = self._send_handling_redirects(
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 967, in _send_handling_redirects
    response = self._send_single_request(request)
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1003, in _send_single_request
    response = transport.handle_request(request)
  File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 217, in handle_request
    with map_httpcore_exceptions():
  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout: timed out
["skill-39657d9c-a1ae-46e9-b8f0-64aad53c13e1 as skill-1d63"]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/httpcore/_exceptions.py", line 8, in map_exceptions
    yield
  File "/usr/local/lib/python3.10/site-packages/httpcore/backends/sync.py", line 26, in read
    return self._sock.recv(max_bytes)
TimeoutError: timed out

managermodule should not fail to start when our agent loses communication with the control plane. Instead, it should log a clear message that the agent cannot connect to the control plane. @Haishi2016 Our agent should also log that it cannot connect to the control plane.
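
For illustration, the requested behavior could look roughly like the sketch below: wrap the startup call in a retry loop that logs each failure and tries again instead of letting the exception end the process. This is not the actual fix that was merged. `get_with_retry`, its parameters, and the logger name are hypothetical. The demo uses the built-in `TimeoutError` so it runs without extra packages; for the traceback above, the exception to catch would presumably be `httpx.TransportError`, the base class of `httpx.ReadTimeout`.

```python
import logging
import time

logger = logging.getLogger("managermodule")  # hypothetical logger name


def get_with_retry(fetch, retry_on=(TimeoutError, ConnectionError),
                   delay_seconds=5.0, max_attempts=None, sleep=time.sleep):
    """Call fetch() until it succeeds, logging each failure instead of crashing.

    Returns the fetched value, or None once max_attempts is exhausted.
    max_attempts=None means retry forever.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            return fetch()
        except retry_on as exc:
            # Report the problem and keep the module alive.
            logger.error(
                "Cannot reach control plane (attempt %d): %r; retrying in %.0fs",
                attempt, exc, delay_seconds,
            )
            sleep(delay_seconds)
    return None


# Assumed usage in /app/main.py (not verified against the real code):
#   instance = get_with_retry(
#       lambda: client.get_instance(instance_name),
#       retry_on=(httpx.TransportError,),
#   )
```

Passing `sleep` as a parameter is only there to make the loop easy to test without waiting. A real module might also want exponential backoff and an explicit timeout on the request itself.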

This issue is resolved.