Azure / iotedge

The IoT Edge OSS project

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

edgeAgent automatically stopping all modules

JelleCaptic opened this issue · comments

Expected Behavior

I would assume modules can stay up without a certain time limit or auto-shutdown mechanism. On top of that, I would expect certain reasoning for automatically stopping modules.

Current Behavior

At a random timestamp the edgeAgent stopped all modules and executed a stop command for all my running modules

Steps to Reproduce

Provide a detailed set of steps to reproduce the bug.

  • Up until now I was unable to re-produce the issue. I have other edge devices that have modules running for over 1 month without any issues.

Context (Environment)

Output of iotedge check

All iotedge checks are ok.

Device Information

  • Host OS [Ubuntu 20.04.5 LTS]:
  • Architecture [aarch64]:
  • Container OS [Linux containers]:

Runtime Versions

  • aziot-edged [1.4.20]:
  • Edge Agent [1.4.29]:
  • Edge Hub [1.4.29]:
  • Docker/Moby :
Client:
 Version:           23.0.7+azure-1
 API version:       1.42
 Go version:        go1.19.12
 Git commit:        92955ed461e91154ed722ed831e9f22ea93fd7e9
 Built:             Wed Aug 23 12:04:52 UTC 2023
 OS/Arch:           linux/arm64
 Context:           default

Server:
 Engine:
  Version:          23.0.7+azure-1
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.12
  Git commit:       606a7a76949ec08cb26940037b1c71c0024c6b44
  Built:            Wed Sep  6 06:38:37 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.24-2
  GitCommit:        61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
 runc:
  Version:          1.1.9-1
  GitCommit:        ccaecfcbc907d70a7aa870a6650887b901b25b82
 docker-init:
  Version:          0.19.0
  GitCommit:

Logs

edge-agent logs

{"log":"\u003c6\u003e 2024-03-04 07:09:38.280 +00:00 [INF] - Starting periodic operation refresh twin config...\n","stream":"stdout","time":"2024-03-04T07:09:38.280722368Z"}
{"log":"\u003c6\u003e 2024-03-04 07:09:38.303 +00:00 [INF] - Obtained Edge agent twin from IoTHub with desired properties version 108 and reported properties version 871.\n","stream":"stdout","time":"2024-03-04T07:09:38.303908896Z"}
{"log":"\u003c6\u003e 2024-03-04 07:09:38.314 +00:00 [INF] - Successfully completed periodic operation refresh twin config\n","stream":"stdout","time":"2024-03-04T07:09:38.31436288Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.122 +00:00 [INF] - Termination requested, initiating shutdown.\n","stream":"stdout","time":"2024-03-04T07:13:25.123151872Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.123 +00:00 [INF] - Main thread terminated\n","stream":"stdout","time":"2024-03-04T07:13:25.123461504Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.131 +00:00 [INF] - Initiating shutdown cleanup.\n","stream":"stdout","time":"2024-03-04T07:13:25.141976576Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.133 +00:00 [INF] - Stopping all modules...\n","stream":"stdout","time":"2024-03-04T07:13:25.142001824Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.168 +00:00 [INF] - Waiting for cleanup to finish\n","stream":"stdout","time":"2024-03-04T07:13:25.1692384Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.270 +00:00 [INF] - Plan execution started for deployment -1\n","stream":"stdout","time":"2024-03-04T07:13:25.273090144Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.290 +00:00 [INF] - Executing command: \"Stop module db\"\n","stream":"stdout","time":"2024-03-04T07:13:25.291056896Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.291 +00:00 [INF] - Executing command: \"Stop module edge-data-manager\"\n","stream":"stdout","time":"2024-03-04T07:13:25.291107968Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.291 +00:00 [INF] - Executing command: \"Stop module api\"\n","stream":"stdout","time":"2024-03-04T07:13:25.291170176Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.291 +00:00 [INF] - Executing command: \"Stop module edge-api-frontend\"\n","stream":"stdout","time":"2024-03-04T07:13:25.291252768Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.291 +00:00 [INF] - Executing command: \"Stop module edge-frontend\"\n","stream":"stdout","time":"2024-03-04T07:13:25.291321536Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.302 +00:00 [INF] - Updated reported properties\n","stream":"stdout","time":"2024-03-04T07:13:25.302620832Z"}
{"log":"2024-03-04 07:13:40  Starting Edge Agent\n","stream":"stdout","time":"2024-03-04T07:13:40.783627552Z"}
{"log":"2024-03-04 07:13:40  Changing ownership of storage folder: /tmp/edgeAgent to 13622\n","stream":"stdout","time":"2024-03-04T07:13:40.811136352Z"}
{"log":"2024-03-04 07:13:40  Changing ownership of backup folder: /tmp/edgeAgent_backup to 13622\n","stream":"stdout","time":"2024-03-04T07:13:40.817397472Z"}
{"log":"2024-03-04 07:13:40  Changing ownership of management socket: /var/run/iotedge/mgmt.sock\n","stream":"stdout","time":"2024-03-04T07:13:40.831135328Z"}
{"log":"2024-03-04 07:13:40  Completed necessary setup. Starting Edge Agent.\n","stream":"stdout","time":"2024-03-04T07:13:40.851940512Z"}
{"log":"2024-03-04 07:13:40.896 +00:00 Edge Agent Main()\n","stream":"stdout","time":"2024-03-04T07:13:40.907348672Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:41.206 +00:00 [INF] - Initializing Edge Agent.\n","stream":"stdout","time":"2024-03-04T07:13:41.232074208Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:41.417 +00:00 [INF] - Version - 1.4.29.85541525 (f049017a5072f85aec250e945ebb2ca52e4af3f4)\n","stream":"stdout","time":"2024-03-04T07:13:41.417553152Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:41.418 +00:00 [INF] - \n","stream":"stdout","time":"2024-03-04T07:13:41.418499744Z"}

edge-hub logs

{"log":"\u003c7\u003e 2024-03-04 07:13:23.775 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets] - Connection id \"\"ID\"\" received FIN.\n","stream":"stdout","time":"2024-03-04T07:13:23.775449856Z"}
{"log":"\u003c7\u003e 2024-03-04 07:13:23.775 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Connections] - Connection id \"\"ID\"\" disconnecting.\n","stream":"stdout","time":"2024-03-04T07:13:23.775573856Z"}
{"log":"\u003c7\u003e 2024-03-04 07:13:23.775 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Connections] - Connection id \"\"ID\"\" stopped.\n","stream":"stdout","time":"2024-03-04T07:13:23.775600288Z"}
{"log":"\u003c7\u003e 2024-03-04 07:13:23.775 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets] - Connection id \"\"ID\"\" sending FIN because: \"\"The Socket transport's send loop completed gracefully.\"\"\n","stream":"stdout","time":"2024-03-04T07:13:23.775730112Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.127 +00:00 [INF] [EdgeHub] - Termination requested, initiating shutdown.\n","stream":"stdout","time":"2024-03-04T07:13:25.129527008Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.128 +00:00 [INF] [EdgeHub] - Stopping the protocol heads...\n","stream":"stdout","time":"2024-03-04T07:13:25.12964Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.130 +00:00 [INF] [EdgeHub] - Closing protocol heads - (MQTT, HTTP)\n","stream":"stdout","time":"2024-03-04T07:13:25.132696384Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.136 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.MqttProtocolHead] - Stopping MQTT protocol head\n","stream":"stdout","time":"2024-03-04T07:13:25.137041024Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.141 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Http.HttpProtocolHead] - Closing HTTP head\n","stream":"stdout","time":"2024-03-04T07:13:25.14174832Z"}
{"log":"\u003c7\u003e 2024-03-04 07:13:25.143 +00:00 [DBG] [Microsoft.AspNetCore.Hosting.Diagnostics] - Hosting shutdown\n","stream":"stdout","time":"2024-03-04T07:13:25.1432352Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.177 +00:00 [INF] [EdgeHub] - Waiting for cleanup to finish\n","stream":"stdout","time":"2024-03-04T07:13:25.178299328Z"}
{"log":"\u003c7\u003e 2024-03-04 07:13:25.256 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets] - Connection id \"\"ID\"\" received FIN.\n","stream":"stdout","time":"2024-03-04T07:13:25.257041696Z"}
{"log":"\u003c7\u003e 2024-03-04 07:13:25.259 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets] - Connection id \"\"ID\"\" sending FIN because: \"\"The Socket transport's send loop completed gracefully.\"\"\n","stream":"stdout","time":"2024-03-04T07:13:25.259576512Z"}
{"log":"\u003c7\u003e 2024-03-04 07:13:25.259 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Connections] - Connection id \"\"ID\"\" disconnecting.\n","stream":"stdout","time":"2024-03-04T07:13:25.260128576Z"}
{"log":"\u003c4\u003e 2024-03-04 07:13:25.260 +00:00 [WRN] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.ServerWebSocketChannel] - Error reading transport. CorrelationId ID\n","stream":"stdout","time":"2024-03-04T07:13:25.264976352Z"}
{"log":"System.Net.WebSockets.WebSocketException (0x80004005): The remote party closed the WebSocket connection without completing the close handshake.\n","stream":"stdout","time":"2024-03-04T07:13:25.265037504Z"}
{"log":"   at System.Net.WebSockets.ManagedWebSocket.ThrowIfEOFUnexpected(Boolean throwOnPrematureClosure)\n","stream":"stdout","time":"2024-03-04T07:13:25.265044064Z"}
{"log":"   at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, CancellationToken cancellationToken, Boolean throwOnPrematureClosure)\n","stream":"stdout","time":"2024-03-04T07:13:25.265048352Z"}
{"log":"   at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)\n","stream":"stdout","time":"2024-03-04T07:13:25.265078624Z"}
{"log":"   at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TResult](Memory`1 payloadBuffer, CancellationToken cancellationToken)\n","stream":"stdout","time":"2024-03-04T07:13:25.265084608Z"}
{"log":"   at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource\u003cTResult\u003e.GetResult(Int16 token)\n","stream":"stdout","time":"2024-03-04T07:13:25.265088352Z"}
{"log":"   at System.Threading.Tasks.ValueTask`1.ValueTaskSourceAsTask.\u003c\u003ec.\u003c.cctor\u003eb__4_0(Object state)\n","stream":"stdout","time":"2024-03-04T07:13:25.265092256Z"}
{"log":"--- End of stack trace from previous location ---\n","stream":"stdout","time":"2024-03-04T07:13:25.265095712Z"}
{"log":"   at Microsoft.Azure.Devices.Edge.Hub.Mqtt.ServerWebSocketChannel.DoReadBytes(IByteBuffer byteBuffer) in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.Mqtt/ServerWebSocketChannel.cs:line 239\n","stream":"stdout","time":"2024-03-04T07:13:25.26509904Z"}
{"log":"   at Microsoft.Azure.Devices.Edge.Hub.Mqtt.ServerWebSocketChannel.DoBeginRead() in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.Mqtt/ServerWebSocketChannel.cs:line 133\n","stream":"stdout","time":"2024-03-04T07:13:25.265104608Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.265 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.ServerWebSocketChannel] - Transport aborted. CorrelationId ID\n","stream":"stdout","time":"2024-03-04T07:13:25.26565424Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.266 +00:00 [INF] [Microsoft.AspNetCore.Server.Kestrel.Connections] - Connection id \"\"ID\"\", Request id \"\"ID:00000002\"\": the application aborted the connection.\n","stream":"stdout","time":"2024-03-04T07:13:25.266200416Z"}
{"log":"\u003c7\u003e 2024-03-04 07:13:25.266 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Http.Middleware.WebSocketHandlingMiddleware] - Request ID:00000002 completed. CorrelationId ID\n","stream":"stdout","time":"2024-03-04T07:13:25.269377952Z"}
{"log":"\u003c6\u003e 2024-03-04 07:13:25.266 +00:00 [INF] [Microsoft.AspNetCore.Hosting.Diagnostics] - \"Request finished HTTP/1.1 GET https://gsd7449186/$iothub/websocket - - - 101 - - 956745.0866ms\"\n","stream":"stdout","time":"2024-03-04T07:13:25.269454272Z"}

Additional Information

All the modules came back up after a couple of minutes automatically.

What I would like to get more info about:

  • What triggers the automatic shutdown of all modules because this needs to be tackled.
  • Is there a way to prevent this?

@bilalsellak can you take a look here?