knix-microfunctions / knix

Serverless computing platform with process-based lightweight function execution and container-based application isolation. Works in Knative and bare metal/VM environments.

Home Page: https://knix.io


Workflow logs not available due to Elasticsearch sharding problem

iakkus opened this issue

In long-running installations, the logs of newly created workflows become unavailable because the management service cannot create their index in Elasticsearch.

[1616660753006] [1616660752778628] [2021-03-25 08:25:52.778] [INFO] [admin@management] [Management] [b66afa618d4311ebb8540242ac110003] [Management] [addWorkflow] Creating workflow index: mfnwf-0771dd9c37f76bda93cd25b328c6203e

[1616660753006] [1616660752783148] [2021-03-25 08:25:52.783] [INFO] [admin@management] [Management] [b66afa618d4311ebb8540242ac110003] [Management] [addWorkflow] {'error': {'root_cause': [{'type': 'index_creation_exception', 'reason': 'failed to create index [mfnwf-0771dd9c37f76bda93cd25b328c6203e]'}], 'type': 'validation_exception', 'reason': 'Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;'}, 'status': 400}

As a result, the retrieval of logs also fails with an "index_not_found" exception.
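
For anyone trying to detect this condition programmatically, here is a minimal sketch that recognizes the shard-limit rejection in a response shaped like the one logged above. The function name `parse_shard_limit_error` is hypothetical and not part of the KNIX code base, and the regular expression is only matched against the message text quoted in this issue; other Elasticsearch versions may word it differently.

```python
import re

# Pattern for the shard-limit message quoted in the log above.
# Assumption: the wording is stable; it is only checked against this one message.
_SHARD_LIMIT_RE = re.compile(
    r"would add \[(\d+)\] total shards, but this cluster currently has "
    r"\[(\d+)\]/\[(\d+)\] maximum shards open"
)


def parse_shard_limit_error(response):
    """Return (requested, open, maximum) for a shard-limit error, else None.

    Hypothetical helper for illustration; `response` is the decoded JSON body
    returned by the index creation call.
    """
    error = response.get("error")
    if not isinstance(error, dict):
        return None
    match = _SHARD_LIMIT_RE.search(str(error.get("reason", "")))
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


# The response body copied from the log line in this issue.
response = {
    "error": {
        "root_cause": [
            {
                "type": "index_creation_exception",
                "reason": "failed to create index [mfnwf-0771dd9c37f76bda93cd25b328c6203e]",
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: this action would add [2] total shards, "
        "but this cluster currently has [1000]/[1000] maximum shards open;",
    },
    "status": 400,
}

print(parse_shard_limit_error(response))  # (2, 1000, 1000)
```

A caller could use the returned tuple to log a clearer message (for example, that the cluster is at its shard limit) instead of dumping the raw error dictionary.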

Not sure whether this is in our scope or whether it is a general Elasticsearch problem. I've seen workarounds in which the maximum number of shards was increased and/or old logs were deleted from the system.
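
As a rough illustration of the first workaround, the sketch below builds a request that raises the `cluster.max_shards_per_node` setting through the Elasticsearch cluster settings API. The host `http://localhost:9200`, the new limit of 2000, and the helper name are assumptions chosen for the example; the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: Elasticsearch is reachable here; adjust for the actual deployment.
ES_URL = "http://localhost:9200"


def build_raise_shard_limit_request(new_limit, es_url=ES_URL):
    """Build a PUT /_cluster/settings request that raises the per-node shard limit.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(
        {"persistent": {"cluster.max_shards_per_node": new_limit}}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url}/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_raise_shard_limit_request(2000)
print(req.get_method(), req.full_url)
# To apply it against a live cluster: urllib.request.urlopen(req)
```

Raising the limit only postpones the problem if indices keep accumulating, so it is probably best combined with removing indices that are no longer needed.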

We have since started deleting workflow logs when a workflow is removed. I'm not sure whether this problem was specific to my setup (i.e., a long-running installation that predates the workflow log removal) or whether it can still be replicated easily.
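
For installations that still carry indices from before this cleanup existed, a manual removal could look roughly like the sketch below. It builds a delete-index request for a single workflow index. The `mfnwf-` prefix is taken from the log in this issue; the host, the helper name, and the assumption that the suffix identifies the workflow are illustrative and do not describe the actual management service code.

```python
import urllib.request
from urllib.parse import quote

# Assumption: Elasticsearch is reachable here; adjust for the actual deployment.
ES_URL = "http://localhost:9200"


def build_delete_workflow_index_request(workflow_id, es_url=ES_URL, prefix="mfnwf-"):
    """Build a DELETE /<index> request for one workflow's log index.

    Hypothetical helper; the index naming follows the log line in this issue.
    """
    index_name = prefix + workflow_id
    return urllib.request.Request(
        url=f"{es_url}/{quote(index_name, safe='')}",
        method="DELETE",
    )


req = build_delete_workflow_index_request("0771dd9c37f76bda93cd25b328c6203e")
print(req.get_method(), req.full_url)
# To apply it against a live cluster: urllib.request.urlopen(req)
```

Deleting an index frees its shards, which should bring the cluster back under the limit once enough stale workflow indices are gone.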

Closing as non-issue. If somebody else runs into a similar problem, please reopen.

At this point, it is