workflow logs not available due to elasticsearch sharding problem
iakkus opened this issue
In long-running installations, the logs of newly created workflows become unavailable because the management service cannot create their index in Elasticsearch.
[1616660753006] [1616660752778628] [2021-03-25 08:25:52.778] [INFO] [admin@management] [Management] [b66afa618d4311ebb8540242ac110003] [Management] [addWorkflow] Creating workflow index: mfnwf-0771dd9c37f76bda93cd25b328c6203e
[1616660753006] [1616660752783148] [2021-03-25 08:25:52.783] [INFO] [admin@management] [Management] [b66afa618d4311ebb8540242ac110003] [Management] [addWorkflow] {'error': {'root_cause': [{'type': 'index_creation_exception', 'reason': 'failed to create index [mfnwf-0771dd9c37f76bda93cd25b328c6203e]'}], 'type': 'validation_exception', 'reason': 'Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;'}, 'status': 400}
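To tell this failure apart from other index creation errors, the response could be checked for the shard limit message. The following is only a sketch, not code from the management service: the function name `parse_shard_limit_error` is hypothetical, and it assumes the response has the dict shape shown in the log above and that the wording of the `reason` string stays the same.

```python
import re

# Matches the "reason" text seen in the log above. The wording is taken from
# that one log line and may differ between Elasticsearch versions.
_SHARD_LIMIT_RE = re.compile(
    r"would add \[(\d+)\] total shards, "
    r"but this cluster currently has \[(\d+)\]/\[(\d+)\] maximum shards open"
)


def parse_shard_limit_error(response):
    """Return (shards_to_add, shards_open, shards_max), or None.

    `response` is the decoded JSON body of a failed index creation request
    (hypothetical helper; assumes the dict shape shown in the log).
    """
    error = response.get("error") if isinstance(response, dict) else None
    if not isinstance(error, dict):
        return None
    match = _SHARD_LIMIT_RE.search(str(error.get("reason", "")))
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())
```

A non-`None` result would mean the index was rejected because the cluster is at its shard limit, rather than for some other validation reason.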
As a result, the retrieval of logs also fails with an "index_not_found" exception.
Not sure whether this is in our scope or whether it is a general Elasticsearch problem. I've seen workarounds in which the number of shards was increased and/or old logs were deleted from the system.
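For reference, both workarounds map onto standard Elasticsearch REST calls: raising `cluster.max_shards_per_node` through `PUT /_cluster/settings`, and removing an old index with `DELETE /<index>`. The sketch below only builds the requests and does not send them. The URL `http://localhost:9200`, the helper names, and the example limit of 2000 are assumptions for illustration, not values from this installation.

```python
import json
import urllib.request

# Assumption: Elasticsearch is reachable here without authentication.
ES_URL = "http://localhost:9200"


def raise_shard_limit_request(max_shards_per_node, es_url=ES_URL):
    """Build a PUT /_cluster/settings request that raises the shard limit."""
    body = json.dumps(
        {"persistent": {"cluster.max_shards_per_node": max_shards_per_node}}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{es_url}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def delete_index_request(index_name, es_url=ES_URL):
    """Build a DELETE /<index> request that removes one old workflow index."""
    return urllib.request.Request(f"{es_url}/{index_name}", method="DELETE")


if __name__ == "__main__":
    # Printing only; pass a request to urllib.request.urlopen() to send it.
    req = raise_shard_limit_request(2000)  # 2000 is an arbitrary example value
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Raising the limit only postpones the problem if indices keep accumulating, so deleting the indices of removed workflows is probably the more durable of the two.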
We have started deleting workflow logs when the workflow is removed. Not sure whether this was a problem specific to my setup (i.e., a long-running installation from before workflow log removal was in place) or whether it can still be replicated easily.
Closing as non-issue. If somebody else runs into a similar problem, please reopen.
At this point, it is