Strategy, categorization and handling exceptions
tgrospic opened this issue · comments
Overview
It's important to recognize what kinds of errors can happen during node execution. Some errors result from network problems, in which case the node should be able to recover and continue its job, but others are fatal and recovery is not possible.
We need to design a unified way to categorize different types of errors and handle them appropriately. It's also important to correctly chain errors thrown from lower layers so that the final exception contains all relevant information. Unfortunately, the current code does not get even this basic exception handling right.
Design
Basic knowledge about exceptions
- Swallowing exceptions is strictly forbidden
- When an exception is caught, it must be re-thrown (as is or wrapped in another exception) or written to the log
- More info about chained exceptions:
https://docs.oracle.com/javase/7/docs/api/java/lang/Throwable.html
- Exceptions should NOT be used to control the flow of the program
- Exceptions in Cats Effect work differently from Java exceptions
- Catching all Throwable (not NonFatal) errors is dangerous (see item above)
- Transforming exceptions to String loses information and should only be done at the exit point, not inside domain logic
- Catching exceptions should be done at the latest possible point and not on every function call
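The rules on chaining and on NonFatal can be illustrated with a minimal sketch that uses only the Scala standard library. The names BlockStoreException, readBlock, loadBlock and causeChain are hypothetical and do not come from the node code base; they only show the shape of the pattern.

```scala
import scala.util.control.NonFatal

object ErrorChaining {
  // Hypothetical domain exception; passing `cause` keeps the original error chained.
  final class BlockStoreException(message: String, cause: Throwable)
      extends RuntimeException(message, cause)

  // Hypothetical low-level operation that always fails with an I/O error.
  def readBlock(hash: String): Array[Byte] =
    throw new java.io.IOException(s"cannot read file for block $hash")

  // Wrap the low-level error instead of swallowing it or flattening it to a String.
  def loadBlock(hash: String): Array[Byte] =
    try readBlock(hash)
    catch {
      // NonFatal does not match fatal errors such as OutOfMemoryError
      // or InterruptedException, so those still propagate untouched.
      case NonFatal(e) =>
        throw new BlockStoreException(s"failed to load block $hash", e)
    }

  // Walk the cause chain; this is the information lost when only a message String is kept.
  def causeChain(t: Throwable): List[Throwable] =
    Iterator.iterate(t)(_.getCause).takeWhile(_ != null).toList
}
```

Calling ErrorChaining.loadBlock fails with a BlockStoreException whose getCause is the original IOException, so a logger at the exit point can print the whole chain.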
API
Exceptions should be transformed to another representation only when the error is handled and the code continues normal operation, or when the error is passed to an external source or caller.
This is important for API implementations, which are a boundary where errors must be transformed depending on the underlying protocol. We are all familiar with HTTP error codes and their meaning. gRPC basically works the same way, so defining a custom ServiceError type in every response is obviously wrong.
rchain/models/src/main/protobuf/DeployServiceV1.proto
Lines 72 to 235 in 6c8dbce
More information about gRPC error handling:
https://www.grpc.io/docs/guides/error/
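As a rough sketch of transforming errors only at the API boundary, the mapping below turns domain exceptions into status codes in one place. The domain error types and the Status case class are hypothetical stand-ins (a real service would use the status type of its gRPC library); only the code names and numeric values NOT_FOUND (5), INVALID_ARGUMENT (3), UNAVAILABLE (14) and INTERNAL (13) are taken from the gRPC status code list.

```scala
object ApiBoundary {
  // Hypothetical domain errors raised by node logic; names are illustrative only.
  sealed abstract class NodeError(message: String) extends RuntimeException(message)
  final case class BlockNotFound(hash: String)
      extends NodeError(s"block $hash not found")
  final case class InvalidDeploy(reason: String)
      extends NodeError(s"invalid deploy: $reason")
  final case class PeerUnreachable(peer: String)
      extends NodeError(s"peer $peer unreachable")

  // Stand-in for a gRPC status: code name, numeric value and a description.
  final case class Status(code: String, value: Int, description: String)

  // The only place where exceptions are turned into a protocol representation.
  def toStatus(error: Throwable): Status = error match {
    case e: BlockNotFound   => Status("NOT_FOUND", 5, e.getMessage)
    case e: InvalidDeploy   => Status("INVALID_ARGUMENT", 3, e.getMessage)
    case e: PeerUnreachable => Status("UNAVAILABLE", 14, e.getMessage)
    // Unknown errors are reported generically; details belong in the log, not the response.
    case _                  => Status("INTERNAL", 13, "internal error")
  }
}
```

With a mapping like this, response messages carry only the successful payload, and the error travels in the protocol's own status channel, as it does with HTTP.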
Categorization of errors
Exceptions, as the name suggests, represent exceptional situations. When one happens, it should provide enough information for a human (or program) to understand the nature of the error or to search for more data based on the error message.
For this purpose, errors usually contain an error code, which is used to group similar types of errors or to identify the error directly.
For example, HTTP error 404 means that the requested resource was not found.
https://www.rfc-editor.org/rfc/rfc9110.html#name-404-not-found
Or HTTP 500, which represents any kind of server error.
https://www.rfc-editor.org/rfc/rfc9110.html#name-500-internal-server-error
Error codes also give users an easier way to search for errors and make it possible to generate links to documentation from the provided error code.
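One possible shape for such a catalogue is sketched below. The codes, their names, the recoverable flag and the documentation base URL are all placeholders invented for illustration; no such catalogue exists in the code base yet.

```scala
object ErrorCodes {
  // Hypothetical error catalogue: a numeric code, a short summary, and whether
  // the node is expected to recover (network problems) or not (fatal errors).
  sealed abstract class ErrorCode(val code: Int, val summary: String, val recoverable: Boolean)
  case object NetworkTimeout extends ErrorCode(1001, "network request timed out", true)
  case object StorageCorrupted extends ErrorCode(2001, "storage is corrupted", false)

  // Placeholder base URL for generated documentation links.
  val docsBaseUrl = "https://example.com/docs/errors"

  // Exception that carries its code and still chains the underlying cause.
  final class CodedException(val errorCode: ErrorCode, detail: String, cause: Throwable = null)
      extends RuntimeException(s"[E${errorCode.code}] ${errorCode.summary}: $detail", cause)

  // Build a documentation link from the code alone.
  def docsLink(code: ErrorCode): String = s"$docsBaseUrl/E${code.code}"
}
```

Because the code appears in the exception message, a user can search for it directly, and the same code selects both the documentation page and the recovery strategy.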