EventStreamError is a 400 and never retries, but some EventStreamErrors are retriable
SamStephens opened this issue
Describe the bug
The code at botocore/botocore/eventstream.py, lines 354 to 362 in bf24737, means that any EventStreamError is treated as a 400.
Expected Behavior
Some EventStreamErrors should be retried. For instance, throttlingException.
Current Behavior
No EventStreamErrors are retried.
Reproduction Steps
N/A
Possible Solution
What would be ideal is if every exception returnable in an event stream mapped directly back to an exception the client already supports. In that case, we could look up the exception from the client and get the correct status code. Or possibly even instantiate the exception as an instance of the exception from the client wrapped in an EventStreamError. For example, a throttling error from the bedrock runtime might look like:
botocore.exceptions.EventStreamError(base_exception=bedrock_client.exceptions.ThrottlingException())
Even if some clients have event streams with exceptions that are not in the list of exceptions the client supports, we could still use this idea with a fallback behaviour.
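A minimal sketch of the wrapping idea with a fallback, to make the proposal concrete. It uses stand-in classes rather than botocore itself, so every name here (`wrap_stream_error`, the `modeled_exceptions` mapping, the stand-in `EventStreamError` with a `base_exception` attribute) is hypothetical. It also assumes the stream's exception name differs from the client's modeled name only in letter case, which may not hold for every service.

```python
class EventStreamError(Exception):
    """Stand-in for botocore.exceptions.EventStreamError (illustration only)."""

    def __init__(self, code, message, base_exception=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        # Hypothetical attribute: the modeled client exception, if one matched.
        self.base_exception = base_exception


class ThrottlingException(Exception):
    """Stand-in for a modeled exception such as client.exceptions.ThrottlingException."""


def wrap_stream_error(code, message, modeled_exceptions):
    """Wrap a stream error, attaching the matching modeled exception when one exists.

    modeled_exceptions maps exception names to classes. The lookup is
    case-insensitive (an assumption). Unknown codes fall back to a plain
    EventStreamError with base_exception set to None.
    """
    by_lower_name = {name.lower(): cls for name, cls in modeled_exceptions.items()}
    exc_class = by_lower_name.get(code.lower())
    base = exc_class(message) if exc_class is not None else None
    return EventStreamError(code, message, base_exception=base)
```

With this shape, retry logic could inspect `base_exception` to decide whether the error is retriable, while existing callers that catch EventStreamError keep working.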
If this idea isn't suitable, a simpler but less flexible idea is to have a static list of exception names that should be treated as 500s.
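The static-list alternative could look roughly like the sketch below. This is not botocore's code: `RETRYABLE_EVENT_EXCEPTIONS` and `status_code_for_event` are hypothetical names, the list contents are illustrative, and the sketch assumes the exception name arrives in an `:exception-type` header alongside `:message-type`.

```python
# Hypothetical static list of event stream exception names to treat as 500s.
RETRYABLE_EVENT_EXCEPTIONS = frozenset({"throttlingException"})


def status_code_for_event(headers, default=200):
    """Pick a synthetic status code for an event stream message.

    Ordinary events keep the default. Errors and exceptions become 400
    unless their name is on the static retryable list, in which case
    they become 500.
    """
    message_type = headers.get(":message-type")
    if message_type not in ("error", "exception"):
        return default
    # Assumption: the exception name is carried in the ":exception-type" header.
    if headers.get(":exception-type") in RETRYABLE_EVENT_EXCEPTIONS:
        return 500
    return 400
```

The trade-off is the one noted above: the list is simple to reason about but has to be maintained by hand as services add new retriable exceptions.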
Additional Information/Context
EDIT: boto/boto3#4031 was resolved in favour of this issue, so documentation needs to be addressed as part of this.
The documentation for methods that return event streams, at least for the ones I've looked at, includes errors as part of the shape of the structure that can be returned. For example, from https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime/client/invoke_endpoint_with_response_stream.html:
    'Body': EventStream({
        'PayloadPart': {
            'Bytes': b'bytes'
        },
        'ModelStreamError': {
            'Message': 'string',
            'ErrorCode': 'string'
        },
        'InternalStreamFailure': {
            'Message': 'string'
        }
    }),
However, boto3 is smart enough to raise these errors as an EventStreamError rather than returning them in the structure, via botocore/botocore/eventstream.py, lines 354 to 362 in bf24737, and botocore/botocore/eventstream.py, lines 613 to 619 in bf24737.
These are a couple of examples, not an exhaustive list:
- https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/invoke_model_with_response_stream.html
- https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime/client/invoke_endpoint_with_response_stream.html
SDK version used
Applies to latest
Environment details (OS name and version, etc.)
Ubuntu (Windows Subsystem for Linux)
Thanks for reporting this issue. We can continue tracking it for further review from the team. In the meantime, here is some guidance to help with Bedrock throttling exceptions: https://repost.aws/questions/QU11DRlMZfRDy0ngHxpO1VCw/throttlingexceptions-while-using-on-demand-bedrock-runtime-for-invoking-claude-v2-1#ANkwWynhQgRHi003I_nC3NJQ
To expand on that post a bit more:
- For info on Bedrock quotas you can refer to this documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html
- For provisioned throughput info you can refer to this documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html
- For info on Boto3 retries you can refer to this documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html
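Since the built-in retries do not currently cover errors raised while reading an event stream, a caller-side workaround might look like the sketch below. It is only a sketch under stated assumptions: `consume_with_retry`, `open_stream`, and `RETRYABLE_CODES` are hypothetical names, and it assumes the raised error exposes `response["Error"]["Code"]` the way botocore's ClientError subclasses do. It restarts the whole request on failure and discards any events already received, which may not suit every use case.

```python
import random
import time

# Assumption: error codes that are worth retrying; adjust for your service.
RETRYABLE_CODES = {"throttlingException"}


def consume_with_retry(open_stream, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Collect all events from a stream, retrying retryable stream errors.

    open_stream is a hypothetical zero-argument callable that issues a
    fresh request and returns an iterable of events.
    """
    for attempt in range(1, max_attempts + 1):
        events = []  # partial results from a failed attempt are discarded
        try:
            for event in open_stream():
                events.append(event)
            return events
        except Exception as exc:
            # ClientError-style exceptions carry the code in exc.response.
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            if code not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            # Exponential backoff with jitter before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
```

In practice `open_stream` might be a lambda that calls the streaming operation and returns the event stream from its response.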
As boto/boto3#4031 was resolved in favour of this issue, I've updated the issue description to ensure that the documentation issue is addressed as part of resolving this issue.