boto / botocore

The low-level, core functionality of boto3 and the AWS CLI.

EventStreamError is a 400 and never retries, but some EventStreamErrors are retriable

SamStephens opened this issue · comments

Describe the bug

    def to_response_dict(self, status_code=200):
        message_type = self.headers.get(':message-type')
        if message_type == 'error' or message_type == 'exception':
            status_code = 400
        return {
            'status_code': status_code,
            'headers': self.headers,
            'body': self.payload,
        }

This means that any EventStreamError is treated as a 400, so it is never retried.

Expected Behavior

Some EventStreamErrors should be retried. For instance, throttlingException.

Current Behavior

No EventStreamErrors are retried.
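Until this is handled inside botocore, one application-level workaround is to restart the whole stream when the error code looks retryable. The sketch below is not botocore behaviour. It assumes the raised exception exposes its code as `response['Error']['Code']`, the way botocore's ClientError does. The names `RETRYABLE_CODES`, `error_code` and `consume_with_retry` are hypothetical, and the set of retryable codes is an assumption to adjust per service.

```python
import time

# Assumed set of retryable codes; adjust for the service you call.
RETRYABLE_CODES = frozenset({"throttlingException", "ThrottlingException"})


def error_code(exc):
    """Return the code from a ClientError-style `response` dict, or None."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def consume_with_retry(start_stream, handle_event, max_attempts=3,
                       base_delay=0.5, sleep=time.sleep):
    """Start and drain a stream, restarting it on retryable errors.

    Returns the number of attempts used. Events handled before a failure
    are delivered again on the next attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            for event in start_stream():
                handle_event(event)
            return attempt
        except Exception as exc:
            # Re-raise anything that is not retryable, or the final failure.
            if error_code(exc) not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
```

Here `start_stream` would be a callable that invokes the streaming operation and returns its event stream. Because a retry restarts the stream from the beginning, `handle_event` needs to tolerate seeing the same events twice.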

Reproduction Steps

N/A

Possible Solution

What would be ideal is if every exception returnable in an event stream mapped directly back to an exception the client already supports. In that case, we could look up the exception from the client and get the correct status code. Or possibly even instantiate the exception as an instance of the exception from the client, wrapped in an EventStreamError. For example, a throttling error from the Bedrock runtime might look like:

botocore.exceptions.EventStreamError(base_exception=bedrock_client.exceptions.ThrottlingException())

Even if some clients have event streams with exceptions that are not in the list of exceptions the client supports, we could still use this idea with a fallback behaviour.
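To make the lookup-with-fallback idea concrete, here is a minimal sketch. Everything in it is hypothetical: `WrappedStreamError` is a stand-in for the proposed EventStreamError signature, `modeled_exceptions` is a plain dict standing in for the client's exception registry, and the case-insensitive match is my own assumption for bridging names like `throttlingException` and `ThrottlingException`.

```python
class WrappedStreamError(Exception):
    """Stand-in for the proposed EventStreamError with a base_exception."""

    def __init__(self, code, message, base_exception=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.base_exception = base_exception


def wrap_stream_error(code, message, modeled_exceptions):
    """Wrap a stream error, attaching the client's exception if one matches.

    `modeled_exceptions` maps exception names to exception classes.
    Fallback: if no class matches, base_exception is left as None.
    """
    # Assumption: names are compared case-insensitively.
    lookup = {name.lower(): cls for name, cls in modeled_exceptions.items()}
    cls = lookup.get(code.lower())
    base = cls(message) if cls is not None else None
    return WrappedStreamError(code, message, base_exception=base)
```

A caller could then branch on `isinstance(err.base_exception, SomeClientException)` and fall back to the raw code when `base_exception` is None.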

If this idea isn't suitable, a simpler but less flexible idea is to have a static list of exception names that should be treated as 500s.
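A rough sketch of the static-list variant, written as a free function rather than a patch to botocore. The list contents are placeholders, and reading the error name from the `:exception-type` or `:error-code` header is an assumption about where the name arrives. It only shows the classification step; it does not by itself make anything retry.

```python
# Hypothetical static list; which names belong here is an assumption.
RETRYABLE_EXCEPTION_TYPES = frozenset({"throttlingException"})


def to_response_dict(headers, payload, status_code=200):
    """Variant of the quoted method that reports listed errors as 500s."""
    message_type = headers.get(":message-type")
    if message_type in ("error", "exception"):
        # Assumes the error name arrives in one of these two headers.
        name = headers.get(":exception-type") or headers.get(":error-code")
        status_code = 500 if name in RETRYABLE_EXCEPTION_TYPES else 400
    return {"status_code": status_code, "headers": headers, "body": payload}
```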

Additional Information/Context

EDIT: boto/boto3#4031 was resolved in favour of this issue, so documentation needs to be addressed as part of this.

The documentation for methods that return event streams, at least the ones I've looked at, includes errors as part of the shape of the structure that can be returned. For example, from https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime/client/invoke_endpoint_with_response_stream.html:

    'Body': EventStream({
        'PayloadPart': {
            'Bytes': b'bytes'
        },
        'ModelStreamError': {
            'Message': 'string',
            'ErrorCode': 'string'
        },
        'InternalStreamFailure': {
            'Message': 'string'
        }
    }),

However, boto3 actually raises these errors as an EventStreamError, via

    def to_response_dict(self, status_code=200):
        message_type = self.headers.get(':message-type')
        if message_type == 'error' or message_type == 'exception':
            status_code = 400
        return {
            'status_code': status_code,
            'headers': self.headers,
            'body': self.payload,
        }

and

    def _parse_event(self, event):
        response_dict = event.to_response_dict()
        parsed_response = self._parser.parse(response_dict, self._output_shape)
        if response_dict['status_code'] == 200:
            return parsed_response
        else:
            raise EventStreamError(parsed_response, self._operation_name)

These are a couple of examples, not an exhaustive list:

SDK version used

Applies to latest

Environment details (OS name and version, etc.)

Ubuntu (Windows Subsystem for Linux)

Thanks for reporting this issue, we can continue tracking it for further review from the team. In the meantime here is some guidance to help with Bedrock throttling exceptions: https://repost.aws/questions/QU11DRlMZfRDy0ngHxpO1VCw/throttlingexceptions-while-using-on-demand-bedrock-runtime-for-invoking-claude-v2-1#ANkwWynhQgRHi003I_nC3NJQ

To expand on that post a bit more:

As boto/boto3#4031 was resolved in favour of this issue, I've updated the issue description to ensure that the documentation issue is addressed as part of resolving this one.