Cannot scan all DynamoDB table entries
tomers opened this issue
The following script creates 15 entries in an empty DynamoDB table, then scans through the table with pagination (page size set to 10) and verifies that the scan is complete, i.e. that all the entries were fetched. Run against real AWS (i.e. without moto), it does not fail; specifically, the asserts at the end do not trigger:
```
❯ ./test_dynamodb_in_aws.py
boto3: 1.34.72
botocore: 1.34.72
Called scan() with {"Limit": 10} , got LastEvaluatedKey {"id": "k14"}, items IDs ['k9', 'k6', 'k7', 'k1', 'k3', 'k13', 'k12', 'k5', 'k2', 'k14']
Called scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k14"}}, got LastEvaluatedKey null , items IDs ['k11', 'k15', 'k10', 'k4', 'k8']
first_page_ids: {'k1', 'k12', 'k9', 'k5', 'k14', 'k6', 'k2', 'k7', 'k13', 'k3'}
second_page_ids: {'k15', 'k4', 'k10', 'k8', 'k11'}
```
Code:
```python
#!/usr/bin/env python3
import json

import boto3
from boto3.dynamodb.conditions import Attr

DYNAMODB = boto3.resource('dynamodb', 'eu-central-1')
DYNAMODB_TABLE = DYNAMODB.Table('table_name')

# Populate the (initially empty) table with 15 items: k1..k15.
for i in range(1, 16):
    data = f'val_{i}'
    db_id = f'k{i}'
    res = DYNAMODB_TABLE.put_item(
        Item={'id': db_id, 'data': data},
        ConditionExpression=Attr('id').not_exists())
    assert res['ResponseMetadata']['HTTPStatusCode'] == 200

# First page: at most 10 items.
params = dict(Limit=10)
res = DYNAMODB_TABLE.scan(**params)
items = [item['id'] for item in res['Items']]
print(
    f"Called scan() with {json.dumps(params):49}, got LastEvaluatedKey {json.dumps(res.get('LastEvaluatedKey')):13}, items IDs {items}")
first_page_ids = {item['id'] for item in res['Items']}

# Second page: resume after the key the first page ended on.
params['ExclusiveStartKey'] = res.get('LastEvaluatedKey')
res = DYNAMODB_TABLE.scan(**params)
items = [item['id'] for item in res['Items']]
print(
    f"Called scan() with {json.dumps(params):49}, got LastEvaluatedKey {json.dumps(res.get('LastEvaluatedKey')):13}, items IDs {items}")
second_page_ids = {item['id'] for item in res['Items']}

print(f"first_page_ids: {first_page_ids}")
print(f"second_page_ids: {second_page_ids}")
# The pages must not overlap, and together they must cover all 15 items.
assert len(first_page_ids & second_page_ids) == 0
assert len(first_page_ids | second_page_ids) == 15
```
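The script fetches exactly two pages because it knows the table holds 15 items and the page size is 10. A general client instead keeps calling scan() until a response carries no LastEvaluatedKey. The sketch below relies only on that contract; `scan_all` and `make_fake_scan` are hypothetical names, and the in-memory fake (which resumes strictly after `ExclusiveStartKey` over a fixed key order) stands in for a real table so the loop can run without AWS.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Any callable with the keyword contract of boto3's Table.scan: it accepts
# Limit / ExclusiveStartKey and returns a dict with 'Items' plus, when more
# pages remain, 'LastEvaluatedKey'.
ScanFn = Callable[..., Dict[str, Any]]


def scan_all(scan: ScanFn, page_size: int = 10) -> Iterator[Dict[str, Any]]:
    """Yield every item, following LastEvaluatedKey until it is absent."""
    params: Dict[str, Any] = {"Limit": page_size}
    while True:
        page = scan(**params)
        yield from page["Items"]
        last_key: Optional[Dict[str, Any]] = page.get("LastEvaluatedKey")
        if last_key is None:
            return  # no LastEvaluatedKey: the scan is complete
        # Resume strictly after the last key evaluated on this page.
        params["ExclusiveStartKey"] = last_key


def make_fake_scan(ids: List[str]) -> ScanFn:
    """Hypothetical in-memory stand-in for Table.scan over a fixed key order."""

    def fake_scan(Limit: int,
                  ExclusiveStartKey: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
        start = 0
        if ExclusiveStartKey is not None:
            start = ids.index(ExclusiveStartKey["id"]) + 1
        chunk = ids[start:start + Limit]
        page: Dict[str, Any] = {"Items": [{"id": key} for key in chunk]}
        if start + Limit < len(ids):
            page["LastEvaluatedKey"] = {"id": chunk[-1]}
        return page

    return fake_scan


ids = [f"k{i}" for i in range(1, 16)]
fetched = [item["id"] for item in scan_all(make_fake_scan(ids), page_size=10)]
assert len(fetched) == len(set(fetched)) == 15  # no duplicates, nothing missing
print(fetched)
```

Against a real table, `scan_all(DYNAMODB_TABLE.scan)` should follow the same loop, since `Table.scan` takes `Limit` and `ExclusiveStartKey` as keyword arguments. The loop stops on a missing LastEvaluatedKey rather than on a short page, because that is the condition DynamoDB uses to signal the end of a scan.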
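Running the same steps with moto, it is impossible to scan through all the entries of the table. The second page resumes at k2 rather than after k10, so it overlaps the first page and k12 to k15 are never returned: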
```
❯ ./test_dynamodb.py
{'Items': [{'id': 'k1', 'data': 'val_1'}, {'id': 'k2', 'data': 'val_2'}, {'id': 'k3', 'data': 'val_3'}, {'id': 'k4', 'data': 'val_4'}, {'id': 'k5', 'data': 'val_5'}, {'id': 'k6', 'data': 'val_6'}, {'id': 'k7', 'data': 'val_7'}, {'id': 'k8', 'data': 'val_8'}, {'id': 'k9', 'data': 'val_9'}, {'id': 'k10', 'data': 'val_10'}], 'Count': 10, 'ScannedCount': 10, 'LastEvaluatedKey': {'id': 'k10'}, 'ResponseMetadata': {'RequestId': '3FPC2VL7DuOEE7h6x1q0vCSHozB0TESdTEK0VAfsd7HeDdRXubv7', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'amazon.com', 'date': 'Thu, 28 Mar 2024 13:40:20 GMT', 'x-amzn-requestid': '3FPC2VL7DuOEE7h6x1q0vCSHozB0TESdTEK0VAfsd7HeDdRXubv7', 'x-amz-crc32': '2600487097'}, 'RetryAttempts': 0}}
{'Items': [{'id': 'k2', 'data': 'val_2'}, {'id': 'k3', 'data': 'val_3'}, {'id': 'k4', 'data': 'val_4'}, {'id': 'k5', 'data': 'val_5'}, {'id': 'k6', 'data': 'val_6'}, {'id': 'k7', 'data': 'val_7'}, {'id': 'k8', 'data': 'val_8'}, {'id': 'k9', 'data': 'val_9'}, {'id': 'k10', 'data': 'val_10'}, {'id': 'k11', 'data': 'val_11'}], 'Count': 10, 'ScannedCount': 10, 'LastEvaluatedKey': {'id': 'k11'}, 'ResponseMetadata': {'RequestId': 'x9so4xzNNbk7YZ1rzC3CzmmTuDoTdnbWx53PXavIR0afh7RH2Luj', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'amazon.com', 'date': 'Thu, 28 Mar 2024 13:40:20 GMT', 'x-amzn-requestid': 'x9so4xzNNbk7YZ1rzC3CzmmTuDoTdnbWx53PXavIR0afh7RH2Luj', 'x-amz-crc32': '4219525820'}, 'RetryAttempts': 0}}
first_page_ids: {'k1', 'k2', 'k4', 'k10', 'k6', 'k9', 'k8', 'k3', 'k7', 'k5'}
second_page_ids: {'k2', 'k4', 'k10', 'k6', 'k9', 'k8', 'k3', 'k11', 'k7', 'k5'}
Traceback (most recent call last):
  File "/home/tshalev/dev/swg-api/./test_dynamodb.py", line 57, in <module>
    assert len(first_page_ids & second_page_ids) == 0
AssertionError
```
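The two asserts only report that pagination went wrong, not how. Listing which IDs repeat across pages and which never appear makes a failure like this one easier to read. A minimal sketch with a hypothetical `diagnose_pages` helper, fed with the page sets printed in the moto run:

```python
from typing import Iterable, Set, Tuple


def diagnose_pages(pages: Iterable[Set[str]],
                   expected: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (IDs seen on more than one page, expected IDs never seen)."""
    seen: Set[str] = set()
    duplicated: Set[str] = set()
    for page in pages:
        duplicated |= seen & page  # IDs already returned by an earlier page
        seen |= page
    return duplicated, expected - seen


def by_number(key: str) -> int:
    return int(key[1:])  # sort 'k2' before 'k10'


# Page contents copied from the moto 5.0.4 run above.
first_page_ids = {'k1', 'k2', 'k4', 'k10', 'k6', 'k9', 'k8', 'k3', 'k7', 'k5'}
second_page_ids = {'k2', 'k4', 'k10', 'k6', 'k9', 'k8', 'k3', 'k11', 'k7', 'k5'}
expected_ids = {f"k{i}" for i in range(1, 16)}

duplicated, missing = diagnose_pages([first_page_ids, second_page_ids], expected_ids)
print("duplicated:", sorted(duplicated, key=by_number))
# duplicated: ['k2', 'k3', 'k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10']
print("missing:", sorted(missing, key=by_number))
# missing: ['k12', 'k13', 'k14', 'k15']
```

Fed with the page sets from the real-AWS run instead, both results come back empty.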
Code for the moto test:
```python
#!/usr/bin/env python3
import json

import boto3
from boto3.dynamodb.conditions import Attr
import moto

DYNAMODB_TABLE_NAME = 'table_name'
DYNAMODB = boto3.resource('dynamodb', 'eu-central-1')
DYNAMODB_TABLE = DYNAMODB.Table(DYNAMODB_TABLE_NAME)

with moto.mock_aws():
    # Create the mocked table, keyed on the string attribute 'id'.
    DYNAMODB.create_table(
        TableName=DYNAMODB_TABLE_NAME,
        KeySchema=[
            {'AttributeName': 'id', 'KeyType': 'HASH'},
        ],
        AttributeDefinitions=[
            {'AttributeName': 'id', 'AttributeType': 'S'},
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 10,
            'WriteCapacityUnits': 10
        }
    )
    # populate table
    for i in range(1, 16):
        data = f'val_{i}'
        db_id = f'k{i}'
        res = DYNAMODB_TABLE.put_item(
            Item={'id': db_id, 'data': data},
            ConditionExpression=Attr('id').not_exists())
        assert res['ResponseMetadata']['HTTPStatusCode'] == 200

    # First page, then the second page starting after LastEvaluatedKey.
    params = dict(Limit=10)
    res = DYNAMODB_TABLE.scan(**params)
    print(res)
    first_page_ids = {item['id'] for item in res['Items']}
    params['ExclusiveStartKey'] = res.get('LastEvaluatedKey')
    res = DYNAMODB_TABLE.scan(**params)
    print(res)
    second_page_ids = {item['id'] for item in res['Items']}
    print(f"first_page_ids: {first_page_ids}")
    print(f"second_page_ids: {second_page_ids}")
    assert len(first_page_ids & second_page_ids) == 0
    assert len(first_page_ids | second_page_ids) == 15
```
Versions
boto3: 1.34.72
botocore: 1.34.72
moto: 5.0.4
This is a regression: the script works with moto 5.0.3 and fails with moto 5.0.4. It is probably caused by commit 1940888.
FYI, @bblommers
Hi @tomers, this issue should be fixed as of moto >= 5.0.5.dev12. Are you able to upgrade and verify that it works for you with that release?
Thanks @bblommers, I can confirm this issue is resolved in 5.0.5.dev12.