getmoto / moto

A library that allows you to easily mock out tests based on AWS infrastructure.

Home Page: http://docs.getmoto.org/en/latest/

Cannot scan all DynamoDB table entries

tomers opened this issue:

The following script creates 15 entries in an empty DynamoDB table, then scans the table with pagination (page size set to 10) and verifies that the scan is complete, i.e. that the two pages do not overlap and together contain all 15 entries. Run against real AWS (i.e. without moto), the script does not fail. Specifically, the asserts at the end do not trigger:

❯ ./test_dynamodb_in_aws.py 
boto3:          1.34.72
botocore:       1.34.72

Called scan() with {"Limit": 10}                                    , got LastEvaluatedKey {"id": "k14"}, items IDs ['k9', 'k6', 'k7', 'k1', 'k3', 'k13', 'k12', 'k5', 'k2', 'k14']
Called scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k14"}}, got LastEvaluatedKey null         , items IDs ['k11', 'k15', 'k10', 'k4', 'k8']
first_page_ids: {'k1', 'k12', 'k9', 'k5', 'k14', 'k6', 'k2', 'k7', 'k13', 'k3'}
second_page_ids: {'k15', 'k4', 'k10', 'k8', 'k11'}

Code:

#!/usr/bin/env python3
import json

import boto3
from boto3.dynamodb.conditions import Attr


DYNAMODB = boto3.resource('dynamodb', 'eu-central-1')
DYNAMODB_TABLE = DYNAMODB.Table('table_name')

# populate table
for i in range(1, 16):
    data = f'val_{i}'
    db_id = f'k{i}'
    res = DYNAMODB_TABLE.put_item(
        Item={'id': db_id, 'data': data},
        ConditionExpression=Attr('id').not_exists())
    assert res['ResponseMetadata']['HTTPStatusCode'] == 200

params = dict(Limit=10)
res = DYNAMODB_TABLE.scan(**params)
items = [item['id'] for item in res['Items']]
print(
    f"Called scan() with {json.dumps(params):49}, got LastEvaluatedKey {json.dumps(res.get('LastEvaluatedKey')):13}, items IDs {items}")
first_page_ids = {item['id'] for item in res['Items']}

params['ExclusiveStartKey'] = res.get('LastEvaluatedKey')
res = DYNAMODB_TABLE.scan(**params)
items = [item['id'] for item in res['Items']]
print(
    f"Called scan() with {json.dumps(params):49}, got LastEvaluatedKey {json.dumps(res.get('LastEvaluatedKey')):13}, items IDs {items}")
second_page_ids = {item['id'] for item in res['Items']}

print(f"first_page_ids: {first_page_ids}")
print(f"second_page_ids: {second_page_ids}")

assert len(first_page_ids & second_page_ids) == 0
assert len(first_page_ids | second_page_ids) == 15

With moto, the same scan does not work. The second page overlaps the first (k2 through k10 appear on both pages), so the first assert fails and the table cannot be scanned completely:

❯ ./test_dynamodb.py
{'Items': [{'id': 'k1', 'data': 'val_1'}, {'id': 'k2', 'data': 'val_2'}, {'id': 'k3', 'data': 'val_3'}, {'id': 'k4', 'data': 'val_4'}, {'id': 'k5', 'data': 'val_5'}, {'id': 'k6', 'data': 'val_6'}, {'id': 'k7', 'data': 'val_7'}, {'id': 'k8', 'data': 'val_8'}, {'id': 'k9', 'data': 'val_9'}, {'id': 'k10', 'data': 'val_10'}], 'Count': 10, 'ScannedCount': 10, 'LastEvaluatedKey': {'id': 'k10'}, 'ResponseMetadata': {'RequestId': '3FPC2VL7DuOEE7h6x1q0vCSHozB0TESdTEK0VAfsd7HeDdRXubv7', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'amazon.com', 'date': 'Thu, 28 Mar 2024 13:40:20 GMT', 'x-amzn-requestid': '3FPC2VL7DuOEE7h6x1q0vCSHozB0TESdTEK0VAfsd7HeDdRXubv7', 'x-amz-crc32': '2600487097'}, 'RetryAttempts': 0}}
{'Items': [{'id': 'k2', 'data': 'val_2'}, {'id': 'k3', 'data': 'val_3'}, {'id': 'k4', 'data': 'val_4'}, {'id': 'k5', 'data': 'val_5'}, {'id': 'k6', 'data': 'val_6'}, {'id': 'k7', 'data': 'val_7'}, {'id': 'k8', 'data': 'val_8'}, {'id': 'k9', 'data': 'val_9'}, {'id': 'k10', 'data': 'val_10'}, {'id': 'k11', 'data': 'val_11'}], 'Count': 10, 'ScannedCount': 10, 'LastEvaluatedKey': {'id': 'k11'}, 'ResponseMetadata': {'RequestId': 'x9so4xzNNbk7YZ1rzC3CzmmTuDoTdnbWx53PXavIR0afh7RH2Luj', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'amazon.com', 'date': 'Thu, 28 Mar 2024 13:40:20 GMT', 'x-amzn-requestid': 'x9so4xzNNbk7YZ1rzC3CzmmTuDoTdnbWx53PXavIR0afh7RH2Luj', 'x-amz-crc32': '4219525820'}, 'RetryAttempts': 0}}
first_page_ids: {'k1', 'k2', 'k4', 'k10', 'k6', 'k9', 'k8', 'k3', 'k7', 'k5'}
second_page_ids: {'k2', 'k4', 'k10', 'k6', 'k9', 'k8', 'k3', 'k11', 'k7', 'k5'}
Traceback (most recent call last):
  File "/home/tshalev/dev/swg-api/./test_dynamodb.py", line 57, in <module>
    assert len(first_page_ids & second_page_ids) == 0
AssertionError

Code for moto test:

#!/usr/bin/env python3
import json

import boto3
from boto3.dynamodb.conditions import Attr

import moto

DYNAMODB_TABLE_NAME = 'table_name'
DYNAMODB = boto3.resource('dynamodb', 'eu-central-1')
DYNAMODB_TABLE = DYNAMODB.Table(DYNAMODB_TABLE_NAME)

with moto.mock_aws():
    DYNAMODB.create_table(
        TableName=DYNAMODB_TABLE_NAME,
        KeySchema=[
            {'AttributeName': 'id', 'KeyType': 'HASH'},
        ],
        AttributeDefinitions=[
            {'AttributeName': 'id', 'AttributeType': 'S'},
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 10,
            'WriteCapacityUnits': 10
        }
    )

    # populate table
    for i in range(1, 16):
        data = f'val_{i}'
        db_id = f'k{i}'
        res = DYNAMODB_TABLE.put_item(
            Item={'id': db_id, 'data': data},
            ConditionExpression=Attr('id').not_exists())
        assert res['ResponseMetadata']['HTTPStatusCode'] == 200

    params = dict(Limit=10)
    res = DYNAMODB_TABLE.scan(**params)
    print(res)
    first_page_ids = {item['id'] for item in res['Items']}

    params['ExclusiveStartKey'] = res.get('LastEvaluatedKey')
    res = DYNAMODB_TABLE.scan(**params)
    print(res)
    second_page_ids = {item['id'] for item in res['Items']}

    print(f"first_page_ids: {first_page_ids}")
    print(f"second_page_ids: {second_page_ids}")

    assert len(first_page_ids & second_page_ids) == 0
    assert len(first_page_ids | second_page_ids) == 15
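
For reference, the two-page check above can be generalized into a loop that keeps following LastEvaluatedKey until the response no longer contains one. This is only a minimal sketch: `scan_all` and `FakeTable` are hypothetical names introduced here for illustration, and `FakeTable` is an in-memory stand-in that assumes a simple ordered scan, not real DynamoDB or moto behaviour. With a real boto3 Table object, only `scan_all` would be needed.

```python
from typing import Any, Dict, Iterator


def scan_all(table: Any, **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item of a table by following LastEvaluatedKey (hypothetical helper)."""
    params = dict(params)  # copy so the caller's arguments are not mutated
    while True:
        res = table.scan(**params)
        yield from res['Items']
        last_key = res.get('LastEvaluatedKey')
        if last_key is None:
            break  # no more pages
        params['ExclusiveStartKey'] = last_key


class FakeTable:
    """Hypothetical in-memory stand-in for a boto3 Table, for illustration only."""

    def __init__(self, items):
        self._items = list(items)

    def scan(self, Limit=None, ExclusiveStartKey=None):
        start = 0
        if ExclusiveStartKey is not None:
            # Resume right after the item whose key was returned last time.
            ids = [item['id'] for item in self._items]
            start = ids.index(ExclusiveStartKey['id']) + 1
        end = len(self._items) if Limit is None else start + Limit
        page = self._items[start:end]
        res = {'Items': page, 'Count': len(page)}
        if end < len(self._items):
            res['LastEvaluatedKey'] = {'id': page[-1]['id']}
        return res


table = FakeTable({'id': f'k{i}', 'data': f'val_{i}'} for i in range(1, 16))
ids = [item['id'] for item in scan_all(table, Limit=10)]

# Same checks as in the scripts above: no duplicates, nothing missing.
assert len(ids) == len(set(ids)) == 15
```

With a correct scan implementation, the final assert passes for any page size. With the overlapping pages shown in the moto 5.0.4 output above, the duplicate check would be expected to fail in the same way as the original script.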

Versions

boto3: 1.34.72
botocore: 1.34.72
moto: 5.0.4

This is a regression:
Works with moto 5.0.3
Fails with moto 5.0.4

This regression is probably caused by the following commit: 1940888
FYI, @bblommers

Hi @tomers, this issue should be fixed as of moto >= 5.0.5.dev12. Are you able to upgrade and verify that it works for you with that release?

Thanks @bblommers, I can confirm this issue is resolved in 5.0.5.dev12