getmoto / moto

A library that allows you to easily mock out tests based on AWS infrastructure.

Home Page: http://docs.getmoto.org/en/latest/

Weird pagination with DynamoDB scan

tomers opened this issue

When listing items in DynamoDB, the results are unexpected.
I created 15 items in a new DynamoDB table, and then tried to scan all the items with pagination, with the page size set to 10 items.
I call DYNAMODB_TABLE.scan(Limit=10), and I get items #1 to #10, with LastEvaluatedKey set to key of item #11, which is as expected.
I then call DYNAMODB_TABLE.scan(Limit=10, ExclusiveStartKey=<key#11>).
I expect to get items #11 to #15, with LastEvaluatedKey set to None.
However, I get items #2 to #11, with LastEvaluatedKey set to item #11.

Disclaimer: I haven't verified the actual behavior of a real DynamoDB database.

How to reproduce

I wrote a sample program to demonstrate the issue.

Note that all goes well when I set the ExclusiveStartKey to one of the first values, [1..4].
However, things start to get weird at the 5th value, specifically [5..8], where the items returned are as expected, but the LastEvaluatedKey returned is None.
On the 9th value, I don't get any items.
On the 10th value forward, I get weird results. See example below.

#!/usr/bin/env python3
import json

import boto3
import botocore
from boto3.dynamodb.conditions import Attr

import moto

DYNAMODB_TABLE_NAME = 'table_name'
DYNAMODB = boto3.resource('dynamodb', 'eu-central-1')
DYNAMODB_TABLE = DYNAMODB.Table(DYNAMODB_TABLE_NAME)

print(f"boto3:\t\t{boto3.__version__}")
print(f"botocore:\t{botocore.__version__}")
print(f"moto:\t\t{moto.__version__}")
print("")


def get_db_id(i):
    return f'k{i}'


with moto.mock_aws():
    DYNAMODB.create_table(
        TableName=DYNAMODB_TABLE_NAME,
        KeySchema=[
            {'AttributeName': 'id', 'KeyType': 'HASH'},
        ],
        AttributeDefinitions=[
            {'AttributeName': 'id', 'AttributeType': 'S'},
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 10,
            'WriteCapacityUnits': 10
        }
    )

    # populate table
    for i in range(1, 16):
        data = f'val_{i}'
        db_id = get_db_id(i)
        res = DYNAMODB_TABLE.put_item(
            Item={'id': db_id, 'data': data},
            ConditionExpression=Attr('id').not_exists())
        assert res['ResponseMetadata']['HTTPStatusCode'] == 200

    # list items
    for i in range(1, 16):
        params = dict(Limit=10)
        params['ExclusiveStartKey'] = {'id': get_db_id(i)}
        res = DYNAMODB_TABLE.scan(**params)
        items = [item['id'] for item in res['Items']]
        print(
            f"Called DYNAMODB_TABLE.scan() with {json.dumps(params):49}, got LastEvaluatedKey {json.dumps(res.get('LastEvaluatedKey')):13}, items IDs {items}")

Observed behavior

❯ ./test_dynamodb.py 
boto3:          1.34.72
botocore:       1.34.72
moto:           5.0.4

Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k1"}} , got LastEvaluatedKey {"id": "k11"}, items IDs ['k2', 'k3', 'k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k2"}} , got LastEvaluatedKey {"id": "k12"}, items IDs ['k3', 'k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11', 'k12']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k3"}} , got LastEvaluatedKey {"id": "k13"}, items IDs ['k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11', 'k12', 'k13']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k4"}} , got LastEvaluatedKey {"id": "k14"}, items IDs ['k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11', 'k12', 'k13', 'k14']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k5"}} , got LastEvaluatedKey null         , items IDs ['k6', 'k7', 'k8', 'k9', 'k10', 'k11', 'k12', 'k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k6"}} , got LastEvaluatedKey null         , items IDs ['k7', 'k8', 'k9', 'k10', 'k11', 'k12', 'k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k7"}} , got LastEvaluatedKey null         , items IDs ['k8', 'k9', 'k10', 'k11', 'k12', 'k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k8"}} , got LastEvaluatedKey null         , items IDs ['k9', 'k10', 'k11', 'k12', 'k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k9"}} , got LastEvaluatedKey null         , items IDs []
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k10"}}, got LastEvaluatedKey {"id": "k11"}, items IDs ['k2', 'k3', 'k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k11"}}, got LastEvaluatedKey {"id": "k11"}, items IDs ['k2', 'k3', 'k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k12"}}, got LastEvaluatedKey {"id": "k11"}, items IDs ['k2', 'k3', 'k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k13"}}, got LastEvaluatedKey {"id": "k11"}, items IDs ['k2', 'k3', 'k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k14"}}, got LastEvaluatedKey {"id": "k11"}, items IDs ['k2', 'k3', 'k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k15"}}, got LastEvaluatedKey {"id": "k11"}, items IDs ['k2', 'k3', 'k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11']

Expected behavior

❯ ./test_dynamodb.py 
boto3:          1.34.72
botocore:       1.34.72
moto:           5.0.4

Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k1"}} , got LastEvaluatedKey {"id": "k11"}, items IDs ['k2', 'k3', 'k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k2"}} , got LastEvaluatedKey {"id": "k12"}, items IDs ['k3', 'k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11', 'k12']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k3"}} , got LastEvaluatedKey {"id": "k13"}, items IDs ['k4', 'k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11', 'k12', 'k13']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k4"}} , got LastEvaluatedKey {"id": "k14"}, items IDs ['k5', 'k6', 'k7', 'k8', 'k9', 'k10', 'k11', 'k12', 'k13', 'k14']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k5"}} , got LastEvaluatedKey null         , items IDs ['k6', 'k7', 'k8', 'k9', 'k10', 'k11', 'k12', 'k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k6"}} , got LastEvaluatedKey null         , items IDs ['k7', 'k8', 'k9', 'k10', 'k11', 'k12', 'k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k7"}} , got LastEvaluatedKey null         , items IDs ['k8', 'k9', 'k10', 'k11', 'k12', 'k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k8"}} , got LastEvaluatedKey null         , items IDs ['k9', 'k10', 'k11', 'k12', 'k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k9"}} , got LastEvaluatedKey null         , items IDs ['k10', 'k11', 'k12', 'k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k10"}}, got LastEvaluatedKey null         , items IDs ['k11', 'k12', 'k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k11"}}, got LastEvaluatedKey null         , items IDs ['k12', 'k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k12"}}, got LastEvaluatedKey null         , items IDs ['k13', 'k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k13"}}, got LastEvaluatedKey null         , items IDs ['k14', 'k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k14"}}, got LastEvaluatedKey null         , items IDs ['k15']
Called DYNAMODB_TABLE.scan() with {"Limit": 10, "ExclusiveStartKey": {"id": "k15"}}, got LastEvaluatedKey null         , items IDs []

Apparently this issue is bogus: items are not guaranteed to be returned in the order they were created, and scanning all items should work as long as the LastEvaluatedKey from one response is passed as the ExclusiveStartKey of the next call.
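
For reference, here is a minimal sketch of the pagination pattern described above: keep calling scan() and feed each response's LastEvaluatedKey back in as ExclusiveStartKey until no key is returned. The helper name scan_all and the FakeTable class are hypothetical and are not part of boto3 or moto. FakeTable is only an in-memory stand-in that returns items in a non-insertion order, so the sketch runs without AWS or moto installed.

```python
from typing import Any, Dict, Iterator, List, Optional


def scan_all(table: Any, page_size: int = 10) -> Iterator[Dict[str, Any]]:
    """Yield every item, following LastEvaluatedKey from page to page.

    `table` is anything with a boto3-style scan(**kwargs) method.
    """
    start_key: Optional[Dict[str, Any]] = None
    while True:
        params: Dict[str, Any] = {"Limit": page_size}
        if start_key is not None:
            # Only ever pass a key that a previous scan() call returned.
            params["ExclusiveStartKey"] = start_key
        res = table.scan(**params)
        yield from res["Items"]
        start_key = res.get("LastEvaluatedKey")
        if start_key is None:
            break


class FakeTable:
    """Hypothetical stand-in for a boto3 Table (NOT moto, NOT DynamoDB).

    Items are stored in an arbitrary fixed order that differs from the
    insertion order, to mimic the lack of ordering guarantees in a scan.
    """

    def __init__(self, items: List[Dict[str, Any]]) -> None:
        # Sorting by the reversed id gives a stable, non-insertion order.
        self._items = sorted(items, key=lambda item: item["id"][::-1])

    def scan(self, Limit: int,
             ExclusiveStartKey: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        start = 0
        if ExclusiveStartKey is not None:
            ids = [item["id"] for item in self._items]
            start = ids.index(ExclusiveStartKey["id"]) + 1
        page = self._items[start:start + Limit]
        res: Dict[str, Any] = {"Items": page}
        if start + Limit < len(self._items):
            # More items remain, so report where this page stopped.
            res["LastEvaluatedKey"] = {"id": page[-1]["id"]}
        return res


table = FakeTable([{"id": f"k{i}", "data": f"val_{i}"} for i in range(1, 16)])
ids = [item["id"] for item in scan_all(table, page_size=10)]
print(f"Got {len(ids)} items in scan order: {ids}")
```

With the sample program above, the same helper could presumably be called as scan_all(DYNAMODB_TABLE) inside the moto.mock_aws() block. The point is that the caller never constructs an ExclusiveStartKey by hand and never assumes that keys come back in creation order.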