Unable to easily mock Athena result CSV

dmarra opened this issue 4 months ago

We are able to mock Athena query results via:

```python
resp = requests.post(
    "http://motoapi.amazonaws.com/moto-api/static/athena/query-results",
    json=query_results,
)
assert resp.status_code == 201
```
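For reference, the `query_results` payload posted above has roughly this shape, following moto's documentation for the endpoint: optional `account_id` and `region`, plus a `results` list in which each entry carries `rows` in the same format as Athena's `GetQueryResults` `ResultSet.Rows`, and optional `column_info`. The builder below is a hypothetical helper (`build_query_results_payload` is not part of moto), and the field names should be checked against the moto version in use.

```python
from typing import Any, Dict, List, Optional


def build_query_results_payload(
    rows: List[List[str]],
    column_info: Optional[List[Dict[str, Any]]] = None,
    account_id: str = "123456789012",
    region: str = "us-east-1",
) -> Dict[str, Any]:
    """Wrap plain string rows in the JSON shape posted to moto's
    /moto-api/static/athena/query-results endpoint (hypothetical helper)."""
    # Athena represents every cell as {"VarCharValue": "<text>"}.
    athena_rows = [
        {"Data": [{"VarCharValue": cell} for cell in row]} for row in rows
    ]
    return {
        "account_id": account_id,  # moto's default account ID
        "region": region,
        "results": [
            {"rows": athena_rows, "column_info": column_info or []},
        ],
    }


# The first row holds the column names, as in a real SELECT result.
query_results = build_query_results_payload([["id", "name"], ["1", "alice"]])
```

The resulting dict can be passed as `json=query_results` to the `requests.post` call above while `mock_aws` is active.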

But there is no way to mock the output of `get_query_execution`. This poses a few problems:

- There is no control over the query status (for instance, when you want to test that retries happen).
- There is no control over the output location.
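Code under test typically polls `get_query_execution` until `QueryExecution.Status.State` leaves `QUEUED`/`RUNNING`, which is why controlling the status matters for retry tests. A minimal sketch, assuming a hypothetical `wait_for_query` helper that takes the state lookup as a callable, so a test can feed it a scripted sequence of states instead of calling Athena (`scripted_state` is likewise illustrative):

```python
import time
from typing import Callable, List

# Athena query states after which polling should stop.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(
    get_state: Callable[[], str],
    poll_interval: float = 1.0,
    max_attempts: int = 30,
) -> str:
    """Poll get_state() until a terminal Athena state is returned.

    In production, get_state could wrap
    athena.get_query_execution(QueryExecutionId=...)["QueryExecution"]["Status"]["State"].
    """
    for _ in range(max_attempts):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_interval)
    raise TimeoutError(f"query still not finished after {max_attempts} polls")


# A scripted sequence of states, as a retry test would want to inject it.
states = iter(["QUEUED", "RUNNING", "RUNNING", "SUCCEEDED"])
calls: List[str] = []


def scripted_state() -> str:
    state = next(states)
    calls.append(state)  # record each poll so the test can count retries
    return state


final_state = wait_for_query(scripted_state, poll_interval=0)
```

Injecting the state source keeps the retry logic testable without depending on what status the mocked backend reports.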

I am more interested in the latter. This would be much more useful if we could supply the results as a CSV and have the query results mocked from that. But even in the simplest case, being able to set the output location to a file we have uploaded to a mock bucket would go a long way when the code under test needs to read that file directly.
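Until something like that exists, the CSV can be converted into Athena's row format on the test side and posted to the query-results endpoint. A sketch, assuming the CSV has a header row (Athena returns the column names as the first row of a `SELECT` result) and that every value can be sent as a `VarCharValue`; `csv_to_athena_rows` is a hypothetical name:

```python
import csv
import io
from typing import Any, Dict, List


def csv_to_athena_rows(csv_text: str) -> List[Dict[str, Any]]:
    """Convert CSV text (header row first) into Athena ResultSet.Rows format."""
    reader = csv.reader(io.StringIO(csv_text))
    return [
        {"Data": [{"VarCharValue": value} for value in record]}
        for record in reader
        if record  # csv.reader yields [] for blank lines; skip them
    ]


rows = csv_to_athena_rows("id,name\n1,alice\n2,bob\n")
```

The returned list can go in the `rows` field of a `results` entry in the payload posted to the query-results endpoint.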

Here is an example of how I had to hack this into my tests:

```python
import os

import boto3
import pytest
from moto import mock_aws
from moto.athena import athena_backends
from moto.athena.models import Execution
from moto.core import DEFAULT_ACCOUNT_ID


@pytest.fixture
@mock_aws
def mock_athena_results():
    # Place the mocked Athena results in "S3".
    # Assumes "test-bucket" already exists in the mocked S3 account.
    s3 = boto3.client("s3", region_name=os.getenv("AWS_DEFAULT_REGION"))
    bucket_name = "test-bucket"
    key = "query_results/output.csv"
    output_location = f"s3://{bucket_name}/{key}"

    with open("some/csv/file.csv", "r") as f:
        contents = f.read()
    s3.put_object(Bucket=bucket_name, Key=key, Body=contents)

    # Hack moto so the execution carries the real path to our CSV results.
    execution = Execution(
        query="SELECT * FROM notused",
        config={"OutputLocation": output_location},
        context={"Database": "default"},
        workgroup={},
        execution_parameters={},
    )
    # moto rewrites OutputLocation so it only _looks_ correct; we need it to
    # point at the object uploaded above, so overwrite it.
    execution.config["OutputLocation"] = output_location

    # Swap the backend's executions dict for an object that returns our
    # execution for every ID lookup and ignores newly stored executions.
    # Because it is assigned as an instance attribute, only __getitem__ and
    # __setitem__ are ever called on it.
    class MockExecutionFactory:
        def __setitem__(self, key, value):
            pass

        def __getitem__(self, key):
            return execution

    # This region must match the one used by the Athena client under test.
    athena_backends[DEFAULT_ACCOUNT_ID]["us-east-1"].executions = MockExecutionFactory()
```

The above works, but it's pretty janky. I suppose this is more of a feature request than anything. Even better would be a way to set the execution ID explicitly (seeding is still too unpredictable and fragile).

Bert Blommers · Answer 1 · Sun May 19 2024 19:20:49 GMT+0800 (China Standard Time)

Hi @dmarra, thanks for raising this.

My assumption is that the application takes whatever OutputLocation it is given, downloads the result from S3, and does whatever it needs to do with it.
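That flow (take the `OutputLocation` reported by `get_query_execution`, split it into bucket and key, download the CSV) might look like the sketch below. The S3 client is passed in so a test can use a moto-backed `boto3` client or, as here, a small stand-in; `split_s3_uri`, `read_athena_csv` and `FakeS3` are hypothetical names.

```python
import csv
import io
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def read_athena_csv(s3_client, output_location: str) -> List[Dict[str, str]]:
    """Download the result CSV and return one dict per data row."""
    bucket, key = split_s3_uri(output_location)
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return list(csv.DictReader(io.StringIO(body.decode("utf-8"))))


class FakeS3:
    """Minimal stand-in exposing the one boto3 call used above."""

    def __init__(self, objects: Dict[Tuple[str, str], bytes]):
        self.objects = objects

    def get_object(self, Bucket: str, Key: str):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}


fake = FakeS3({("test-bucket", "query_results/output.csv"): b'"id","name"\n"1","alice"\n'})
records = read_athena_csv(fake, "s3://test-bucket/query_results/output.csv")
```

Because the application only needs the bucket and key, whichever object the mock puts at that location is what the test exercises.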

Would it work if Moto would actually write the configured query results to S3? Then the actual OutputLocation doesn't matter, and the only test setup is to configure the expected results via the static/athena/query-results endpoint.

Dan Marra · Answer 2 · Mon May 20 2024 19:54:40 GMT+0800 (China Standard Time)

That could work! The only issue is that the Athena output JSON is... perhaps a bit overengineered. It's a pain to mock many rows of data in that format. That said, if this is the easiest way to get the enhancement in, I think it's way better than nothing. It makes sense, too, since it works with the current design.

Dan Marra · Answer 3 · Mon May 20 2024 19:57:09 GMT+0800 (China Standard Time)

To clarify:

- It would have to respect the bucket given as the output location (but the file name could be the auto-generated execution ID).
- The output format is CSV. I don't think Athena has a way to change that, so it should be fine to always write the output to the bucket as CSV.
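On AWS, the `OutputLocation` passed in `ResultConfiguration` acts as a prefix: for a plain `SELECT`, the results land at `<prefix>/<QueryExecutionId>.csv`, and `get_query_execution` reports that full path. A small hypothetical helper (not a moto or boto3 function) that builds the expected location, normalizing a trailing slash:

```python
def expected_result_location(output_prefix: str, execution_id: str) -> str:
    """Build s3://bucket/prefix/<execution id>.csv from the configured prefix."""
    # Strip any trailing slash so both "s3://b/p" and "s3://b/p/" work.
    return f"{output_prefix.rstrip('/')}/{execution_id}.csv"


location = expected_result_location("s3://test-bucket/query_results/", "abc-123")
```

A test can compare this against the path reported by `get_query_execution` before reading the object.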

Bert Blommers · Answer 4 · Tue Jul 09 2024 16:28:26 GMT+0800 (China Standard Time)

Hi @dmarra, this is now part of moto >= 5.0.12.dev3.

It uses the bucket provided by the user, and the file name is the execution ID, so it follows the same format as AWS.
Please let us know if you run into any issues!
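With that change, the test only posts rows to the query-results endpoint, and the code under test reads a CSV back from S3. To see roughly what that file contains, the sketch below serializes Athena-style rows into CSV with every value double-quoted and the header row first, approximating Athena's own result files. This is an assumption about the format, not a copy of moto's implementation, whose quoting may differ; `athena_rows_to_csv` is a hypothetical name.

```python
import csv
import io
from typing import Any, Dict, List


def athena_rows_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Serialize ResultSet.Rows into fully quoted CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in rows:
        # A cell without VarCharValue represents NULL; write it as an empty string.
        writer.writerow(cell.get("VarCharValue", "") for cell in row["Data"])
    return buffer.getvalue()


rows = [
    {"Data": [{"VarCharValue": "id"}, {"VarCharValue": "name"}]},
    {"Data": [{"VarCharValue": "1"}, {"VarCharValue": "alice"}]},
]
csv_text = athena_rows_to_csv(rows)
```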