Unable to easily mock Athena result CSV
dmarra opened this issue · comments
We are able to mock Athena query results via:
resp = requests.post(
"http://motoapi.amazonaws.com/moto-api/static/athena/query-results",
json=query_results,
)
assert resp.status_code == 201
But there is no way to mock the output of get_execution_results
. this poses a few problems:
- unable to have any control over request status (in cases where you want to test that retries are happening for instance)
- unable to have any control over the output location
I am more interested in the latter. This would be a lot more useful if we could somehow assign results via a CSV, and have the query-results
mocked from that. But in the simplest case, being able to control that datapoint to point to a file we have put to a mock bucket
would go a long way in the case where the code under test needs to access the file directly.
Here is an example of how I had to hack this into my tests:
from moto.athena import athena_backends
from moto.athena.models import Execution
from moto.core import DEFAULT_ACCOUNT_ID
@pytest.fixture
@mock_aws
def mock_athena_results():
# place the mocked athena results in "s3"
s3 = boto3.client("s3", region_name=os.getenv("AWS_DEFAULT_REGION"))
bucket_name = 'test-bucket'
key = f"query_results/output.csv"
with open('some/csv/file.csv', "r") as f:
contents = f.read()
s3.put_object(Bucket=bucket_name, Key=key, Body=contents)
# Hack moto to mock the execution results to have the correct path to our csv results
exec = Execution(
query="SELECT * FROM notused",
config={
"OutputLocation": f"s3://test-bucket/query_results/output.csv"
},
context={
'Database': 'default'
},
workgroup={},
execution_parameters={}
)
# moto tries to mock the file location to _look_ correct, but we need it to be an actual result
exec.config['OutputLocation'] = f"s3://test-bucket/query_results/output.csv"
# This is a hack to get moto to return the correct execution object by having a factory that
# pretends to be a list of executions
class MockExecutionFactory:
def __get__(self, instance, owner):
return [
exec
]
def __set__(self, instance, value):
pass
def __setitem__(self, key, value):
pass
def __getitem__(self, key):
return exec
athena_backends[DEFAULT_ACCOUNT_ID]["us-east-1"].executions = MockExecutionFactory()
return
The above works, but its pretty janky. I suppose this is more of a feature request than anything. Or even better yet, having a way to control execution id (not with seeding, thats still too unpredictable and fragile) such that it can be explicitly set.
Hi @dmarra, thanks for raising this.
My assumption is that the application takes whatever OutputLocation
is given, downloads the result from S3 and does whatever it needs to do with the result.
Would it work if Moto would actually write the configured query results to S3? Then the actual OutputLocation
doesn't matter, and the only test setup is to configure the expected results via the static/athena/query-results
endpoint.
That could work! The only issue with that is that the Athena output JSON is.... perhaps a bit overengineered. It's a pain to mock many rows of data in that format. But that being said, if this is the easiest way to get the enhancement in, then I think its way better than nothing. Makes sense too, since it works with the current design.
To clarify:
- It would have to respect the bucket given as output (but the file name could be the auto-generated execution ID)
- The output format is CSV. I don't think Athena has a way to change that, so it should be fine to just always have the output in CSV format in the bucket
Hi @dmarra, this is now part of moto >= 5.0.12.dev3
.
It uses the bucket provided by the user, and the file name is the execution ID, so it follows the same format as AWS.
Please let us know if you run into any issues!