Use S3 for historical queries instead of DynamoDB

bboreham opened this issue 3 years ago

Scope has an optional multitenant mode, where reports are saved to S3 and indexed in DynamoDB.

Once #3783 is done, live rendering will not use the store, so we will have far less time pressure. I think we can drop the index and just use an S3 'list' API call to find objects.

However, we will need to change the object path name to include the time as a prefix.
Current paths look like s3://bucket-name/00002140a76ed46df4956c4af4004160/1554123600273225527, where the first part is an MD5 hash of the tenant ID and hour number, and the second part is the Unix timestamp in nanoseconds.

Steps to complete:

- Change the S3 object path name so the prefix is tenant/date/hour (or maybe finer-grained).
- Change the querier to list reports within a time-bucket prefix using S3 rather than DynamoDB.
- Add a switch-over date so the querier uses the DynamoDB index before that date and S3 list after it.
- Stop collectors writing to DynamoDB.