Improve error handling
nrutledge opened this issue
We have not put a lot of thought into error handling at this point. We need to ensure the following:
- If any step fails (e.g., upload to R2), there is a retry mechanism in place.
- If a step still fails after a certain number of retries, we are alerted to the failure.
- Someone running the snapshot service locally can also receive alerts on failures.
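As a rough illustration of the first two requirements, here is a minimal Python sketch. It assumes each step of the snapshot service can be wrapped as a zero-argument callable. The names `with_retries`, `RetriesExhausted`, `max_attempts`, and `base_delay` are hypothetical, and so are the default of three attempts and the exponential backoff schedule; the issue does not specify any of them. The `alert` callback is pluggable, so a local run could route alerts to a different destination than a deployed one.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetriesExhausted(Exception):
    """Raised after every attempt of a step has failed (hypothetical name)."""


def with_retries(
    step: Callable[[], T],
    alert: Callable[[str], None],
    max_attempts: int = 3,  # assumed default; the issue leaves the count open
    base_delay: float = 1.0,  # assumed initial backoff, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `step`, retrying with exponential backoff.

    If every attempt fails, call `alert` once with a summary and raise
    RetriesExhausted, chained to the last underlying error.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as err:  # broad catch for the sketch; narrow in practice
            last_error = err
            if attempt < max_attempts:
                # Wait base_delay, then 2x, 4x, ... between attempts.
                sleep(base_delay * 2 ** (attempt - 1))
    message = f"step failed after {max_attempts} attempts: {last_error!r}"
    alert(message)
    raise RetriesExhausted(message) from last_error
```

For example, a hypothetical R2 upload could be wrapped as `with_retries(lambda: upload_to_r2(path), alert=print)` for a local run, while a deployed instance might pass a function that posts to a team channel instead. Injecting `sleep` and `alert` also makes the retry and alerting behavior easy to test without real delays or notifications.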
This issue was brought up during the 2024/03/04 sync meeting.