/api/exportVaccinateTheStates failing in production
simonw opened this issue · comments
Refs #705. The endpoint is showing "failed" messages when called using Cloud Scheduler.
The staging job - running the same export code, but against a smaller set of data - works OK.
Nothing in Sentry. It looks like it's been failing for quite a while - definitely since I deployed the latest code.
I tried running it manually like so:
~ % curl -XPOST 'https://vial.calltheshots.us/api/exportVaccinateTheStates' --header "Authorization: Bearer 27:b5d06ef7..." -d ''
{"ok": 1}
That worked! https://api.vaccinatethestates.com/ shows a whole bunch of files with a last modification date around 2021-07-08T23:23:53.299Z - which is 9 minutes ago in UTC.
Looking at the logs from the most recent two attempts:
I'm suspicious: exactly 3 minutes elapsed between the initial log line (presumably corresponding to the start of the request) and the log line with the error. That suggests to me that there's a Cloud Scheduler timeout of some sort here.
https://stackoverflow.com/a/66298820/6083 suggests you can set a job "deadline" from the CLI like this:
gcloud beta scheduler jobs update http <job> --attempt-deadline=1800s --project <project>
Looks like that option isn't available in the web console interface at https://console.cloud.google.com/cloudscheduler/jobs/edit/us-west2/vaccinatethestates-api-export-production?project=django-vaccinateca
I have CLI access on my laptop:
~ % gcloud beta scheduler jobs list --project django-vaccinateca
ID                                        LOCATION  SCHEDULE (TZ)                                                    TARGET_TYPE  STATE
api-export-production                     us-west2  every 1 minutes (America/Los_Angeles)                            HTTP         ENABLED
api-export-staging                        us-west2  every 1 minutes (America/Los_Angeles)                            HTTP         ENABLED
mapbox-export                             us-west2  0 2,9,10,11,12,13,14,15,16,17,18,21 * * * (America/Los_Angeles)  HTTP         ENABLED
resolve-missing-counties-production       us-west2  */10 * * * * (America/Los_Angeles)                               HTTP         ENABLED
resolve-missing-counties-staging          us-west2  */10 * * * * (America/Los_Angeles)                               HTTP         ENABLED
vaccinatethestates-api-export-production  us-west2  */10 * * * * (America/Los_Angeles)                               HTTP         ENABLED
vaccinatethestates-api-export-staging     us-west2  */10 * * * * (America/Los_Angeles)                               HTTP         ENABLED
Confirmed: the current deadline is 180s:
~ % gcloud beta scheduler jobs describe vaccinatethestates-api-export-production --project django-vaccinateca
attemptDeadline: 180s
description: Hit /api/exportVaccinateTheStates to export to api.vaccinatethestates.com bucket
httpTarget:
  headers:
    Authorization: Bearer 27:b5d06ef7bfa1650267ed9750228c5e93
    User-Agent: Google-Cloud-Scheduler
  httpMethod: POST
  uri: https://vial.calltheshots.us/api/exportVaccinateTheStates
lastAttemptTime: '2021-07-08T23:40:00.992830Z'
name: projects/django-vaccinateca/locations/us-west2/jobs/vaccinatethestates-api-export-production
retryConfig:
  maxBackoffDuration: 3600s
  maxDoublings: 5
  maxRetryDuration: 0s
  minBackoffDuration: 5s
schedule: '*/10 * * * *'
scheduleTime: '2021-07-08T23:50:00.061563Z'
state: ENABLED
status:
  code: 2
timeZone: America/Los_Angeles
userUpdateTime: '2021-06-10T23:34:48Z'
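To check only the deadline without reading the whole job description, the value can be pulled out of the `describe` output. This is a minimal sketch that assumes the output layout shown above; `attempt_deadline` is a hypothetical helper name, not part of gcloud.

```shell
# Hypothetical helper: read `gcloud beta scheduler jobs describe` output
# on stdin and print only the attemptDeadline value.
attempt_deadline() {
  # Print the rest of any line that starts with "attemptDeadline:".
  sed -n 's/^attemptDeadline:[[:space:]]*//p'
}

# Example usage (needs gcloud access to the project):
#   gcloud beta scheduler jobs describe vaccinatethestates-api-export-production \
#     --project django-vaccinateca | attempt_deadline
```

Running the same check again after an update is a quick way to confirm the new deadline was saved.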
I'm going to set the deadline to 9 minutes (since the cron runs every 10 minutes) - which is 540s.
gcloud beta scheduler jobs update http vaccinatethestates-api-export-production --attempt-deadline=540s --project django-vaccinateca
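The 540s figure is the cron interval minus a one-minute gap, converted to seconds. As a rough sketch of that arithmetic (`deadline_for_interval` is a hypothetical helper, and the one-minute gap is just the choice made here, not a Cloud Scheduler rule):

```shell
# Hypothetical helper: deadline for a job that runs every N minutes,
# leaving one minute free before the next scheduled run.
deadline_for_interval() {
  local interval_minutes=$1
  echo "$(( (interval_minutes - 1) * 60 ))s"
}

deadline_for_interval 10   # prints 540s
```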
Oops, just accidentally shared an API key in the above comment ^ - I'll revoke that and issue a new one now.
That fixed it - the scheduled task ran successfully.
I'm going to bump up the time limit on the staging job too:
gcloud beta scheduler jobs update http vaccinatethestates-api-export-staging --attempt-deadline=540s --project django-vaccinateca
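Since the same update applies to more than one job, the commands could also be generated in a loop. The sketch below only prints the commands so they can be reviewed before being run; `print_deadline_updates` is a hypothetical helper name.

```shell
# Hypothetical helper: print (not run) one update command per job.
# Usage: print_deadline_updates <deadline> <project> <job>...
print_deadline_updates() {
  local deadline=$1 project=$2
  shift 2
  local job
  for job in "$@"; do
    echo "gcloud beta scheduler jobs update http $job --attempt-deadline=$deadline --project $project"
  done
}

print_deadline_updates 540s django-vaccinateca \
  vaccinatethestates-api-export-production \
  vaccinatethestates-api-export-staging
```

Printing first keeps the step reversible: nothing changes in the project until a printed line is pasted into a terminal.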