kinto-signer does not invalidate cache as expected
bqbn opened this issue
What happened is that 2 records were approved in the staging/plugins collection this morning per [1], at 15:49 and 15:54 UTC respectively. And from the logs, we can see that kinto-signer issued 2 cache invalidation calls to CloudFront:
Sep 12 15:49:13 ip-172-31-2-43 docker-kinto[2622]: {"Timestamp": 1536767353229344512, "Type": "botocore.vendored.requests.packages.urllib3.connectionpool", "Logger": "kinto", "Hostname": "ip-172-31-2-43.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 9, "Fields": {"msg": "Starting new HTTP connection (1): 169.254.169.254"}}
Sep 12 15:49:13 ip-172-31-2-43 docker-kinto[2622]: {"Timestamp": 1536767353232492288, "Type": "botocore.vendored.requests.packages.urllib3.connectionpool", "Logger": "kinto", "Hostname": "ip-172-31-2-43.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 9, "Fields": {"msg": "Starting new HTTP connection (1): 169.254.169.254"}}
Sep 12 15:49:13 ip-172-31-2-43 docker-kinto[2622]: {"Timestamp": 1536767353265446912, "Type": "botocore.vendored.requests.packages.urllib3.connectionpool", "Logger": "kinto", "Hostname": "ip-172-31-2-43.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 9, "Fields": {"msg": "Starting new HTTPS connection (1): cloudfront.amazonaws.com"}}
Sep 12 15:49:13 ip-172-31-2-43 docker-kinto[2622]: {"Timestamp": 1536767353805437440, "Type": "kinto_signer.updater", "Logger": "kinto", "Hostname": "ip-172-31-2-43.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 9, "Fields": {"msg": "Invalidated CloudFront cache at /v1/*"}}
... ...
Sep 12 15:54:46 ip-172-31-2-43 docker-kinto[2622]: {"Timestamp": 1536767686217423360, "Type": "botocore.vendored.requests.packages.urllib3.connectionpool", "Logger": "kinto", "Hostname": "ip-172-31-2-43.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 8, "Fields": {"msg": "Starting new HTTP connection (1): 169.254.169.254"}}
Sep 12 15:54:46 ip-172-31-2-43 docker-kinto[2622]: {"Timestamp": 1536767686220657408, "Type": "botocore.vendored.requests.packages.urllib3.connectionpool", "Logger": "kinto", "Hostname": "ip-172-31-2-43.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 8, "Fields": {"msg": "Starting new HTTP connection (1): 169.254.169.254"}}
Sep 12 15:54:46 ip-172-31-2-43 docker-kinto[2622]: {"Timestamp": 1536767686249065216, "Type": "botocore.vendored.requests.packages.urllib3.connectionpool", "Logger": "kinto", "Hostname": "ip-172-31-2-43.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 8, "Fields": {"msg": "Starting new HTTPS connection (1): cloudfront.amazonaws.com"}}
Sep 12 15:54:46 ip-172-31-2-43 docker-kinto[2622]: {"Timestamp": 1536767686713996288, "Type": "kinto_signer.updater", "Logger": "kinto", "Hostname": "ip-172-31-2-43.us-west-2.compute.internal", "EnvVersion": "2.0", "Severity": 6, "Pid": 8, "Fields": {"msg": "Invalidated CloudFront cache at /v1/*"}}
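(For reference, here is a minimal boto3 sketch of the kind of invalidation call the logs above show; the distribution ID is a hypothetical placeholder, and kinto-signer's actual code may differ.)

import uuid
import boto3

DISTRIBUTION_ID = "EXXXXXXXXXXXXX"  # hypothetical placeholder

client = boto3.client("cloudfront")
response = client.create_invalidation(
    DistributionId=DISTRIBUTION_ID,
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/v1/*"]},
        # CallerReference must be unique per invalidation request.
        "CallerReference": str(uuid.uuid4()),
    },
)
print(response["Invalidation"]["Id"], response["Invalidation"]["Status"])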
We checked the CloudFront logs and saw that the 2 invalidation requests were received by CloudFront and completed successfully.
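(That can also be double-checked from the API side; a quick sketch, again with a hypothetical distribution ID:)

import boto3

DISTRIBUTION_ID = "EXXXXXXXXXXXXX"  # hypothetical placeholder

client = boto3.client("cloudfront")
invalidations = client.list_invalidations(DistributionId=DISTRIBUTION_ID)
for item in invalidations["InvalidationList"].get("Items", []):
    detail = client.get_invalidation(DistributionId=DISTRIBUTION_ID, Id=item["Id"])["Invalidation"]
    print(detail["Id"], detail["Status"], detail["CreateTime"])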
However, a while later, we got paged because the kinto-lambda-validate_signature lambda failed. The log shows that the lambda failed because of the plugins collection:
Signature verification failed on /buckets/blocklists/collections/plugins
- Signed on: 1536767686204 (2018-09-12 15:54:46 UTC)
- Records timestamp: 1524073129197 (2018-04-18 17:38:49 UTC): ValidationError
Traceback (most recent call last):
File "/var/task/aws_lambda.py", line 144, in validate_signature
raise ValidationError("\n" + "\n\n".join(error_messages))
aws_lambda.ValidationError:
Signature verification failed on /buckets/blocklists/collections/plugins
- Signed on: 1536767686204 (2018-09-12 15:54:46 UTC)
- Records timestamp: 1524073129197 (2018-04-18 17:38:49 UTC)
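(The mismatch the lambda reports can be observed by comparing the collection metadata timestamp with the records timestamp served through the CDN. A rough sketch, assuming the public read endpoints; the server URL is a placeholder, and the lambda's exact check may differ:)

import requests

SERVER = "https://kinto.example.net/v1"  # hypothetical server URL
COLLECTION = SERVER + "/buckets/blocklists/collections/plugins"

# Collection metadata carries the signature; its timestamp moves on signing.
metadata = requests.get(COLLECTION).json()["data"]
# The records timestamp is exposed via the ETag header on the records endpoint.
records = requests.get(COLLECTION + "/records")
records_timestamp = int(records.headers["ETag"].strip('"'))

print("Metadata timestamp:", metadata["last_modified"])
print("Records timestamp: ", records_timestamp)
# With a stale CDN cache, the records timestamp lags far behind the freshly
# signed metadata, which matches what the traceback above shows.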
So we had to manually run the kinto-lambda-refresh_signature lambda to clear all caches, after which the validate_signature lambda went back to normal.
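(For the record, that manual step boils down to invoking the lambda by name, e.g. with boto3; the region is inferred from the us-west-2 hostnames in the logs above, and the invocation details are assumptions:)

import boto3

# Region inferred from the us-west-2 hostnames in the logs above.
client = boto3.client("lambda", region_name="us-west-2")
response = client.invoke(
    FunctionName="kinto-lambda-refresh_signature",
    InvocationType="RequestResponse",  # block until the lambda finishes
)
print(response["StatusCode"])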
It looks like whatever kinto-signer tried to invalidate did not take effect as expected (otherwise the validate_signature lambda would not have complained).
One difference we can see when running the kinto-lambda-refresh_signature lambda is that it sends 15 invalidation requests, maybe corresponding to the following collections (see the sketch after this list):
Looking at /buckets/monitor/collections/changes:
Looking at /buckets/main-workspace/collections/focus-experiments: Refresh signature: status= signed at 2018-09-12 16:42:25 UTC ( 1536770545952 )
Looking at /buckets/staging/collections/plugins: Refresh signature: status= signed at 2018-09-12 16:42:27 UTC ( 1536770547395 )
Looking at /buckets/main-workspace/collections/cfr: status= None at 2018-09-12 15:49:10 UTC ( 1536767350764 )
Looking at /buckets/staging/collections/addons: Refresh signature: status= signed at 2018-09-12 16:42:29 UTC ( 1536770549216 )
Looking at /buckets/main-workspace/collections/onboarding: status= None at 2018-09-12 15:49:10 UTC ( 1536767350796 )
Looking at /buckets/main-workspace/collections/rocket-prefs: Refresh signature: status= signed at 2018-09-12 16:42:31 UTC ( 1536770551397 )
Looking at /buckets/staging/collections/certificates: Refresh signature: status= signed at 2018-09-12 16:42:32 UTC ( 1536770552836 )
Looking at /buckets/main-workspace/collections/tippytop: Refresh signature: status= signed at 2018-09-12 16:42:34 UTC ( 1536770554729 )
Looking at /buckets/security-state-staging/collections/cert-revocations: status= None at 2018-09-12 15:49:12 UTC ( 1536767352455 )
Looking at /buckets/security-state-staging/collections/intermediates: status= None at 2018-09-12 15:49:12 UTC ( 1536767352420 )
Looking at /buckets/staging/collections/qa: Refresh signature: status= signed at 2018-09-12 16:42:37 UTC ( 1536770557033 )
Looking at /buckets/pinning-staging/collections/pins: Refresh signature: status= signed at 2018-09-12 16:42:38 UTC ( 1536770558521 )
Looking at /buckets/staging/collections/gfx: Refresh signature: status= signed at 2018-09-12 16:42:40 UTC ( 1536770560323 )
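(If refresh_signature really does issue one invalidation per collection, I'd guess it iterates over the monitor/changes entries along these lines; this is guesswork, not the lambda's actual code, and SERVER and DISTRIBUTION_ID are hypothetical placeholders:)

import uuid
import boto3
import requests

SERVER = "https://kinto.example.net/v1"  # hypothetical
DISTRIBUTION_ID = "EXXXXXXXXXXXXX"       # hypothetical

cloudfront = boto3.client("cloudfront")
changes = requests.get(SERVER + "/buckets/monitor/collections/changes/records").json()["data"]
for entry in changes:
    # One invalidation per published collection.
    path = "/v1/buckets/{}/collections/{}*".format(entry["bucket"], entry["collection"])
    cloudfront.create_invalidation(
        DistributionId=DISTRIBUTION_ID,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": [path]},
            "CallerReference": str(uuid.uuid4()),
        },
    )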
But the logs didn't tell us which collections the 2 invalidation requests sent by kinto-signer were for.
Also, this issue is somewhat related to mozilla-services/remote-settings-lambdas#46, in which we didn't know what made the validate_signature lambda fail periodically; it may well be because of this, i.e. when someone signed something, the signer didn't clear the cache correctly.
One more piece of info: someone just approved a few records in main-workspace/onboarding, and it didn't cause any trouble for the validate_signature lambda.
I'm not sure whether the investigation should focus on the staging and blocklists buckets or not, as they are our old buckets, whereas main-workspace is new.
We don't invalidate the cache anymore since #1256.