Update to cert-manager 1.4
maelvls opened this issue
Still to be done as of 6 July 2021:
- Deprecate 1.1 and 1.3 in the Marketplace admin UI.
- Have a review on #60
- Have a review on #58
- Re-submit again and again until the review passes
- Attempt 1 (20 June 2021)
- Refusal 1: I submitted 1.3 as the "default" version instead of 1.4 (my fault)
- Attempt 2 (27 June 2021)
- Refusal 2: issue with the transition of GoogleCASIssuer from `v1alpha1` to `v1beta1`
- Attempt 3 (29 June 2021)
- Refusal 3 (29 June 2021): the `testrunner` fails with no clear indication of what is failing
- Message from James Westby about our struggles with the `testrunner` (29 June 2021)
- Google engineering team investigating a bug with the backend (6 July 2021)
- Refusal 4 (7 July 2021): the `info` field is still present
- Attempt 5 (8 July 2021), image not changed.
- Refusal 5 (13 July 2021)
cert-manager v1.4.0 was released on 15 June 2021, and we want the jetstack-secure-for-cert-manager app on the Google Cloud Marketplace to be updated within a few days of each cert-manager release.
Using the "Cutting a new release" instructions, we shall update the Google Cloud Marketplace app from 1.3.1 to 1.4.0.
A `Role` has to be added to `schema.yaml`. To see what needs to be added to `schema.yaml`:
# From the cert-manager repo
git diff origin/release-1.3..origin/release-1.4 deploy/charts/cert-manager
`Role` to be added to both the `cainjector` and `controller` service accounts:
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  resourceNames: ["cert-manager-cainjector-leader-election", "cert-manager-cainjector-leader-election-core"]
  verbs: ["get", "update", "patch"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["create"]
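The split into two rules is worth a note: as far as I know, Kubernetes RBAC cannot restrict `create` by `resourceNames`, because the object name may not be known when the request is authorized. The sketch below is a deliberately simplified matcher (no wildcards, no subresources) that illustrates how the two rules combine. `rule_allows` and `allowed` are hypothetical helpers written for this comment, not part of Kubernetes or `mpdev`.

```python
from typing import List, Optional

# Rules copied from the Role above.
LEASE_RULES = [
    {
        "apiGroups": ["coordination.k8s.io"],
        "resources": ["leases"],
        "resourceNames": [
            "cert-manager-cainjector-leader-election",
            "cert-manager-cainjector-leader-election-core",
        ],
        "verbs": ["get", "update", "patch"],
    },
    {
        "apiGroups": ["coordination.k8s.io"],
        "resources": ["leases"],
        "verbs": ["create"],
    },
]


def rule_allows(rule: dict, group: str, resource: str, verb: str,
                name: Optional[str]) -> bool:
    """Simplified RBAC match: exact strings only, no wildcards."""
    if group not in rule.get("apiGroups", []):
        return False
    if resource not in rule.get("resources", []):
        return False
    if verb not in rule.get("verbs", []):
        return False
    names = rule.get("resourceNames", [])
    # An empty resourceNames list means "any object name".
    return not names or name in names


def allowed(rules: List[dict], group: str, resource: str, verb: str,
            name: Optional[str] = None) -> bool:
    """A request is allowed if at least one rule matches it."""
    return any(rule_allows(r, group, resource, verb, name) for r in rules)
```

On a real cluster, `kubectl auth can-i` with `--as=system:serviceaccount:<namespace>:<name>` is the authoritative check; the sketch only helps when reasoning about the rules offline.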
`ClusterRole` needed:
rules:
  - apiGroups: ["certificates.k8s.io"]
    resources: ["certificatesigningrequests"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["certificates.k8s.io"]
    resources: ["certificatesigningrequests/status"]
    verbs: ["update"]
  - apiGroups: ["certificates.k8s.io"]
    resources: ["signers"]
    resourceNames: ["issuers.cert-manager.io/*", "clusterissuers.cert-manager.io/*"]
    verbs: ["sign"]
  - apiGroups: ["authorization.k8s.io"]
    resources: ["subjectaccessreviews"]
    verbs: ["create"]
Estimation: 1 hour
Note: I opened GoogleCloudPlatform/marketplace-k8s-app-tools#564 to raise the issue of not being able to create a `Role` that targets the `kube-system` namespace.
I submitted `1.4.0-gcm.0` for review; it should be published by tomorrow.
The issues I encountered:
- I did not pay attention to the updates made to google-cas-issuer, although the change log is very clear. Notably, I failed to properly update from `v1alpha1` to `v1beta1`.
- I struggled a lot with the now-required `leases` resource, and I ended up using a `ClusterRole` with `resourceNames` instead of a `Role`, and opened an issue on `mpdev`: GoogleCloudPlatform/marketplace-k8s-app-tools#564.
- As usual, the thing that made me waste the most time was the fact that `mpdev` only shows status codes, not stdout or stderr:

      >>> Running /smoke-test.yaml
      > 0: kubectl smoke test PASSED
      > 1: Create test issuer and self signed cert PASSED
      > 2: Try to get new cert PASSED
      > 3: Try to get cert secret PASSED
      > 4: Delete issuer and self signed cert PASSED
      > 5: Create a GoogleCASIssuer and a certificate FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
      > 6: Delete google CAS issuer and certificate FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
      >> Summary: 2 FAILED, 5 PASSED

  No way to know what went wrong. It really feels like "unfinished" tooling 😥
  I also opened an issue about that: GoogleCloudPlatform/marketplace-k8s-app-tools#565.
- Finally, the upgrade that Google did from `v1beta1` to `v1` of the CRD in `app-crd.yaml` broke the application in 1.1 and 1.3 (#59). More specifically, we had `info` nested under `descriptor`:

      apiVersion: app.k8s.io/v1beta1
      kind: Application
      spec:
        descriptor:
          ...
          info: []

  It should have been a sibling of `descriptor`:

      apiVersion: app.k8s.io/v1beta1
      kind: Application
      spec:
        descriptor:
          ...
        info: []

  It seems that Google upgraded the Application CRD from v1beta1 to v1 (this is the version of the CRD object, not the version of the Application itself). After this change, the first Application manifest above could not be applied anymore. The error looked like this:

      error: error validating "/data/resources.yaml": error validating data: ValidationError(Application.spec.descriptor): unknown field "info" in io.k8s.app.v1beta1.Application.spec.descriptor; if you choose to ignore these errors, turn validation off with --validate=false

  My guess is that before this change, the faulty "info" field was not being validated, and the new v1 CRD version started validating it. I raised this pain point on their issue tracker: GoogleCloudPlatform/marketplace-k8s-app-tools#566
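For anyone hitting the same validation error, here is a minimal sketch of the fix expressed on an already-parsed manifest (a plain `dict`, for example loaded from the JSON form of the Application). `fix_misplaced_info` is a hypothetical helper, and moving the list to `spec.info` is my assumption about where the field belongs, based on the error message rejecting it under `spec.descriptor`.

```python
import copy


def fix_misplaced_info(application: dict) -> dict:
    """Return a copy of the manifest with spec.descriptor.info moved to spec.info.

    The input is left untouched. If spec.info already exists it is kept,
    and the misplaced descriptor.info is simply dropped.
    """
    fixed = copy.deepcopy(application)
    spec = fixed.setdefault("spec", {})
    descriptor = spec.get("descriptor") or {}
    if "info" in descriptor:
        info = descriptor.pop("info")
        spec.setdefault("info", info)
    return fixed
```

The function is idempotent, so it can be run over the manifests of every published version without checking first which ones are affected.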
Update 29 June: (internal email)
The API version issue was resolved and noticed that the tester pod is failing at our verification service with the following error in the logs:
I0625 18:03:30.965105 1 main.go:86] >>> Running /smoke-test.yaml
I0625 18:03:30.966237 1 main.go:136] > 0: kubectl smoke test
I0625 18:03:31.145790 1 main.go:141] PASSED
I0625 18:03:31.145824 1 main.go:136] > 1: Create test issuer and self signed cert
I0625 18:03:32.482440 1 main.go:141] PASSED
I0625 18:03:32.482507 1 main.go:136] > 2: Try to get new cert
E0625 18:03:32.884330 1 main.go:143] FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
I0625 18:03:32.884363 1 main.go:136] > 3: Try to get cert secret
E0625 18:03:33.130651 1 main.go:143] FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
I0625 18:03:33.130706 1 main.go:136] > 4: Delete issuer and self signed cert
E0625 18:03:33.648541 1 main.go:143] FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
I0625 18:03:33.648579 1 main.go:136] > 5: Create a GoogleCASIssuer and a certificate
E0625 18:03:34.999642 1 main.go:143] FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
I0625 18:03:34.999676 1 main.go:136] > 6: Delete google CAS issuer and certificate
E0625 18:03:36.026567 1 main.go:143] FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
E0625 18:03:36.026692 1 main.go:119] >> Summary: 5 FAILED, 2 PASSED
I0625 18:03:36.026732 1 main.go:123] > 0: kubectl smoke test: PASSED
I0625 18:03:36.026778 1 main.go:123] > 1: Create test issuer and self signed cert: PASSED
E0625 18:03:36.026795 1 main.go:125] > 2: Try to get new cert: FAILED
E0625 18:03:36.026802 1 main.go:125] > 3: Try to get cert secret: FAILED
E0625 18:03:36.026807 1 main.go:125] > 4: Delete issuer and self signed cert: FAILED
E0625 18:03:36.026812 1 main.go:125] > 5: Create a GoogleCASIssuer and a certificate: FAILED
E0625 18:03:36.026818 1 main.go:125] > 6: Delete google CAS issuer and certificate: FAILED
E0625 18:03:36.026824 1 main.go:95] >>> SUMMARY: 5 failed
ERROR SMOKE_TEST Tester 'Pod/smoke-test-pod' failed.
Can you make sure your application passes mpdev verify. Instructions: https://github.com/GoogleCloudPlatform/marketplace-k8s-app-tools/blob/master/docs/mpdev-references.md#smoke-test-an-application.
Please ensure that the tester pod completes with a zero exit status and resubmit the draft for a review. Let me know if you have any questions. Thank you.
Regards,
Dinesh
Note that the above-mentioned test cases are defined in smoke-test.yaml.
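Since the tester pod log is the only feedback we get from the verification service, a small script to pull out the failing test names saves some squinting. This is a rough sketch that assumes the log has one entry per line and that the per-test summary lines end in `: PASSED` or `: FAILED`, as in the email above. `failed_tests` is a hypothetical helper, not something `mpdev` ships.

```python
import re
from typing import List, Tuple

# Matches the per-test summary entries, e.g.
#   E0625 18:03:36.026795 1 main.go:125] > 2: Try to get new cert: FAILED
SUMMARY_LINE = re.compile(r"> (\d+): (.+): (PASSED|FAILED)\s*$")


def failed_tests(log: str) -> List[Tuple[int, str]]:
    """Return (index, name) for every test reported as FAILED in the summary."""
    failed = []
    for line in log.splitlines():
        match = SUMMARY_LINE.search(line)
        if match and match.group(3) == "FAILED":
            failed.append((int(match.group(1)), match.group(2)))
    return failed
```

It still cannot say why a test failed, since the stdout and stderr of the Bash tests are not in the log at all.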
Update 6 July: (internal email) our release of `1.4.0-gcm.0` is now waiting on Google. On 1 July 2021, Dinesh mentioned he is in contact with the engineering team.
Apologies for the delay here. I'm following up internally with Eng to see what's going wrong here -- I'll let you know once I get an answer. Thank you.
Today (13 July), Dinesh reported that the tests are failing. Dinesh now gives us the sha256 of each failing image:
Apologies for the delay here. Your listing has 3 different versions on the Marketplace -- the following two deployer images are failing due to the info field, which is not present in the CRD:
- gcr.io/jetstack-public/jetstack-secure-for-cert-manager/deployer@sha256:732f49aac58fa25f73a5dd3a7a422f5e0520802b372676d8605a67d3a383480e
- gcr.io/jetstack-public/jetstack-secure-for-cert-manager/deployer@sha256:d5e11520513313f08da87a58d44469aa0a0c4799ee798e4418dda321195bfe22
And the latest deployer image (gcr.io/jetstack-public/jetstack-secure-for-cert-manager/deployer@sha256:4fb179cf2a784dddb48ea86cf9e437c921b790ae060f84e16be373cc3ef108e4) is failing with the following error message:
CustomResourceDefinition.apiextensions.k8s.io "googlecasissuers.cas-issuer.jetstack.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions
Please fix the above errors, validate the versions again using mpdev and resubmit the draft for approval. Thank you.
We now know that the testing infrastructure at Google is running `mpdev verify` sequentially on all existing versions (1.1, 1.3, 1.4). Previously, I thought the tests were only run for the latest version that we submitted.
I have now re-built and re-submitted all three images with the `info` field fix, and re-created a GitHub release draft for each:
- 1.4.0-gcm.0 (digest: c15d1253d72c, GitHub Release draft)
- 1.3.1-gcm.1 (digest: fa52d4d6522d, GitHub Release draft)
- 1.1.0-gcm.9 (digest: e19eb224ad10, GitHub Release draft)
But the fact that they run these three versions sequentially on the same cluster means that the `v1alpha1` -> `v1beta1` upgrade of the Google CAS issuer CRD breaks things, as reported in the above error (see this email for more details).
I'm not sure how to go about that. I'll ask @jakexks now.
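To make the `storedVersions` error above concrete: the API server rejects a CRD whose `status.storedVersions` lists a version that is no longer declared in `spec.versions`. The sketch below reproduces that check on a CRD parsed into a `dict`, for instance from `kubectl get crd googlecasissuers.cas-issuer.jetstack.io -o json`. `orphaned_stored_versions` is a hypothetical helper name.

```python
from typing import List


def orphaned_stored_versions(crd: dict) -> List[str]:
    """Return stored versions that are missing from spec.versions.

    A non-empty result corresponds to the "must appear in spec.versions"
    error quoted above.
    """
    declared = {v["name"] for v in crd.get("spec", {}).get("versions", [])}
    stored = crd.get("status", {}).get("storedVersions", [])
    return [version for version in stored if version not in declared]


# The situation after 1.3 (v1alpha1) ran on the cluster and 1.4 ships only v1beta1.
crd_after_upgrade = {
    "spec": {"versions": [{"name": "v1beta1"}]},
    "status": {"storedVersions": ["v1alpha1"]},
}
print(orphaned_stored_versions(crd_after_upgrade))
```

This only detects the problem. Getting rid of it means either deleting the old CRD between runs or keeping `v1alpha1` declared in the new CRD, and neither is something we control on Google's test cluster.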
I just tried `mpdev verify` and found out that it only removes namespaced resources and leaves all the cluster-scoped resources behind (as per set_ownership.py). It seems to be due to the fact that `ownerReferences` can only be used with namespaced resources, not cluster-wide resources.
I still have no idea how to get around this issue 😞
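To see what a run leaves behind, one option is to dump the live objects as JSON and list the ones without a namespace. This is only a sketch: it assumes the input is the `items` list of a `kubectl get ... -o json` call on live objects, where namespaced objects always carry `metadata.namespace`. `cluster_scoped` is a hypothetical helper.

```python
from typing import List, Tuple


def cluster_scoped(objects: List[dict]) -> List[Tuple[str, str]]:
    """Return (kind, name) for every live object that has no metadata.namespace."""
    return [
        (obj["kind"], obj["metadata"]["name"])
        for obj in objects
        if not obj["metadata"].get("namespace")
    ]
```

Anything it returns (CRDs, ClusterRoles, ClusterRoleBindings and so on) would need to be cleaned up by hand between two `mpdev verify` runs.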
1.1, 1.3 and 1.4 were accepted last night!!