tnozicka / openshift-acme

ACME controller for OpenShift and Kubernetes clusters (supports e.g. Let's Encrypt).

Route does not get updated even though verification succeeded

ccremer opened this issue · comments


What happened:

Even though the ACME order is in "ready" state, the Route does not get updated with the certificate.

I0528 14:49:31.490954       1 route.go:496] Started syncing Route "zuerich-com-prod/www.zuerrich.com"
I0528 14:49:31.491027       1 route.go:563] Route "zuerich-com-prod/www.zuerrich.com" needs new certificate: Route is missing CertKey
I0528 14:49:33.881392       1 route.go:650] Route "zuerich-com-prod/www.zuerrich.com": Order "https://acme-v02.api.letsencrypt.org/acme/order/87136009/3538164556" is in "ready" state
I0528 14:49:33.881418       1 route.go:1063] Route "zuerich-com-prod/www.zuerrich.com": Order "https://acme-v02.api.letsencrypt.org/acme/order/87136009/3538164556" successfully validated

but sometimes the controller instead logs:

E0528 14:49:05.281533       1 route.go:1301] zuerich-com-prod/www.zuerrich.com failed with : can't create cert order: context deadline exceeded

What you expected to happen:

The Route is updated with the certificate.

How to reproduce it (as minimally and precisely as possible):

Unclear; it works for other routes.

Anything else we need to know?:

  • The cluster is fairly large with hundreds of routes. Are race conditions possible while updating routes?
  • The Route YAML:
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    acme.openshift.io/status: |
      provisioningStatus:
        earliestAttemptAt: "0001-01-01T00:00:00Z"
        orderStatus: ready
        orderURI: https://acme-v02.api.letsencrypt.org/acme/order/87136009/3538164556
        startedAt: "2020-05-28T14:37:29.981204855Z"
    haproxy.router.openshift.io/disable_cookies: 'true'
    haproxy.router.openshift.io/hsts_header: 'null'
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"v1","kind":"Route","metadata":{"annotations":{"haproxy.router.openshift.io/disable_cookies":"true","kubernetes.io/tls-acme":"true"},"creationTimestamp":null,"labels":{"branch":"prod","project":"zuerich-com"},"name":"www.zuerrich.com","namespace":"zuerich-com-prod"},"spec":{"host":"www.zuerrich.com","port":{"targetPort":"http"},"tls":{"insecureEdgeTerminationPolicy":"Redirect","termination":"edge"},"to":{"kind":"Service","name":"varnish"}}}
    kubernetes.io/tls-acme: 'true'
  creationTimestamp: '2018-09-06T11:57:37Z'
  name: www.zuerrich.com
  namespace: zuerich-com-prod
spec:
  host: www.zuerrich.com
  port:
    targetPort: http
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: edge
  to:
    kind: Service
    name: varnish
    weight: 100
  wildcardPolicy: None
status:
  ingress:
    - conditions:
        - lastTransitionTime: '2018-09-06T11:57:43Z'
          status: 'True'
          type: Admitted
      host: www.zuerrich.com
      routerName: router
      wildcardPolicy: None
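Regarding the race-condition question above: Kubernetes API updates are guarded by optimistic concurrency via `resourceVersion`, so a controller whose write loses a race gets a conflict error and must re-read the object and retry. The following is a toy sketch of that pattern in plain Python (no real API calls; all names here are hypothetical, not openshift-acme's actual code):

```python
# Toy model of Kubernetes-style optimistic concurrency: each stored object
# carries a resourceVersion, and an update succeeds only if the writer's
# copy matches the stored version; otherwise the writer re-reads and retries.

class Conflict(Exception):
    pass

class Store:
    """Minimal stand-in for the API server's storage of one object."""
    def __init__(self, obj):
        self.obj = dict(obj, resourceVersion=1)

    def get(self):
        return dict(self.obj)

    def update(self, obj):
        if obj["resourceVersion"] != self.obj["resourceVersion"]:
            raise Conflict("resourceVersion mismatch")
        self.obj = dict(obj, resourceVersion=self.obj["resourceVersion"] + 1)

def update_with_retry(store, mutate, attempts=5):
    """Re-read and retry on conflict (client-go's RetryOnConflict works similarly)."""
    for _ in range(attempts):
        obj = store.get()
        mutate(obj)
        try:
            store.update(obj)
            return True
        except Conflict:
            continue  # another writer got in first; fetch a fresh copy and retry
    return False

store = Store({"metadata": {"name": "www.zuerrich.com"}, "tls": None})

calls = {"n": 0}
def mutate(obj):
    # Simulate a concurrent writer sneaking in between our read and our
    # first write attempt, which forces exactly one conflict + retry.
    if calls["n"] == 0:
        store.obj["resourceVersion"] += 1
    calls["n"] += 1
    obj["tls"] = "edge"

ok = update_with_retry(store, mutate)
```

With this pattern a lost race costs only an extra read/write cycle rather than a stuck object, which is why conflicts on a large cluster with hundreds of routes are expected to be benign if the controller retries.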

Environment:

  • OpenShift/Kubernetes version (use oc/kubectl version): openshift v3.11.216, kubernetes v1.11.0+d4cacc0
  • controller: controller-0.9 image from quay.io

@tnozicka

Removing the acme.openshift.io/status annotation usually helps; however, this will probably also order a new certificate.

Actually, removing only the orderStatus field from the status annotation works as well.

Furthermore, there was a route where orderStatus was pending, even though verification had already succeeded (and the exposer route and pod were already gone). I removed orderStatus and earliestAttemptAt from the status annotation, and orderStatus immediately went to "ready".
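The workaround above amounts to dropping two fields from the value of the `acme.openshift.io/status` annotation. Purely as an illustration of which lines to remove (a plain-text sketch, not the controller's API; the annotation content is copied from the Route YAML above):

```python
# Strip the orderStatus and earliestAttemptAt fields from the
# acme.openshift.io/status annotation value by simple line filtering.
# Field names and content are taken from the Route shown in this issue.

STATUS_ANNOTATION = """\
provisioningStatus:
  earliestAttemptAt: "0001-01-01T00:00:00Z"
  orderStatus: ready
  orderURI: https://acme-v02.api.letsencrypt.org/acme/order/87136009/3538164556
  startedAt: "2020-05-28T14:37:29.981204855Z"
"""

DROP = ("orderStatus:", "earliestAttemptAt:")

def strip_fields(annotation: str) -> str:
    kept = [line for line in annotation.splitlines()
            if not line.lstrip().startswith(DROP)]
    return "\n".join(kept) + "\n"

result = strip_fields(STATUS_ANNOTATION)
print(result)
```

On a live cluster the equivalent would be editing the Route (e.g. `oc edit route www.zuerrich.com -n zuerich-com-prod`) and deleting those two lines from the annotation by hand.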

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.