ory / keto

The most scalable and customizable permission server on the market. Fix your slow or broken permission system with Google's proven "Zanzibar" approach. Supports ACL, RBAC, and more. Written in Go, cloud native, headless, API-first. Available as a service on Ory Network and for self-hosters.

Home Page: https://www.ory.sh/?utm_source=github&utm_medium=banner&utm_campaign=keto

Helm Release Job NotReady Status

sp71 opened this issue

Preflight checklist

Ory Network Project

No response

Describe the bug

When bringing up Keto via Terraform using the `helm_release` resource with automigration enabled, the job's pod is always reported as NotReady, even though the logs from that pod indicate the migrations were applied correctly. I verified that all the changes were committed to the database. Any ideas why the job's pod stays NotReady? I am running the Cloud SQL Auth Proxy as a sidecar container.
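For context on the status: a Kubernetes Job only counts as succeeded once every container in its pod has exited. The migration container exits after `keto migrate up` finishes, but a plain sidecar such as cloud-sql-proxy keeps running, so the pod never completes. A minimal sketch of that shape, independent of the chart (all names below are illustrative, not taken from the chart):

```yaml
# Minimal illustration of the "stuck" Job: the "migrate" container exits,
# the "proxy" sidecar does not, so the pod never reaches Completed and the
# Job never succeeds.
apiVersion: batch/v1
kind: Job
metadata:
  name: sidecar-demo            # hypothetical name
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate         # stands in for the keto migration container
          image: busybox:1.36
          command: ["sh", "-c", "echo applying migrations; sleep 5; echo done"]
        - name: proxy           # stands in for cloud-sql-proxy
          image: busybox:1.36
          command: ["sh", "-c", "sleep infinity"]
```

After the `migrate` container exits, `kubectl get pods` keeps showing this pod as NotReady with 1/2 containers ready, which matches the behavior described above.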

Reproducing the bug

Steps to reproduce the behavior:

  1. Apply terraform
  2. See keto's job pod status set to NotReady

Relevant log output

Job pod logs

time=2023-09-10T12:12:40Z level=error msg=Unable to ping the database connection, retrying. audience=application error=map[message:failed to connect to `host=127.0.0.1 user=postgres database=`: dial error (dial tcp 127.0.0.1:5432: connect: connection refused)] service_name=Ory Keto service_version=v0.11.1-alpha.0
[POP] 2023/09/10 12:12:47 warn - One or more of connection details are specified in database.yml. Override them with values in URL.
time=2023-09-10T12:12:47Z level=info msg=No tracer configured - skipping tracing setup audience=application service_name=Ory Keto service_version=v0.11.1-alpha.0
Current status:
Version			Name					Status
20150100000001000000	networks				Pending
20201110175414000000	relationtuple				Pending
20201110175414000001	relationtuple				Pending
20210623162417000000	relationtuple				Pending
20210623162417000001	relationtuple				Pending
20210623162417000002	relationtuple				Pending
20210623162417000003	relationtuple				Pending
20210914134624000000	legacy-cleanup				Pending
20220217152313000000	nid_fk					Pending
20220512151000000000	indices					Pending
20220513200300000000	create-intermediary-uuid-table		Pending
20220513200400000000	create-uuid-mapping-table		Pending
20220513200400000001	uuid-mapping-remove-check		Pending
20220513200500000000	migrate-strings-to-uuids		Pending
20220513200600000000	drop-old-non-uuid-table			Pending
20220513200600000001	drop-old-non-uuid-table			Pending
20230228091200000000	add-on-delete-cascade-to-relationship	Pending
Applying migrations...
Successfully applied all migrations:
Version			Name					Status
20150100000001000000	networks				Applied
20201110175414000000	relationtuple				Applied
20201110175414000001	relationtuple				Applied
20210623162417000000	relationtuple				Applied
20210623162417000001	relationtuple				Applied
20210623162417000002	relationtuple				Applied
20210623162417000003	relationtuple				Applied
20210914134624000000	legacy-cleanup				Applied
20220217152313000000	nid_fk					Applied
20220512151000000000	indices					Applied
20220513200300000000	create-intermediary-uuid-table		Applied
20220513200400000000	create-uuid-mapping-table		Applied
20220513200400000001	uuid-mapping-remove-check		Applied
20220513200500000000	migrate-strings-to-uuids		Applied
20220513200600000000	drop-old-non-uuid-table			Applied
20220513200600000001	drop-old-non-uuid-table			Applied
20230228091200000000	add-on-delete-cascade-to-relationship	Applied


Relevant configuration

```hcl
resource "helm_release" "keto" {
  name       = "ory"
  repository = "https://k8s.ory.sh/helm/charts"
  chart      = "keto"

  values = [
    <<EOT
    serviceAccount:
      create: false
      name: ${module.service_account.value.id}
    job:
      serviceAccount:
        create: false
        name: ${module.service_account.value.id}
      extraContainers: |
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.6.1
          imagePullPolicy: Always
          args:
          - "--structured-logs"
          - "--health-check"
          - "--http-address=0.0.0.0"
          - "--port=${local.sql_port}"
          - "--private-ip"
          - ${var.project_id}:${var.default_region}:${module.sql_db.name}
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
          livenessProbe:
            httpGet:
              path: /liveness
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 2
          readinessProbe:
            httpGet:
              path: /readiness
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 2
          startupProbe:
            httpGet:
              path: /startup
              port: 9090
            periodSeconds: 1
            timeoutSeconds: 5
            failureThreshold: 20
          resources:
            requests:
              memory: 128Mi
              cpu: 50m
            limits:
              memory: 512Mi
              cpu: 250m
    keto:
      automigration:
        enabled: true
      config:
        dsn: postgres://${local.db_username}:${random_password.password.result}@127.0.0.1:${local.sql_port}
    deployment:
      extraContainers: |
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.6.1
          imagePullPolicy: Always
          args:
          - "--structured-logs"
          - "--health-check"
          - "--http-address=0.0.0.0"
          - "--port=${local.sql_port}"
          - "--private-ip"
          - ${var.project_id}:${var.default_region}:${module.sql_db.name}
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
          livenessProbe:
            httpGet:
              path: /liveness
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 2
          readinessProbe:
            httpGet:
              path: /readiness
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 2
          startupProbe:
            httpGet:
              path: /startup
              port: 9090
            periodSeconds: 1
            timeoutSeconds: 5
            failureThreshold: 20
          resources:
            requests:
              memory: 128Mi
              cpu: 50m
            limits:
              memory: 512Mi
              cpu: 250m
    EOT
  ]
}
```
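One workaround sometimes used for this pattern relies on the Cloud SQL Auth Proxy v2 `--quitquitquit` flag, which exposes `POST /quitquitquit` on the proxy's admin server (port 9091 by default) so another container in the pod can tell the proxy to exit once its work is done. The sketch below only shows the delta to the job's sidecar values from the configuration above; whether and how the migration container actually issues that request after `keto migrate up` (a wrapper command, chart support, or running the proxy as a native sidecar on newer Kubernetes versions) depends on the chart version in use, so treat this as an assumption to verify rather than a drop-in fix.

```yaml
# Hedged sketch: only the proxy args for the migration job are shown;
# everything else stays as in the helm_release values above.
# --quitquitquit is a Cloud SQL Auth Proxy v2 flag that exposes
# POST /quitquitquit on the proxy's admin server (localhost:9091 by default),
# letting another container in the pod shut the proxy down cleanly once the
# migration has finished.
job:
  extraContainers: |
    - name: cloud-sql-proxy
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.6.1
      args:
        - "--structured-logs"
        - "--health-check"
        - "--http-address=0.0.0.0"
        - "--port=5432"                     # stands in for ${local.sql_port}
        - "--private-ip"
        - "--quitquitquit"                  # enable the shutdown endpoint
        - "<project>:<region>:<instance>"   # placeholder connection name
```

Without something that calls that endpoint (or a chart or Kubernetes feature that lets the Job complete while the sidecar is still running), the job's pod will keep showing NotReady exactly as in the logs above.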

Version

v0.11.1

On which operating system are you observing this issue?

None

In which environment are you deploying?

Kubernetes with Helm

Additional Context

  • CloudSQL PostgreSQL database
  • GCP

Closing due to inactivity