namely / chief-of-state

gRPC clustered event sourcing tool

Migration failures are ignored

dalazx opened this issue

Describe the bug
Migration failures are ignored, and despite that the service stays up. Kubernetes reports the corresponding pods as Ready, but that is not the case.

{"timestamp":"2021-09-09T10:40:00.476Z","@version":"1","message":"migration failed","logger_name":"com.namely.chiefofstate.ServiceMigrationRunner$","thread_name":"ChiefOfStateSystem-akka.actor.default-dispatcher-15","level":"ERROR","level_value":40000,"stack_trace":"org.postgresql.util.PSQLException: ERROR: relation \"public.journal\" does not exist\n  Position: 91\n\tat org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2553)\n\tat org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2285)\n\tat org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323)\n\tat org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:481)\n\tat org.postgresql.jdbc.PgStatement.execute(PgStatement.java:401)\n\tat org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)\n\tat org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:153)\n\tat com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44)\n\tat com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java)\n\tat slick.jdbc.StatementInvoker.results(StatementInvoker.scala:39)\n\tat slick.jdbc.StatementInvoker.iteratorTo(StatementInvoker.scala:22)\n\tat slick.jdbc.StreamingInvokerAction.emitStream(StreamingInvokerAction.scala:28)\n\tat slick.jdbc.StreamingInvokerAction.emitStream$(StreamingInvokerAction.scala:26)\n\tat slick.jdbc.JdbcActionComponent$QueryActionExtensionMethodsImpl$$anon$2.emitStream(JdbcActionComponent.scala:214)\n\tat slick.jdbc.JdbcActionComponent$QueryActionExtensionMethodsImpl$$anon$2.emitStream(JdbcActionComponent.scala:214)\n\tat slick.basic.BasicBackend$DatabaseDef$$anon$4.run(BasicBackend.scala:342)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)\n\tat java.base/java.lang.Thread.run(Unknown Source)\n","service":"payments-cos","powered_by":"chiefofstate"}

The migration actor was terminated due to the failure above:

{"timestamp":"2021-09-09T10:40:00.479Z","@version":"1","message":"Singleton actor [akka://ChiefOfStateSystem/system/singletonManagerCosServiceMigrationRunner/CosServiceMigrationRunner] was terminated","logger_name":"akka.cluster.singleton.ClusterSingletonManager","thread_name":"ChiefOfStateSystem-akka.actor.default-dispatcher-7","level":"INFO","level_value":20000,"akkaAddress":"akka://ChiefOfStateSystem@10.3.140.126:25520","sourceThread":"ChiefOfStateSystem-akka.actor.internal-dispatcher-5","akkaSource":"akka://ChiefOfStateSystem@10.3.140.126:25520/system/singletonManagerCosServiceMigrationRunner","sourceActorSystem":"ChiefOfStateSystem","akkaTimestamp":"10:40:00.479UTC","tags":["akkaClusterSingletonTerminated"],"service":"payments-cos","powered_by":"chiefofstate"}
kubectl describe po ...
...
Status:       Running
...
    Ready:          True
    Restart Count:  0
...
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
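
For illustration only (hypothetical names, and assuming Akka Management health checks are or could be wired in), a readiness check like the sketch below would keep the pod out of Ready until the migration has actually completed:

// Hypothetical sketch, not actual ChiefOfState code: an Akka Management readiness
// check that reports the pod as not Ready until the migration runner flags success.
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.Future
import akka.actor.ActorSystem

object MigrationState {
  // flipped to true by the migration runner once all migrations have succeeded
  val completed = new AtomicBoolean(false)
}

// Registered via config, e.g.
//   akka.management.health-checks.readiness-checks.migration = "com.example.MigrationReadinessCheck"
class MigrationReadinessCheck(system: ActorSystem) extends (() => Future[Boolean]) {
  override def apply(): Future[Boolean] =
    Future.successful(MigrationState.completed.get())
}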

To Reproduce
Steps to reproduce the behavior:

  1. Deploy CoS from scratch with COS_MIGRATIONS_INITIAL_VERSION set to 0 to cause the failure;
  2. Wait until all the corresponding pods are running;
  3. Check the pod states.

Expected behavior
The service should fail fast with a non-zero exit code.
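
Something along these lines (a minimal sketch with assumed names, where migrate stands in for the real migration call; not the actual ServiceMigrationRunner code) would achieve that:

// Minimal sketch: on migration failure, stop the actor system and exit non-zero
// so Kubernetes restarts the container instead of reporting it Ready.
import scala.concurrent.Future
import scala.util.{Failure, Success}
import akka.actor.typed.ActorSystem

def runMigrationsOrExit(system: ActorSystem[_], migrate: () => Future[Unit]): Unit = {
  import system.executionContext
  migrate().onComplete {
    case Success(_) =>
      system.log.info("migration completed")
    case Failure(err) =>
      system.log.error("migration failed, shutting down", err)
      system.terminate()
      // exit with a non-zero code once the actor system has fully stopped
      system.whenTerminated.onComplete(_ => sys.exit(1))
  }
}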

@dalazx Thanks for reporting it. I will take a look.

@dalazx Issue resolved. The next release will have the fix.