mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules

Home Page: http://mesos.github.io/chronos/

Index chronos_job_name_idx_1 is a duplicate of existing index chronos_job_name_idx

eLvErDe opened this issue

Hello,

Version 3.0.2 fails to write metrics to Cassandra properly:

Aug 21 09:46:02 luigi chronos[28335]: DEBUG com.datastax.driver.core.Connection - Connection[mario.domain/10.99.50.1:9042-2, inFlight=1, closed=false] Setting keyspace metrics
Aug 21 09:46:02 luigi chronos[28335]: DEBUG com.datastax.driver.core.Connection - Connection[mario.domain/10.99.50.1:9042-2, inFlight=1, closed=false] Keyspace set to metrics
Aug 21 09:46:02 luigi chronos[28335]: DEBUG com.datastax.driver.core.Connection - Connection[zelda.domain/10.99.50.3:9042-1, inFlight=1, closed=false] Setting keyspace metrics
Aug 21 09:46:02 luigi chronos[28335]: DEBUG com.datastax.driver.core.Connection - Connection[zelda.domain/10.99.50.3:9042-1, inFlight=1, closed=false] Keyspace set to metrics
Aug 21 09:46:02 luigi chronos[28335]: WARN  o.a.m.c.s.jobs.stats.JobStats - Caught exception when creating Cassandra JobStats session
Aug 21 09:46:02 luigi chronos[28335]: com.datastax.driver.core.exceptions.InvalidQueryException: Index chronos_job_name_idx_1 is a duplicate of existing index chronos_job_name_idx
Aug 21 09:46:02 luigi chronos[28335]: #011at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50)
Aug 21 09:46:02 luigi chronos[28335]: #011at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
Aug 21 09:46:02 luigi chronos[28335]: #011at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
Aug 21 09:46:02 luigi chronos[28335]: #011at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.stats.JobStats.getSession(JobStats.scala:194)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.stats.JobStats.insertToStatTable(JobStats.scala:522)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.stats.JobStats.org$apache$mesos$chronos$scheduler$jobs$stats$JobStats$$jobFinished(JobStats.scala:465)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.stats.JobStats$$anonfun$asObserver$1.applyOrElse(JobStats.scala:419)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.stats.JobStats$$anonfun$asObserver$1.applyOrElse(JobStats.scala:414)
Aug 21 09:46:02 luigi chronos[28335]: #011at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.JobsObserver$$anon$1.apply(JobsObserver.scala:44)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.JobsObserver$$anon$1.apply(JobsObserver.scala:41)
Aug 21 09:46:02 luigi chronos[28335]: #011at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.JobsObserver$$anon$1.applyOrElse(JobsObserver.scala:41)
Aug 21 09:46:02 luigi chronos[28335]: #011at scala.PartialFunction$Lifted.apply(PartialFunction.scala:223)
Aug 21 09:46:02 luigi chronos[28335]: #011at scala.PartialFunction$Lifted.apply(PartialFunction.scala:219)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.JobsObserver$$anonfun$composite$1$$anonfun$applyOrElse$1.apply(JobsObserver.scala:35)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.JobsObserver$$anonfun$composite$1$$anonfun$applyOrElse$1.apply(JobsObserver.scala:35)
Aug 21 09:46:02 luigi chronos[28335]: #011at scala.collection.immutable.List.foreach(List.scala:381)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.JobsObserver$$anonfun$composite$1.applyOrElse(JobsObserver.scala:35)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.JobsObserver$$anonfun$composite$1.applyOrElse(JobsObserver.scala:34)
Aug 21 09:46:02 luigi chronos[28335]: #011at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.jobs.JobScheduler.handleFinishedTask(JobScheduler.scala:237)
Aug 21 09:46:02 luigi chronos[28335]: #011at org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework.statusUpdate(MesosJobFramework.scala:213)
Aug 21 09:46:02 luigi chronos[28335]: #011at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Aug 21 09:46:02 luigi chronos[28335]: #011at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Aug 21 09:46:02 luigi chronos[28335]: #011at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Aug 21 09:46:02 luigi chronos[28335]: #011at java.lang.reflect.Method.invoke(Method.java:498)
Aug 21 09:46:02 luigi chronos[28335]: #011at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:37)
Aug 21 09:46:02 luigi chronos[28335]: #011at com.sun.proxy.$Proxy30.statusUpdate(Unknown Source)

After that, it starts leaking file descriptors until it crashes with a "too many open files" error.
I'm not sure the name of this job_name index is even correct; it looks like the default kwarg value used when no job_name is set.

Anyway, I added a try/catch around the CQL queries and ignored this specific exception. The statement uses IF NOT EXISTS, so I don't think it is supposed to fail with a duplicate error in the first place.
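The idea behind the workaround can be sketched roughly like this. It is only an illustration, not the code from the PR: `DuplicateIndexGuard`, `runIgnoringDuplicateIndex` and the nested `InvalidQueryException` are hypothetical names, and the nested exception is a stand-in for `com.datastax.driver.core.exceptions.InvalidQueryException` so that the snippet compiles without the driver on the classpath. Matching on the message text is an assumption based on the log above.

```java
// Hypothetical helper, not part of Chronos or the DataStax driver.
final class DuplicateIndexGuard {

    // Stand-in for com.datastax.driver.core.exceptions.InvalidQueryException,
    // declared here only so the sketch is self-contained.
    static class InvalidQueryException extends RuntimeException {
        InvalidQueryException(String message) {
            super(message);
        }
    }

    // Fragment of the error message seen in the log above.
    static final String DUPLICATE_INDEX_MARKER = "is a duplicate of existing index";

    // Runs one schema statement (passed in as a Runnable).
    // Returns true if it completed, false if it failed only because
    // an equivalent index already exists. Any other failure is rethrown.
    static boolean runIgnoringDuplicateIndex(Runnable statement) {
        try {
            statement.run();
            return true;
        } catch (InvalidQueryException e) {
            String message = e.getMessage();
            if (message != null && message.contains(DUPLICATE_INDEX_MARKER)) {
                // The index is already there, so treat this as a no-op.
                return false;
            }
            throw e;
        }
    }

    private DuplicateIndexGuard() {
    }
}
```

In Chronos itself the `Runnable` would wrap the `session.execute(...)` call that creates the index, so that a duplicate-index error no longer aborts session creation while other query errors still surface.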

Crappy workaround PR incoming; at least it makes Chronos work again for me.

Regards, Adam.