pinterest / DoctorK

DoctorK is a service for Kafka cluster auto healing and workload balancing

DrKafka start failed, NullPointerException in getProcessingStartOffsets

yhzhao opened this issue

java -server -cp lib/:doctorkafka-0.2.4.3.jar com.pinterest.doctorkafka.DoctorKafkaMain server dropwizard.yml
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
INFO [2019-01-17 21:15:44,236] io.dropwizard.server.DefaultServerFactory: Registering jersey handler with root path prefix: /
INFO [2019-01-17 21:15:44,239] io.dropwizard.server.DefaultServerFactory: Registering admin handler with root path prefix: /
INFO [2019-01-17 21:15:44,239] io.dropwizard.assets.AssetsBundle: Registering AssetBundle with name: assets for path /

INFO [2019-01-17 21:15:44,416] com.pinterest.doctorkafka.util.ZookeeperClient: Initialize curator with zkurl:df_zookeeper_dfops_743_1_development_datafabric_kafka_aws_us_ea.aws.athenahealth.com:2181/
INFO [2019-01-17 21:15:44,487] org.apache.curator.utils.Compatibility: Running in ZooKeeper 3.4.x compatibility mode
INFO [2019-01-17 21:15:44,513] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:host.name=ip-10-129-102-236.us-east-1.aws.athenahealth.com
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.8.0_191
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.class.path=lib/kafkastats-0.2.4.3.jar:lib/commons-cli-1.3.1.jar:lib/avro-1.8.2.jar:lib/jackson-core-asl-1.9.13.jar:lib/jackson-mapper-asl-1.9.13.jar:lib/paranamer-2.7.jar:lib/snappy-java-1.1.1.3.jar:lib/commons-compress-1.8.1.jar:lib/xz-1.5.jar:lib/slf4j-api-1.7.25.jar:lib/kafka_2.12-1.1.0.jar:lib/kafka-clients-1.1.0.jar:lib/lz4-java-1.4.jar:lib/jackson-databind-2.9.8.jar:lib/jackson-annotations-2.9.8.jar:lib/jackson-core-2.9.8.jar:lib/jopt-simple-5.0.4.jar:lib/metrics-core-2.2.0.jar:lib/scala-library-2.12.7.jar:lib/scala-reflect-2.12.4.jar:lib/scala-logging_2.12-3.7.2.jar:lib/zkclient-0.10.jar:lib/zookeeper-3.4.10.jar:lib/log4j-1.2.16.jar:lib/jline-0.9.94.jar:lib/junit-3.8.1.jar:lib/netty-3.10.5.Final.jar:lib/log4j-core-2.11.1.jar:lib/log4j-api-2.11.1.jar:lib/ostrich_2.12-9.27.0.jar:lib/util-core_2.12-6.43.0.jar:lib/util-function_2.12-6.43.0.jar:lib/scala-parser-combinators_2.12-1.0.4.jar:lib/util-eval_2.12-6.43.0.jar:lib/scala-compiler-2.12.1.jar:lib/scala-xml_2.12-1.0.6.jar:lib/util-logging_2.12-6.43.0.jar:lib/util-app_2.12-6.43.0.jar:lib/util-registry_2.12-6.43.0.jar:lib/util-stats_2.12-6.43.0.jar:lib/util-lint_2.12-6.43.0.jar:lib/caffeine-2.3.4.jar:lib/jsr305-1.3.9.jar:lib/util-jvm_2.12-6.43.0.jar:lib/jackson-module-scala_2.12-2.9.7.jar:lib/jackson-module-paranamer-2.9.7.jar:lib/guava-23.0.jar:lib/error_prone_annotations-2.0.18.jar:lib/j2objc-annotations-1.1.jar:lib/animal-sniffer-annotations-1.14.jar:lib/commons-lang3-3.6.jar:lib/commons-configuration2-2.4.jar:lib/commons-text-1.6.jar:lib/commons-logging-1.2.jar:lib/commons-math3-3.6.1.jar:lib/commons-beanutils-1.9.3.jar:lib/commons-collections-3.2.2.jar:lib/commons-validator-1.6.jar:lib/commons-digester-1.8.1.jar:lib/curator-framework-4.0.1.jar:lib/curator-client-4.0.1.jar:lib/metrics-core-3.2.3.jar:lib/gson-2.8.2.jar:lib/javax.annotation-api-1.3.2.jar:lib/jaxws-api-2.3.1.jar:lib/jaxb-api-2.3.1.jar:lib/javax.activation-api-1.2.0.jar:lib/javax.xml.soap-api-1.4.0.jar:lib/dropwizard-assets-1.3.8.jar:lib/dropwizard-core-1.3.8.jar:lib/dropwizard-util-1.3.8.jar:lib/joda-time-2.9.9.jar:lib/dropwizard-jackson-1.3.8.jar:lib/jackson-datatype-guava-2.9.6.jar:lib/jackson-datatype-jsr310-2.9.6.jar:lib/jackson-datatype-jdk8-2.9.6.jar:lib/jackson-module-parameter-names-2.9.6.jar:lib/jackson-module-afterburner-2.9.6.jar:lib/jackson-datatype-joda-2.9.6.jar:lib/dropwizard-validation-1.3.8.jar:lib/hibernate-validator-5.4.2.Final.jar:lib/validation-api-1.1.0.Final.jar:lib/jboss-logging-3.3.0.Final.jar:lib/classmate-1.3.1.jar:lib/javax.el-3.0.0.jar:lib/javassist-3.22.0-GA.jar:lib/dropwizard-configuration-1.3.8.jar:lib/jackson-dataformat-yaml-2.9.6.jar:lib/snakeyaml-1.18.jar:lib/dropwizard-logging-1.3.8.jar:lib/metrics-logback-4.0.2.jar:lib/jul-to-slf4j-1.7.25.jar:lib/logback-core-1.2.3.jar:lib/logback-classic-1.2.3.jar:lib/jcl-over-slf4j-1.7.25.jar:lib/jetty-util-9.4.14.v20181114.jar:lib/dropwizard-metrics-1.3.8.jar:lib/dropwizard-lifecycle-1.3.8.jar:lib/javax.servlet-api-3.1.0.jar:lib/jetty-server-9.4.14.v20181114.jar:lib/jetty-http-9.4.14.v20181114.jar:lib/jetty-io-9.4.14.v20181114.jar:lib/dropwizard-jersey-1.3.8.jar:lib/jersey-server-2.25.1.jar:lib/jersey-common-2.25.1.jar:lib/javax.ws.rs-api-2.0.1.jar:lib/jersey-guava-2.25.1.jar:lib/hk2-api-2.5.0-b32.jar:lib/hk2-utils-2.5.0-b32.jar:lib/aopalliance-repackaged-2.5.0-b32.jar:lib/javax.inject-2.5.0-b32.jar:lib/hk2-locator-2.5.0-b32.jar:lib/osgi-resource-locator-1.0.1.jar:lib/jersey-client-2.25.1.jar:lib/jersey-media-ja
xb-2.25.1.jar:lib/jersey-metainf-services-2.25.1.jar:lib/jersey-bean-validation-2.25.1.jar:lib/metrics-jersey2-4.0.2.jar:lib/metrics-annotation-4.0.2.jar:lib/jackson-jaxrs-json-provider-2.9.6.jar:lib/jackson-jaxrs-base-2.9.6.jar:lib/jackson-module-jaxb-annotations-2.9.6.jar:lib/jersey-container-servlet-2.25.1.jar:lib/jersey-container-servlet-core-2.25.1.jar:lib/jetty-webapp-9.4.14.v20181114.jar:lib/jetty-xml-9.4.14.v20181114.jar:lib/jetty-servlet-9.4.14.v20181114.jar:lib/jetty-security-9.4.14.v20181114.jar:lib/jetty-continuation-9.4.14.v20181114.jar:lib/dropwizard-servlets-1.3.8.jar:lib/dropwizard-jetty-1.3.8.jar:lib/metrics-jetty9-4.0.2.jar:lib/jetty-servlets-9.4.14.v20181114.jar:lib/metrics-jvm-4.0.2.jar:lib/metrics-jmx-4.0.2.jar:lib/metrics-servlets-4.0.2.jar:lib/metrics-healthchecks-4.0.2.jar:lib/metrics-json-4.0.2.jar:lib/profiler-1.0.2.jar:lib/dropwizard-request-logging-1.3.8.jar:lib/logback-access-1.2.3.jar:lib/argparse4j-0.8.1.jar:lib/jetty-setuid-java-1.0.3.jar:lib/dropwizard-auth-1.3.8.jar:doctorkafka-0.2.4.3.jar
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.10.0-862.3.2.el7.x86_64
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:user.name=yuzhao
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/yuzhao
INFO [2019-01-17 21:15:44,524] org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/yuzhao
INFO [2019-01-17 21:15:44,525] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=df_zookeeper_dfops_743_1_development_datafabric_kafka_aws_us_ea.aws.athenahealth.com:2181/ sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@77681ce4
INFO [2019-01-17 21:15:44,619] org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.129.107.71/10.129.107.71:2181. Will not attempt to authenticate using SASL (unknown error)
INFO [2019-01-17 21:15:44,627] org.apache.curator.framework.imps.CuratorFrameworkImpl: Default schema
INFO [2019-01-17 21:15:44,636] org.apache.zookeeper.ClientCnxn: Socket connection established to 10.129.107.71/10.129.107.71:2181, initiating session
INFO [2019-01-17 21:15:44,648] io.dropwizard.server.ServerFactory: Starting DoctorKafkaMain
INFO [2019-01-17 21:15:44,650] org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.129.107.71/10.129.107.71:2181, sessionid = 0x1684d33f9000020, negotiated timeout = 40000
INFO [2019-01-17 21:15:44,669] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED
INFO [2019-01-17 21:15:44,845] org.eclipse.jetty.setuid.SetUIDListener: Opened application@6f044c58{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
INFO [2019-01-17 21:15:44,847] org.eclipse.jetty.server.Server: jetty-9.4.14.v20181114; built: 2018-11-14T21:20:31.478Z; git: c4550056e785fb5665914545889f21dc136ad9e6; jvm 1.8.0_191-b12
INFO [2019-01-17 21:15:45,021] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=df_zookeeper_dfops_743_1_development_datafabric_kafka_aws_us_ea.aws.athenahealth.com:2181/zookeeper_datafabric_development_dfops_743_1_kafka_aws_us_east_1_df_zookeeper sessionTimeout=30000 watcher=org.I0Itec.zkclient.ZkClient@277a8c9
INFO [2019-01-17 21:15:45,022] org.I0Itec.zkclient.ZkEventThread: Starting ZkClient event thread.
INFO [2019-01-17 21:15:45,024] org.I0Itec.zkclient.ZkClient: Waiting for keeper state SyncConnected
INFO [2019-01-17 21:15:45,025] org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.129.107.71/10.129.107.71:2181. Will not attempt to authenticate using SASL (unknown error)
INFO [2019-01-17 21:15:45,027] org.apache.zookeeper.ClientCnxn: Socket connection established to 10.129.107.71/10.129.107.71:2181, initiating session
INFO [2019-01-17 21:15:45,033] org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.129.107.71/10.129.107.71:2181, sessionid = 0x1684d33f9000021, negotiated timeout = 30000
INFO [2019-01-17 21:15:45,040] org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected)
INFO [2019-01-17 21:15:45,055] kafka.utils.Log4jControllerRegistration$: Registered kafka:type=kafka.Log4jController MBean
INFO [2019-01-17 21:15:45,419] org.apache.kafka.clients.consumer.ConsumerConfig: ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [10.129.105.28:9092, 10.129.104.201:9092, 10.129.103.217:9092]
check.crcs = true
client.id =
connections.max.idle.ms = 540000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = doctorkafka
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 4194304
max.poll.interval.ms = 1
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer

INFO [2019-01-17 21:15:45,486] org.apache.kafka.common.utils.AppInfoParser: Kafka version : 1.1.0
INFO [2019-01-17 21:15:45,486] org.apache.kafka.common.utils.AppInfoParser: Kafka commitId : fdcf75ea326b8e07
INFO [2019-01-17 21:15:45,625] org.apache.kafka.clients.Metadata: Cluster ID: M4N2fNprQYKCrKhKzAH4zQ
21:15:45.676 [pool-6-thread-1] ERROR com.pinterest.doctorkafka.DoctorKafkaMain - DoctorKafka start failed
java.lang.NullPointerException: null
at com.pinterest.doctorkafka.replicastats.ReplicaStatsManager.getProcessingStartOffsets(ReplicaStatsManager.java:193) ~[doctorkafka-0.2.4.3.jar:?]
at com.pinterest.doctorkafka.replicastats.ReplicaStatsManager.readPastReplicaStats(ReplicaStatsManager.java:216) ~[doctorkafka-0.2.4.3.jar:?]
at com.pinterest.doctorkafka.DoctorKafka.start(DoctorKafka.java:51) ~[doctorkafka-0.2.4.3.jar:?]
at com.pinterest.doctorkafka.DoctorKafkaMain.lambda$run$0(DoctorKafkaMain.java:68) ~[doctorkafka-0.2.4.3.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
INFO [2019-01-17 21:15:45,782] io.dropwizard.jersey.DropwizardResourceConfig: The following paths were found for the configured resources:

GET     /api/cluster (com.pinterest.doctorkafka.api.ClusterApi)
DELETE  /api/cluster/{clusterName}/admin/maintenance (com.pinterest.doctorkafka.api.MaintenanceApi)
GET     /api/cluster/{clusterName}/admin/maintenance (com.pinterest.doctorkafka.api.MaintenanceApi)
PUT     /api/cluster/{clusterName}/admin/maintenance (com.pinterest.doctorkafka.api.MaintenanceApi)
GET     /api/cluster/{clusterName}/broker (com.pinterest.doctorkafka.api.BrokerApi)

INFO [2019-01-17 21:15:45,784] org.eclipse.jetty.server.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@1961d75a{/,null,AVAILABLE}
INFO [2019-01-17 21:15:45,797] org.eclipse.jetty.server.AbstractConnector: Started application@6f044c58{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
INFO [2019-01-17 21:15:45,797] org.eclipse.jetty.server.Server: Started @3572ms

^C
21:17:16.003 [Thread-2] ERROR com.pinterest.doctorkafka.DoctorKafkaMain - Failure in stopping operator
java.lang.NullPointerException: null
at com.pinterest.doctorkafka.DoctorKafka.stop(DoctorKafka.java:78) ~[doctorkafka-0.2.4.3.jar:?]
at com.pinterest.doctorkafka.DoctorKafkaMain$OperatorCleanupThread.run(DoctorKafkaMain.java:153) [doctorkafka-0.2.4.3.jar:?]
INFO [2019-01-17 21:17:16,006] org.eclipse.jetty.server.AbstractConnector: Stopped application@6f044c58{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
INFO [2019-01-17 21:17:16,012] org.eclipse.jetty.server.handler.ContextHandler: Stopped i.d.j.MutableServletContextHandler@1961d75a{/,null,UNAVAILABLE}

Also, the brokerstats topic has nothing in it. Does that mean the kafkastats processes running on the brokers are not sending anything to the topic?

Can you tell us the message format version that your cluster uses? Based on the exception, doctorkafka failed to find the starting offsets of the kafkastats topic. You can use the Kafka command-line tool kafka-console-consumer.sh to check whether there are messages in that topic.
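For example, a check along these lines (bootstrap server taken from the consumer config above; the topic name is assumed to be brokerstats, as used elsewhere in this thread):

kafka-console-consumer.sh --bootstrap-server 10.129.105.28:9092 \
    --topic brokerstats --from-beginning --max-messages 10

The payloads are Avro-encoded binary, so the output will not be human-readable, but any output at all confirms that messages are arriving.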

https://github.com/pinterest/doctorkafka/blob/master/drkafka/src/main/java/com/pinterest/doctorkafka/replicastats/ReplicaStatsManager.java#L190

There are no messages in that topic when using that command. What is the expected version of Kafka? 2.11-2.1.0 seems to work fine (at least I don't see this problem).

2.11-1.1.0 seems to have a problem.

Actually, the 0.2.4 version of kafkastats does publish messages to the brokerstats topic; 0.2.4.3 does not. Feels like a version mismatch.

I'm seeing the same issue. I've got a half-dozen Kafka clusters with all of their brokers running kafkastats-0.2.4.3. I've verified that they are writing data into the brokerstats topic. I've even truncated the data to the last hour to ensure that any old stats data (written by an older kafkastats client) has been purged. But I, too, am getting this stack trace in my log:

2019-01-23 20:53:18.892 [pool-6-thread-1] ERROR com.pinterest.doctorkafka.DoctorKafkaMain - DoctorKafka start failed
java.lang.NullPointerException: null
	at com.pinterest.doctorkafka.replicastats.ReplicaStatsManager.getProcessingStartOffsets(ReplicaStatsManager.java:193) ~[doctorkafka-0.2.4.3-jar-with-dependencies.jar:?]
	at com.pinterest.doctorkafka.replicastats.ReplicaStatsManager.readPastReplicaStats(ReplicaStatsManager.java:216) ~[doctorkafka-0.2.4.3-jar-with-dependencies.jar:?]
	at com.pinterest.doctorkafka.DoctorKafka.start(DoctorKafka.java:51) ~[doctorkafka-0.2.4.3-jar-with-dependencies.jar:?]
	at com.pinterest.doctorkafka.DoctorKafkaMain.lambda$run$0(DoctorKafkaMain.java:68) ~[doctorkafka-0.2.4.3-jar-with-dependencies.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_191]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]

If I wrap ReplicaStatsManager.java:193 in a try/catch block, DoctorKafka starts up.

@yhzhao could you please check the Kafka message format of the broker stats topic? We expect at least 0.10.0 for Dr.Kafka 0.2.4.3 and above. Starting with 0.2.4.3 we switched to using the Kafka time index to find past offsets for broker stats, which helps improve Dr.Kafka startup performance. The expected Kafka message format version is 0.10.0 or higher.

From the Javadoc of KafkaConsumer.offsetsForTimes():

If the message format version in a partition is before 0.10.0, i.e. the messages do not have timestamps, null will be returned for that partition.
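
For anyone debugging this, the behavior is easy to reproduce outside of DoctorKafka. The sketch below (not DoctorKafka code; the bootstrap server and topic name are assumptions for your environment) queries offsetsForTimes() for every partition of the stats topic and prints which ones come back as null, which is the value that ends up triggering the NPE at ReplicaStatsManager.java:193:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class OffsetsForTimesProbe {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Placeholders: point these at your cluster and your broker stats topic.
    props.put("bootstrap.servers", "10.129.105.28:9092");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
      // Query the offsets for "two days ago" on every partition of the topic,
      // similar to what ReplicaStatsManager does when it looks for past broker stats.
      long startTime = System.currentTimeMillis() - 2 * 24 * 3600 * 1000L;
      Map<TopicPartition, Long> query = new HashMap<>();
      List<PartitionInfo> partitions = consumer.partitionsFor("brokerstats");
      for (PartitionInfo p : partitions) {
        query.put(new TopicPartition(p.topic(), p.partition()), startTime);
      }

      // Partitions whose message format predates 0.10.0, or that have no
      // timestamped message at or after startTime, map to null here.
      Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(query);
      for (Map.Entry<TopicPartition, OffsetAndTimestamp> e : offsets.entrySet()) {
        OffsetAndTimestamp ts = e.getValue();
        System.out.println(e.getKey() + " -> "
            + (ts == null ? "null" : String.valueOf(ts.offset())));
      }
    }
  }
}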

In my case, that has been set to log.message.format.version=0.11.0, so that should be good.
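
As a sanity check, the effective format for the stats topic can be confirmed from the command line: kafka-configs.sh lists any topic-level message.format.version override, and if none is set the broker's log.message.format.version from server.properties applies (ZooKeeper connect string and topic name below are placeholders):

kafka-configs.sh --zookeeper <zookeeper-connect-string> --describe \
    --entity-type topics --entity-name brokerstats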

Also, I just set up a brand new doctorkafka installation, all 0.2.4.3, and am having the same issue here. I'm suspecting that the kafkastats daemons are occasionally sending insufficiently complete data that makes doctorkafka sad.

@BrianGallew Thanks for the info! We will look into the issue and fix it. cc @ambud

OK, the following diff will reliably 'fix' the problem, but of course it's just a bandage.

    for (Entry<TopicPartition, OffsetAndTimestamp> entry : offsetsForTimes.entrySet()) {
      if (entry.getValue() == null) {
        LOG.info("Faking a zero for {}", entry.getKey());
        partitionMap.put(entry.getKey(), 0L);
      } else {
        partitionMap.put(entry.getKey(), entry.getValue().offset());
      }
    }

I'm not sure, but I think the issue is that partitions which have never been written to end up with no offset data, and doctorkafka really wants it to be there.

And, yeah, that's not a diff, that's just lines 192-199 of ReplicaStatsManager.java. Here's an actual diff.

diff --git a/drkafka/src/main/java/com/pinterest/doctorkafka/replicastats/ReplicaStatsManager.java b/drkafka/src/main/java/com/pinterest/doctorkafka/replicastats/ReplicaStatsManager.java
index c680125..eee5b47 100644
--- a/drkafka/src/main/java/com/pinterest/doctorkafka/replicastats/ReplicaStatsManager.java
+++ b/drkafka/src/main/java/com/pinterest/doctorkafka/replicastats/ReplicaStatsManager.java
@@ -192,3 +192,8 @@ public class ReplicaStatsManager {
     for (Entry<TopicPartition, OffsetAndTimestamp> entry : offsetsForTimes.entrySet()) {
-      partitionMap.put(entry.getKey(), entry.getValue().offset());
+      if (entry.getValue() == null) {
+       LOG.info("Faking a zero for {}", entry.getKey());
+       partitionMap.put(entry.getKey(), 0L);
+      } else {
+       partitionMap.put(entry.getKey(), entry.getValue().offset());
+      }
     }

@BrianGallew could you please open a PR with the above diff?

The issue to investigate would be why there are no stats written to certain partitions of the stats topic.
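
One quick way to see which partitions of the stats topic are empty is to compare the earliest and latest offsets per partition with GetOffsetShell (broker address taken from the config above; topic name assumed to be brokerstats); partitions where the two numbers match contain no messages:

# earliest offset per partition
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 10.129.105.28:9092 --topic brokerstats --time -2

# latest offset per partition
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 10.129.105.28:9092 --topic brokerstats --time -1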

I can (and will!), but are we sure it doesn't just mask a real problem?

I am facing the same problem. Is this issue closed?
Should I just mask the problem as in the diff above?