Kafka server shutting down too fast
Vesli opened this issue
I'm trying to build a very simple bootstrap app with Kafka and Spark Streaming:
two topics, Topic1 and Topic2,
and a Spark streaming application that reads from Topic1 and writes to Topic2 without doing any transformation.
scalaVersion := "2.12.8"
val sparkVersion = "2.4.2"
...
libraryDependencies += "io.github.embeddedkafka" %% "embedded-kafka" % "2.2.0" % "test"
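For the Kafka source and sink to resolve at runtime, the build also needs Spark SQL and the Structured Streaming Kafka connector. A hedged sketch of what the elided dependency lines likely include (the scalatest version is an assumption; the Spark artifacts reuse sparkVersion from above):

```scala
// Assumed dependencies: spark-sql-kafka-0-10 is what provides format("kafka")
// for Structured Streaming reads and writes.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % sparkVersion,
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
  "org.scalatest"    %% "scalatest"            % "3.0.7" % "test"
)
```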
Read stream:
object ReadKafkaTopic {
  def readStream(spark: SparkSession, brokers: String, topic: String): DataFrame = {
    spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("subscribe", topic)
      .option("startingOffsets", "latest")
      .load()
  }
}
WriteStream:
object WriteKafkaTopic {
  def writeStream(df: DataFrame, brokers: String, topic: String): Unit = {
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("topic", topic)
      .option("checkpointLocation", "/tmp/checkpoint")
      .start()
  }
}
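As an aside, DataStreamWriter.start() returns a StreamingQuery handle, which writeStream currently discards. A minimal sketch (not the author's code, just one possible design) of returning it instead of Unit, so callers can block until the query finishes or fails:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.StreamingQuery

object WriteKafkaTopic {
  // start() only launches the query asynchronously; returning the
  // StreamingQuery lets the caller decide how long to wait on it.
  def writeStream(df: DataFrame, brokers: String, topic: String): StreamingQuery = {
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("topic", topic)
      .option("checkpointLocation", "/tmp/checkpoint")
      .start()
  }
}
```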
I wish to test this simple application:
class MainIntegrationTests extends WordSpec with EmbeddedKafka {
  "runs with embedded kafka" should {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val topicIn: String = "in"
    val topicOut: String = "out"
    "work" in {
      implicit val config = EmbeddedKafkaConfig(kafkaPort = 9092)
      withRunningKafka {
        println("SPARK:")
        println(spark)
        println("Publishing to topic IN")
        publishStringMessageToKafka(topicIn, "message")
        println("READING FROM TOPIC")
        val df: DataFrame = ReadKafkaTopic.readStream(spark, "127.0.0.1:9092", topicIn)
        println("WRITING TO TOPIC")
        WriteKafkaTopic.writeStream(df, "127.0.0.1:9092", topicOut)
        println("FINALE READING!")
        println("RESULT:")
        println(EmbeddedKafka.isRunning)
        val resultTopic = consumeFirstStringMessageFrom(topicOut)
        println("Result TOPIC: " + resultTopic)
        assert(resultTopic == "message")
      }
    }
    spark.stop()
  }
}
But when execution reaches the println("READING FROM TOPIC"), I see this in the logs:
READING FROM TOPIC
19/05/13 12:13:08 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/jay/mporium/bootstrap/spark-kafka-to-kafka-utils/spark-warehouse').
19/05/13 12:13:08 INFO SharedState: Warehouse path is 'file:/home/jay/mporium/bootstrap/spark-kafka-to-kafka-utils/spark-warehouse'.
19/05/13 12:13:08 INFO KafkaServer: [KafkaServer id=0] shutting down
19/05/13 12:13:08 INFO KafkaServer: [KafkaServer id=0] Starting controlled shutdown
19/05/13 12:13:08 INFO KafkaController: [Controller id=0] Shutting down broker 0
19/05/13 12:13:08 INFO KafkaServer: [KafkaServer id=0] Controlled shutdown succeeded
19/05/13 12:13:08 INFO ZkNodeChangeNotificationListener$ChangeEventProcessThread: [/config/changes-event-process-thread]: Shutting down
19/05/13 12:13:08 INFO ZkNodeChangeNotificationListener$ChangeEventProcessThread: [/config/changes-event-process-thread]: Stopped
19/05/13 12:13:08 INFO ZkNodeChangeNotificationListener$ChangeEventProcessThread: [/config/changes-event-process-thread]: Shutdown completed
19/05/13 12:13:08 INFO SocketServer: [SocketServer brokerId=0] Stopping socket server request processors
19/05/13 12:13:08 INFO SocketServer: [SocketServer brokerId=0] Stopped socket server request processors
19/05/13 12:13:08 INFO KafkaRequestHandlerPool: [data-plane Kafka Request Handler on Broker 0], shutting down
19/05/13 12:13:08 INFO KafkaRequestHandlerPool: [data-plane Kafka Request Handler on Broker 0], shut down completely
19/05/13 12:13:08 INFO KafkaApis: [KafkaApi-0] Shutdown complete.
19/05/13 12:13:08 INFO DelayedOperationPurgatory$ExpiredOperationReaper: [ExpirationReaper-0-topic]: Shutting down
19/05/13 12:13:08 INFO DelayedOperationPurgatory$ExpiredOperationReaper: [ExpirationReaper-0-topic]: Stopped
19/05/13 12:13:08 INFO DelayedOperationPurgatory$ExpiredOperationReaper: [ExpirationReaper-0-topic]: Shutdown completed
19/05/13 12:13:08 INFO TransactionCoordinator: [TransactionCoordinator id=0] Shutting down.
And of course all of this generates an error in Spark:
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionStateBuilder'
Do you know why the Kafka server is shutting down so soon?
Or am I doing this completely wrong?
Many thanks
Hi @Vesli, have you tried using localhost:9092 instead of 127.0.0.1?
Also, can you wrap the ReadKafkaTopic.readStream execution in a try/catch block to see whether or not any exception is thrown?
Hey Francesco,
I actually realised that calling awaitTermination() on the streaming query solves my problem (Spark Structured Streaming queries are actually executed asynchronously, in parallel with the test).
I'll close this ticket; after many considerations our team will approach the testing in a different way.
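For reference, the shape of that fix inside the test could look like the sketch below. It assumes writeStream is changed to return the StreamingQuery produced by start(); processAllAvailable() and stop() are standard StreamingQuery methods, useful in tests to block until everything available at the source has been processed:

```scala
// Assumes writeStream now returns the StreamingQuery from start().
val query = WriteKafkaTopic.writeStream(df, "127.0.0.1:9092", topicOut)

// Block until all data currently available at the source has been
// processed and written to the sink, instead of racing the async query.
query.processAllAvailable()

val resultTopic = consumeFirstStringMessageFrom(topicOut)
assert(resultTopic == "message")

query.stop()
```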