embeddedkafka / embedded-kafka

A library that provides an in-memory Kafka instance to run your tests against.

Kafka server shutting down too fast

Vesli opened this issue

I'm trying to build a very simple bootstrap app with Kafka and Spark Structured Streaming:
Topic1 / Topic2

a Spark streaming application that reads from Topic1 and writes to Topic2 without doing any transformation.

scalaVersion := "2.12.8"
val sparkVersion = "2.4.2"
...
libraryDependencies += "io.github.embeddedkafka" %% "embedded-kafka" % "2.2.0" % "test"

Read stream:

import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadKafkaTopic {
  // Builds a streaming DataFrame subscribed to the given topic,
  // starting from the latest offsets.
  def readStream(spark: SparkSession, brokers: String, topic: String): DataFrame = {
    spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("subscribe", topic)
      .option("startingOffsets", "latest")
      .load()
  }
}

WriteStream:

import org.apache.spark.sql.DataFrame

object WriteKafkaTopic {
  // Starts a streaming query that writes the key/value pairs to the given topic.
  // Note that the StreamingQuery returned by start() is discarded here.
  def writeStream(df: DataFrame, brokers: String, topic: String): Unit = {
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("topic", topic)
      .option("checkpointLocation", "/tmp/checkpoint")
      .start()
  }
}
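
Since start() returns a StreamingQuery that the method above throws away, here is a sketch of a variant that returns the handle so the caller can block on the query; the object name WriteKafkaTopicWithHandle is hypothetical and not part of the original report. This becomes relevant to the resolution at the end of the thread.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.StreamingQuery

object WriteKafkaTopicWithHandle {
  // Hypothetical variant: identical to WriteKafkaTopic.writeStream,
  // but returns the StreamingQuery so the caller can wait on it.
  def writeStream(df: DataFrame, brokers: String, topic: String): StreamingQuery = {
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("topic", topic)
      .option("checkpointLocation", "/tmp/checkpoint")
      .start()
  }
}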

I wish to test this simple application:

import net.manub.embeddedkafka.{EmbeddedKafka, EmbeddedKafkaConfig}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.scalatest.WordSpec

class MainIntegrationTests extends WordSpec with EmbeddedKafka {

  "runs with embedded kafka" should {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val topicIn: String = "in"
    val topicOut: String = "out"

    "work" in {
      implicit val config = EmbeddedKafkaConfig(kafkaPort = 9092)
      withRunningKafka {
        println("SPARK:")
        println(spark)

        println("Publishing to topic IN")
        publishStringMessageToKafka(topicIn, "message")

        println("READING FROM TOPIC")
        val df: DataFrame = ReadKafkaTopic.readStream(spark, "127.0.0.1:9092", topicIn)

        println("WRITING TO TOPIC")
        WriteKafkaTopic.writeStream(df, "127.0.0.1:9092", topicOut)

        println("FINALE READING!")

        println("RESULT:")
        println(EmbeddedKafka.isRunning)

        val resultTopic = consumeFirstStringMessageFrom(topicOut)
        println("Result TOPIC: " + resultTopic)
        assert(resultTopic == "message")

      }
    }

    spark.stop()
  }
}

But when execution reaches the println("READING FROM TOPIC") line, I see this in the logs:

READING FROM TOPIC
19/05/13 12:13:08 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/jay/mporium/bootstrap/spark-kafka-to-kafka-utils/spark-warehouse').
19/05/13 12:13:08 INFO SharedState: Warehouse path is 'file:/home/jay/mporium/bootstrap/spark-kafka-to-kafka-utils/spark-warehouse'.
19/05/13 12:13:08 INFO KafkaServer: [KafkaServer id=0] shutting down
19/05/13 12:13:08 INFO KafkaServer: [KafkaServer id=0] Starting controlled shutdown
19/05/13 12:13:08 INFO KafkaController: [Controller id=0] Shutting down broker 0
19/05/13 12:13:08 INFO KafkaServer: [KafkaServer id=0] Controlled shutdown succeeded
19/05/13 12:13:08 INFO ZkNodeChangeNotificationListener$ChangeEventProcessThread: [/config/changes-event-process-thread]: Shutting down
19/05/13 12:13:08 INFO ZkNodeChangeNotificationListener$ChangeEventProcessThread: [/config/changes-event-process-thread]: Stopped
19/05/13 12:13:08 INFO ZkNodeChangeNotificationListener$ChangeEventProcessThread: [/config/changes-event-process-thread]: Shutdown completed
19/05/13 12:13:08 INFO SocketServer: [SocketServer brokerId=0] Stopping socket server request processors
19/05/13 12:13:08 INFO SocketServer: [SocketServer brokerId=0] Stopped socket server request processors
19/05/13 12:13:08 INFO KafkaRequestHandlerPool: [data-plane Kafka Request Handler on Broker 0], shutting down
19/05/13 12:13:08 INFO KafkaRequestHandlerPool: [data-plane Kafka Request Handler on Broker 0], shut down completely
19/05/13 12:13:08 INFO KafkaApis: [KafkaApi-0] Shutdown complete.
19/05/13 12:13:08 INFO DelayedOperationPurgatory$ExpiredOperationReaper: [ExpirationReaper-0-topic]: Shutting down
19/05/13 12:13:08 INFO DelayedOperationPurgatory$ExpiredOperationReaper: [ExpirationReaper-0-topic]: Stopped
19/05/13 12:13:08 INFO DelayedOperationPurgatory$ExpiredOperationReaper: [ExpirationReaper-0-topic]: Shutdown completed
19/05/13 12:13:08 INFO TransactionCoordinator: [TransactionCoordinator id=0] Shutting down.

And of course all of this generates an error in Spark:
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionStateBuilder'

Do you know why the Kafka server is shutting down this soon?
Or am I doing something completely wrong?

Many thanks

Hi @Vesli, have you tried using localhost:9092 instead of 127.0.0.1?

Also, can you wrap the ReadKafkaTopic.readStream execution in a try/catch block to see whether or not any exception is thrown?
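
A minimal sketch of that check, using scala.util.Try rather than a literal try/catch, and assuming the same ReadKafkaTopic, spark, and topicIn from the snippets above:

import scala.util.{Failure, Success, Try}

// Surface any exception thrown by readStream in the test output
// instead of letting it disappear into the streaming machinery.
Try(ReadKafkaTopic.readStream(spark, "127.0.0.1:9092", topicIn)) match {
  case Success(df) => println(s"Got streaming DataFrame, isStreaming = ${df.isStreaming}")
  case Failure(e)  => e.printStackTrace()
}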

Hey Francesco,

I actually realised that calling awaitTermination() on the streaming query solves my problem (Spark Structured Streaming queries actually execute in parallel with the calling thread).
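
In other words, the query started by start() runs on background threads; the test body returns immediately, withRunningKafka tears the broker down, and the query dies mid-flight. A sketch of blocking until the data has been processed, assuming the hypothetical WriteKafkaTopicWithHandle variant from above that returns the StreamingQuery:

// Start the query and keep the handle instead of discarding it.
val query = WriteKafkaTopicWithHandle.writeStream(df, "127.0.0.1:9092", topicOut)

// Block until every record currently available at the source has been
// processed; StreamingQuery.processAllAvailable() is intended for testing.
query.processAllAvailable()

// Only now is it safe to consume from the output topic.
val resultTopic = consumeFirstStringMessageFrom(topicOut)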

I'll close this ticket; after much consideration, our team will approach the testing in a different way.