Hurence / logisland

Scalable stream-processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform performs complex event processing and is well suited to time-series analysis. A large set of ready-to-use processors, data sources, and sinks is available.

Home Page: https://logisland.github.io

add startingOffsets, endingOffsets, quitWhenDone parameters to StructuredStream

oalam opened this issue · comments

The Spark SQL Kafka source, as documented here, can take startingOffsets and endingOffsets parameters.

This could be useful to start a macro-batch stream from data stored in Kafka and end it when done!

Use case: time-series analytics (chunking)

https://dataengi.com/2019/06/06/spark-structured-streaming/

// Subscribe to multiple topics, reading a bounded slice via explicit Kafka
// offsets. In the JSON maps, -2 means "earliest" and -1 means "latest".
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-offset-range").getOrCreate()

val df = spark
  .read // batch read: endingOffsets is honored, and the job ends when done
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1,topic2")
  .option("startingOffsets", """{"topic1":{"0":23,"1":-2},"topic2":{"0":-2}}""")
  .option("endingOffsets", """{"topic1":{"0":50,"1":-1},"topic2":{"0":-1}}""")
  .load()
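
Once the offset range is bounded, the chunking part of the use case is ordinary batch work. A minimal sketch, relying on the timestamp column the Kafka source exposes (the 10-minute window size is an arbitrary choice for illustration):

import org.apache.spark.sql.functions.window

// One row per 10-minute chunk, with the number of records in each chunk.
val chunks = df.groupBy(window(df("timestamp"), "10 minutes")).count()
chunks.show(truncate = false)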

I think this is now supported (more or less)
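
On the quitWhenDone part: endingOffsets is only honored by batch (spark.read) queries, so a structured stream cannot be bounded that way. In plain Spark Structured Streaming, a similar "process everything available, then stop" behavior can be approximated with Trigger.Once(). A sketch under that assumption, not logisland's own API; topics, paths, and offsets are placeholders:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.appName("quit-when-done-sketch").getOrCreate()

val stream = spark
  .readStream // streaming read: startingOffsets is accepted, endingOffsets is not
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1,topic2")
  .option("startingOffsets", """{"topic1":{"0":23,"1":-2},"topic2":{"0":-2}}""")
  .load()

val query = stream
  .writeStream
  .format("parquet")
  .option("path", "/tmp/chunks")            // placeholder output path
  .option("checkpointLocation", "/tmp/chk") // placeholder checkpoint path
  .trigger(Trigger.Once())                  // drain what is available, then stop
  .start()

query.awaitTermination()

The query starts from the given offsets, processes whatever is in the topics as a single micro-batch, and terminates, which is essentially the quitWhenDone behavior requested above.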