add startingOffsets, endingOffsets, quitWhenDone parameters to StructuredStream
oalam opened this issue
The Spark SQL Kafka source, as documented here, can take `startingOffsets` and `endingOffsets` options.
This could be useful to run a macro-batch stream over data already stored in Kafka and stop when done.
Use case: time-series analytics (chunking).
https://dataengi.com/2019/06/06/spark-structured-streaming/
```scala
// Subscribe to multiple topics, specifying explicit Kafka offsets
val df = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1,topic2")
  .option("startingOffsets", """{"topic1":{"0":23,"1":-2},"topic2":{"0":-2}}""")
  .option("endingOffsets", """{"topic1":{"0":50,"1":-1},"topic2":{"0":-1}}""")
  .load()
```
I think this is now supported (more or less).
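For the streaming side of this request, note that `endingOffsets` only applies to batch (`read`) queries in Spark; a "quit when done" streaming job can be approximated with `readStream` plus `Trigger.Once()`, which processes everything available at start and then stops. A minimal sketch (the output path, checkpoint location, and app name below are placeholders, and it assumes reachable Kafka brokers):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

// Sketch only: assumes a Spark deployment and reachable Kafka brokers.
val spark = SparkSession.builder()
  .appName("bounded-kafka-stream") // placeholder name
  .getOrCreate()

// startingOffsets accepts "earliest", "latest", or a per-partition JSON map;
// in the JSON form, -2 means earliest and -1 means latest.
val stream = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1,topic2")
  .option("startingOffsets", """{"topic1":{"0":23,"1":-2},"topic2":{"0":-2}}""")
  .load()

// Trigger.Once() processes all data available when the query starts, then
// the query terminates, approximating the requested quitWhenDone behaviour.
stream.writeStream
  .format("parquet")
  .option("path", "/tmp/out")                  // placeholder path
  .option("checkpointLocation", "/tmp/ckpt")   // placeholder path
  .trigger(Trigger.Once())
  .start()
  .awaitTermination()
```

This sidesteps `endingOffsets` entirely: instead of a fixed upper bound per partition, the query is bounded by whatever offsets exist at launch time.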