Cannot use V2 for streaming read
james-miles-ccy opened this issue
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
I am trying to read Excel files as a stream through the V2 data source, without success. Is there anything I can do to get this working?
The code is below:
# Auto Loader stream using the spark-excel V2 format name ("excel")
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "excel")
    .option("maxRowsInMemory", 20)
    .schema(schema)
    .load(file_path)
)
display(df)
The exception is:
java.lang.UnsupportedOperationException: ExcelFileFormat as fallback format for V2 supports writing only
Expected Behavior
I expected this to return a streaming DataFrame.
Steps To Reproduce
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "excel")
    .option("maxRowsInMemory", 20)
    .schema(schema)
    .load(file_path)
)
display(df)
Environment
- Spark version: 3.3.1
- Spark-Excel version: 2.12:3.3.1_0.18.7
- OS: Windows
- Cluster environment: Databricks
Anything else?
No response
The documentation suggests that Auto Loader (cloudFiles) supports only a few specific file formats:
https://docs.databricks.com/ingestion/auto-loader/options.html#file-format-options
I'm not sure whether they are hard-coded somewhere or whether one would need to implement a special API.
I don't have time to look into this, but if you're willing to give it a try yourself I can give you some guidance.
We have gotten this to work for other custom file formats with a fixed schema. I wonder if we can apply a similar approach here while supporting provided or inferred schemas.
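Until streaming reads are supported, one possible workaround is to let Auto Loader only discover new files (using its built-in binaryFile format) and to read each discovered workbook with the batch Excel reader inside foreachBatch. The sketch below is untested against the versions in this issue and is not necessarily the approach mentioned in the comment above. The names excel_paths, make_batch_handler, start_excel_stream, target_table and checkpoint_path are hypothetical, and the "header" option value is an assumption about the files.

```python
# Hypothetical workaround sketch: Auto Loader discovers files, batch reader parses them.
EXCEL_SUFFIXES = (".xlsx", ".xlsm", ".xls")


def excel_paths(paths):
    """Keep only paths that look like Excel workbooks (case-insensitive)."""
    return [p for p in paths if p.lower().endswith(EXCEL_SUFFIXES)]


def make_batch_handler(spark, schema, target_table):
    """Build a foreachBatch function that parses each newly discovered workbook."""

    def handle(batch_df, batch_id):
        # binaryFile rows carry a "path" column; only the paths are collected here.
        rows = batch_df.select("path").collect()
        for path in excel_paths([r["path"] for r in rows]):
            (
                spark.read.format("excel")       # batch read via the V2 source
                .option("header", "true")        # assumption: first row is a header
                .option("maxRowsInMemory", 20)
                .schema(schema)
                .load(path)
                .write.mode("append")
                .saveAsTable(target_table)       # hypothetical target table
            )

    return handle


def start_excel_stream(spark, file_path, schema, target_table, checkpoint_path):
    """Start the stream; binaryFile has a fixed schema, so none is passed here."""
    files = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "binaryFile")
        .load(file_path)
    )
    return (
        files.writeStream
        .foreachBatch(make_batch_handler(spark, schema, target_table))
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

Because foreachBatch gives at-least-once guarantees, a retried micro-batch could append the same workbook twice; a real pipeline would likely need deduplication or an idempotent write.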