crealytics / spark-excel

A Spark plugin for reading and writing Excel files

Cannot use V2 for streaming read

james-miles-ccy opened this issue

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am trying to read Excel files as a stream using the V2 data source, without success. Is there anything I can do to get this working?

The code is below:

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "excel")
    .option("maxRowsInMemory", 20)
    .schema(schema)
    .load(file_path)
)

display(df)

The exception is:

java.lang.UnsupportedOperationException: ExcelFileFormat as fallback format for V2 supports writing only

Expected Behavior

I expected the read to return a DataFrame.

Steps To Reproduce

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "excel")
    .option("maxRowsInMemory", 20)
    .schema(schema)
    .load(file_path)
)

display(df)

Environment

- Spark version: 3.3.1
- Spark-Excel version: 2.12:3.3.1_0.18.7
- OS: Windows
- Cluster environment: Databricks

Anything else?

No response

According to the documentation, Auto Loader supports only a few specific file formats:
https://docs.databricks.com/ingestion/auto-loader/options.html#file-format-options
I'm not sure whether they are hard-coded somewhere or whether one would need to implement a special API.
I don't have time to look into this, but if you're willing to give it a try yourself I can give you some guidance.

We have gotten this to work for other custom file formats with a fixed schema. I wonder if we can apply a similar approach here while supporting provided schemas or inferred schemas.
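
For anyone hitting the same exception, one possible workaround (untested here, and not something confirmed in this thread) is to let Auto Loader discover the files with one of its documented formats, `binaryFile`, and then read each discovered file with the spark-excel batch reader inside `foreachBatch`. The sketch below is written under those assumptions: the helper names `excel_paths`, `make_batch_handler`, and `start_stream`, as well as the `target_table` and `checkpoint_dir` parameters, are hypothetical and exist only for illustration, and the `header` option is assumed to match your files.

```python
def excel_paths(batch_df):
    """Return the file paths listed in one binaryFile micro-batch.

    Only the `path` column is selected, so this helper does not use
    the `content` column of the binaryFile rows.
    """
    return [row["path"] for row in batch_df.select("path").collect()]


def make_batch_handler(spark, schema, target_table):
    """Build a foreachBatch callback (hypothetical helper).

    Each file discovered in the micro-batch is read with the
    spark-excel *batch* reader and appended to `target_table`.
    """
    def handle(batch_df, batch_id):
        for path in excel_paths(batch_df):
            (
                spark.read.format("excel")
                .option("header", "true")  # assumption: first row holds column names
                .schema(schema)
                .load(path)
                .write.mode("append")
                .saveAsTable(target_table)
            )

    return handle


def start_stream(spark, source_dir, checkpoint_dir, schema, target_table):
    """Start the stream: Auto Loader lists files, the handler parses them."""
    files = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "binaryFile")  # discovery only
        .load(source_dir)
    )
    return (
        files.writeStream
        .foreachBatch(make_batch_handler(spark, schema, target_table))
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

On Databricks this would be started with something like `start_stream(spark, file_path, checkpoint_dir, schema, "my_table")`, where the last two arguments are placeholders. The trade-off is that parsing happens in the batch reader, one file at a time, so Auto Loader only contributes file discovery and checkpointing rather than true streaming reads of the Excel content.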