crealytics / spark-excel

A Spark plugin for reading and writing Excel files

Reading excel file in Azure Databricks

grajee-everest opened this issue

I tried to use spark-excel in Azure Databricks, but I seem to be running into an error. I had earlier tried the same using SQL Server Big Data Cluster but was unable to get it working.

Current Behavior
I'm getting an error java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B

[screenshot]

I first loaded the Maven coordinates and got the error. I later followed the link and loaded the JAR files manually, yet got the same error as shown in the screenshot.

[screenshot]

Steps to Reproduce (for bugs)

from pyspark.sql.functions import input_file_name

df = spark.read.format("excel") \
    .option("header", True) \
    .option("inferSchema", True) \
    .load("dbfs:/FileStore/tables/users.xls") \
    .withColumn("file_name", input_file_name())

Your Environment

Azure Databricks
[screenshot]

I would really recommend not downloading and adding the JARs manually, but using Maven's package resolution instead.
What was the error you got there?
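
For reference, something along these lines lets Spark resolve the package from Maven itself when you create the session outside the Databricks Libraries UI (a rough sketch; the coordinate is only an example, pick the build matching your Spark/Scala version):

from pyspark.sql import SparkSession

# Sketch: resolve spark-excel via Maven coordinates instead of uploading JARs by hand.
# The coordinate below is only an example; use the build matching your Spark/Scala version.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "com.crealytics:spark-excel_2.12:3.1.2_0.15.1")
    .getOrCreate()
)

On Databricks itself, installing the same coordinate as a cluster library achieves the equivalent.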

I got the same error when I first tried it with the Maven coordinates, as in the screenshot below. Seeing this error, I went through the dependencies at the link and manually loaded the JAR files, hoping that it would help. But it did not.

[screenshot]

[screenshot]

Here is the error that was generated:


Py4JJavaError Traceback (most recent call last)
in
1 #sampleDataFilePath = "dbfs:/FileStore/tables/users.xls"
2
----> 3 df = spark.read.format("excel")
4 .option("header", True)
5 .option("inferSchema", True) \

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
202 self.options(**options)
203 if isinstance(path, str):
--> 204 return self._df(self._jreader.load(path))
205 elif path is not None:
206 if type(path) != list:

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in call(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
115 def deco(*a, **kw):
116 try:
--> 117 return f(*a, **kw)
118 except py4j.protocol.Py4JJavaError as e:
119 converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o307.load.
: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
at org.apache.commons.io.output.AbstractByteArrayOutputStream.needNewBuffer(AbstractByteArrayOutputStream.java:104)
at org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.(UnsynchronizedByteArrayOutputStream.java:51)
at shadeio.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:110)
at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
at com.crealytics.spark.v2.excel.ExcelHelper.getWorkbook(ExcelHelper.scala:107)
at com.crealytics.spark.v2.excel.ExcelHelper.getRows(ExcelHelper.scala:122)
at com.crealytics.spark.v2.excel.ExcelTable.infer(ExcelTable.scala:69)
at com.crealytics.spark.v2.excel.ExcelTable.inferSchema(ExcelTable.scala:42)
at org.apache.spark.sql.execution.datasources.v2.FileTable.$anonfun$dataSchema$4(FileTable.scala:69)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema$lzycompute(FileTable.scala:69)
at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema(FileTable.scala:63)
at org.apache.spark.sql.execution.datasources.v2.FileTable.schema$lzycompute(FileTable.scala:82)
at org.apache.spark.sql.execution.datasources.v2.FileTable.schema(FileTable.scala:80)
at com.crealytics.spark.v2.excel.ExcelDataSource.inferSchema(ExcelDataSource.scala:85)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:388)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:287)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)

If it helps, these are the JAR files that I see loaded for the session:

%scala
spark.sparkContext.listJars.foreach(println)

spark://xx.xxx.xx.xx:40525/jars/addedFile2184961893124998763poi_shared_strings_2_2_3-55036.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile8325949175049880530poi_5_1_0-6eaa4.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3805277380370442712spoiwo_2_12_2_0_0-5d426.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile4821020784640732815commons_text_1_9-9ec33.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile6096385456097086834commons_collections4_4_4-86bd5.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile5503460718089690954poi_ooxml_5_1_0-dcd47.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile1801717094295843813commons_compress_1_21-ae1b7.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3469926387869248457h2_1_4_200-17cf6.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile7124418099051517404curvesapi_1_6-ef037.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3524630059114379065slf4j_api_1_7_32-db310.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile621063403924903495SparseBitSet_1_2-c8237.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile5513775878198382075commons_io_2_11_0-998c5.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile6021795642522535665spark_excel_2_12_3_1_2_0_15_1-54852.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile2561448775843921624poi_ooxml_lite_5_1_0-a9fef.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile1605810761903966851commons_lang3_3_11-82b59.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile2616706435049414994commons_codec_1_15-3e3d3.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3670030969644712160log4j_api_2_14_1-5a13d.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile6859359805595503404xmlbeans_5_0_2-db545.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile5420236778608197626scala_xml_2_12_2_0_0-e8c94.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile2437294818883127996commons_math3_3_6_1-876c0.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3167668463888463121excel_streaming_reader_3_2_3-b4c68.jar
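
One way to narrow this down (an untested sketch, not something from the spark-excel docs) is to ask the driver JVM where it actually loads IOUtils from; if that points at an older Databricks-provided commons-io that predates IOUtils.byteArray rather than the 2.11.0 JAR above, it would explain the NoSuchMethodError:

# Untested sketch: locate the commons-io classes the driver JVM resolves.
klass = spark._jvm.java.lang.Class.forName("org.apache.commons.io.IOUtils")
location = klass.getProtectionDomain().getCodeSource().getLocation()
print(location.toString())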

FYI - I was able to get the elastacloud module working

Wow, I didn't even know that project. Thanks for pointing it out!

I would still like to get this working on Databricks and SQL Server BDC. Note that it is not working in Databricks for a few others as well.

I am also facing a similar issue. Has it been resolved?
java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B

FWIW I'm getting the exact same error: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B

Azure Databricks: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12).
Maven Install from Databricks: com.crealytics:spark-excel_2.12:3.1.2_0.15.1

Any help would be greatly appreciated @nightscape

Hello guys, I am also facing the same issue. Is there any solution, or did someone find an alternative method?

@grajee-everest @KanakVantaku @soleyjh @MounikaKolisetty could one of you check whether changing https://github.com/crealytics/spark-excel/blob/main/build.sbt#L32-L37 to

shadeRenames ++= Seq(
  "org.apache.poi.**" -> "shadeio.poi.@1",
  "spoiwo.**" -> "shadeio.spoiwo.@1",
  "com.github.pjfanning.**" -> "shadeio.pjfanning.@1",
  "org.apache.commons.io.**" -> "shadeio.commons.io.@1",
  "org.apache.commons.compress.**" -> "shadeio.commons.compress.@1"
)

resolves this problem?

You would then need to run sbt publishLocal to build a version of the JAR that you can upload to Databricks.

Hello @nightscape ,
What does sbt publishLocal mean? Can you please explain it in detail?

Hi @MounikaKolisetty,

SBT is the Scala (or Simple) Build Tool.
You can get instructions on how to install it here: https://www.scala-sbt.org/
Once you have it installed, you should be able to run

cd /path/to/spark-excel
# Make the changes from above
sbt publishLocal

This should start building the project and copy the generated JAR files to a path like ~/.ivy2/.../spark-excel...jar.
You can take the JAR from there and try to upload and use it in Databricks.

Same error here: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B

Azure Databricks: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12).
Maven Install from Databricks: com.crealytics:spark-excel_2.12:3.1.2_0.15.1

Any solution yet?

Same error.

Azure Databricks: 9.0 (includes Apache Spark 3.1.2, Scala 2.12).
Maven Install from Databricks: com.crealytics:spark-excel_2.12:3.1.2_0.16.0

@ciaeric @Dunehub if possible, please try my proposal in #467 (comment)

I am also facing the same issue on Azure Databricks and am looking for possible solutions.
I added the dependencies as per the attachment, but it is not working.

[screenshot]

Once the build here finishes successfully, you can try version 0.16.1-pre1:
https://github.com/crealytics/spark-excel/actions/runs/1607777770

@ciaeric @Dunehub @spaw6065 @MounikaKolisetty @grajee-everest
Please provide feedback here if the shading worked.
The change is still on a branch, so if you don't provide feedback, it won't get merged and won't be part of the next release.

For me, it still didn't work.

Steps followed:

  1. Added the JAR from DBFS
     [screenshot]

  2. Executed the sample code
     [screenshot]

Error trace
NoClassDefFoundError: shadeio/commons/io/output/UnsynchronizedByteArrayOutputStream Caused by: ClassNotFoundException: shadeio.commons.io.output.UnsynchronizedByteArrayOutputStream at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209) at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206) at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172) at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:55) at scala.Option.fold(Option.scala:251) at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:55) at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:16) at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:15) at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:50) at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32) at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32) at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104) at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103) at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172) at scala.Option.getOrElse(Option.scala:189) at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171) at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:36) at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36) at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13) at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:385) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:390) at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:346) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:346) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:3) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:47) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:49) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw.<init>(command-2541019815824441:51) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw.<init>(command-2541019815824441:53) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw.<init>(command-2541019815824441:55) at $line03e3e0503061413eab90de3bf6be643427.$read.<init>(command-2541019815824441:57) at $line03e3e0503061413eab90de3bf6be643427.$read$.<init>(command-2541019815824441:61) at $line03e3e0503061413eab90de3bf6be643427.$read$.<clinit>(command-2541019815824441) at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print$lzycompute(<notebook>:7) at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print(<notebook>:6) at $line03e3e0503061413eab90de3bf6be643427.$eval.$print(<notebook>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:498) at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747) at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020) at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568) at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36) at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116) at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41) at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564) at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:219) at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:235) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:902) at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:855) at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:235) at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$13(DriverLocal.scala:541) at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261) at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258) at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:50) at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305) at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297) at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:50) at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:518) at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:689) at scala.util.Try$.apply(Try.scala:213) at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:681) at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:522) at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:634) at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:427) at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:370) at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:221) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: shadeio.commons.io.output.UnsynchronizedByteArrayOutputStream at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:419) at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151) at java.lang.ClassLoader.loadClass(ClassLoader.java:352) at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209) at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206) at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172) at 
com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:55) at scala.Option.fold(Option.scala:251) at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:55) at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:16) at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:15) at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:50) at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32) at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32) at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104) at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103) at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172) at scala.Option.getOrElse(Option.scala:189) at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171) at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:36) at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36) at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13) at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:385) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:390) at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:346) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:346) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:3) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:47) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:49) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw.<init>(command-2541019815824441:51) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw.<init>(command-2541019815824441:53) at $line03e3e0503061413eab90de3bf6be643427.$read$$iw.<init>(command-2541019815824441:55) at $line03e3e0503061413eab90de3bf6be643427.$read.<init>(command-2541019815824441:57) at $line03e3e0503061413eab90de3bf6be643427.$read$.<init>(command-2541019815824441:61) at $line03e3e0503061413eab90de3bf6be643427.$read$.<clinit>(command-2541019815824441) at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print$lzycompute(<notebook>:7) at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print(<notebook>:6) at $line03e3e0503061413eab90de3bf6be643427.$eval.$print(<notebook>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747) at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020) at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568) at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36) at 
scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116) at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41) at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564) at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:219) at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:235) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:902) at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:855) at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:235) at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$13(DriverLocal.scala:541) at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261) at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258) at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:50) at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305) at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297) at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:50) at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:518) at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:689) at scala.util.Try$.apply(Try.scala:213) at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:681) at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:522) at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:634) at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:427) at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:370) at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:221) at java.lang.Thread.run(Thread.java:748)

Any luck? I am facing the same issue with the latest JAR com.crealytics:spark-excel_2.13:3.2.0_0.16.1-pre1.
Py4JJavaError Traceback (most recent call last)
in
20 # Do not load the kpi entries into the entry dataframe it's automated according to the csv file stored in GroupFunctions/Daily_Management/RAW/BULKUPLOADPA/DF CSV Files/Automated KPIs.csv
21 # Reading the Entries data from the Excel file
---> 22 DF_entry = spark.read.format("com.crealytics.spark.excel")
23 .option("header", "true")
24 .option("dataAddress" , "'" + sheetname + "'"+"!A1") \

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
202 self.options(**options)
203 if isinstance(path, str):
--> 204 return self._df(self._jreader.load(path))
205 elif path is not None:
206 if type(path) != list:

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in call(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
115 def deco(*a, **kw):
116 try:
--> 117 return f(*a, **kw)
118 except py4j.protocol.Py4JJavaError as e:
119 converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o2609.load.
: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
at org.apache.commons.io.output.AbstractByteArrayOutputStream.needNewBuffer(AbstractByteArrayOutputStream.java:104)
at org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.(UnsynchronizedByteArrayOutputStream.java:51)
at shadeio.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:110)
at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:55)
at scala.Option.fold(Option.scala:251)
at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:55)
at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:16)
at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:15)
at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:50)
at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32)
at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32)
at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172)
at scala.Option.getOrElse(Option.scala:189)
at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171)
at com.crealytics.spark.excel.ExcelRelation.(ExcelRelation.scala:36)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:390)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:444)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:400)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:400)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:287)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)

I've also faced this error. Is there any solution? Is there another way to read an xls file on Databricks?

Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.

NoClassDefFoundError: shadeio/commons/io/output/UnsynchronizedByteArrayOutputStream Caused by: ClassNotFoundException

Getting the same error in AWS Glue with Glue 3.0 (Spark 3.1), using PySpark.

I'm looking for the new version, as I want to use dynamic partitioning to create multiple Excel files in parallel.

FYI, getting the same error even with Scala - AWS Glue 3.0, Scala 2, Spark 3.1.

Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.

This clue works for me!!

Thanks

but I need the dynamic partitioning feature, which is not available in the older version.

As I don't have time to look into this, your best option is to try different versions of shading deps here.
Once you have found a version that works on AWS / Azure Databricks, I'd be happy to do another pre-release and get it merged if it works for everyone.

This

Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.

I was facing the same issue and this worked. Thank you

This

Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.

I was facing the same issue and this worked. Thank you

This worked for me. Thank you.

Can you give v0.16.1-pre2 a try?

It seems the GitHub Action for v0.16.1-pre2 has failed.

Right. I'm not sure where the GPG problem is coming from and if it has anything to do with the rather small changes I made to build.sbt.

I had a lot of trouble running spark-excel because of incompatible dependencies. Finally, with these library versions I was able to read and write Excel! Sharing them here.
[screenshot of library versions]

I'm experiencing the same issue.
I'm using the following versions:
Scala: 2.12
spark-excel: 0.14.0
Spark: 3.1.2

I've tried different versions such as:
0.13.1
0.13.4
0.14.0
0.15.0
0.16.1-pre1

Has anybody found a solution to this?
[screenshot]

As I don't have time to look into this, your best option is to try different versions of shading deps here. Once you have found a version that works on AWS / Azure Databricks, I'd be happy to do another pre-release and get it merged if it works for everyone.

Hi,

I took "pre2" code and built a local jar file and deployed in data bricks. Now I am not getting "commons-io" errors.
But now, I am getting a different error.

IOException: Your InputStream was neither an OLE2 stream, nor an OOXML stream or you haven't provide the poi-ooxml*.jar in the classpath/modulepath - FileMagic: OOXML, having providers: []
at shadeio.poi.ss.usermodel.WorkbookFactory.wp(WorkbookFactory.java:334)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:224)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:185)

It seems the class loader is not loading the providers?

Any tips?

Vinod

This might be fixed by #513. Can you merge that into your local branch and give it a try? Unfortunately Github Actions are failing because of some GPG error and I haven't had time to look into it...

While all the other versions were failing, I was able to install "com.crealytics:spark-excel_2.13:3.2.0_0.16.1-pre1"

I get the following error when I try to read an excel file:
ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider com.crealytics.spark.v2.excel.ExcelDataSource could not be instantiated

Any ideas?

@ymopurpg yes, that's probably what was fixed in #513. Unfortunately the build is currently broken, and I don't have time to look into that...

I experienced the same issue as @grajee-everest mentioned. The following package versions helped me:
spark-submit ... --packages com.crealytics:spark-excel_2.12:0.14.0,commons-io:commons-io:2.11.0 ...
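
The same pinning can also be expressed as session configuration instead of spark-submit flags (an untested sketch, with the versions copied from the command above):

from pyspark.sql import SparkSession

# Untested sketch: pin a compatible commons-io alongside spark-excel via spark.jars.packages,
# mirroring the spark-submit command above.
spark = (
    SparkSession.builder
    .config(
        "spark.jars.packages",
        "com.crealytics:spark-excel_2.12:0.14.0,commons-io:commons-io:2.11.0",
    )
    .getOrCreate()
)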

I gave it another try in 0.16.5-pre1. Can someone try that out?

@nightscape - I would really appreciate it if you could provide the full version (com.crealytics:spark-excel_2.12:0.16.5-pre1 or something else?). I tried so many versions in the past few days and don't know which one to use.

Tried v0.16.5-pre1 this morning and am getting the following error:

Py4JJavaError: An error occurred while calling o228.load.
: java.lang.NoClassDefFoundError: Could not initialize class shadeio.poi.util.IOUtils
	at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
	at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:222)
	at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:185)
	at com.crealytics.spark.v2.excel.ExcelHelper.getWorkbook(ExcelHelper.scala:111)
	at com.crealytics.spark.v2.excel.ExcelHelper.getRows(ExcelHelper.scala:127)
	at com.crealytics.spark.v2.excel.ExcelTable.infer(ExcelTable.scala:72)
	at com.crealytics.spark.v2.excel.ExcelTable.inferSchema(ExcelTable.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.$anonfun$dataSchema$4(FileTable.scala:69)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema$lzycompute(FileTable.scala:69)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema(FileTable.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.schema$lzycompute(FileTable.scala:82)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.schema(FileTable.scala:80)
	at com.crealytics.spark.v2.excel.ExcelDataSource.inferSchema(ExcelDataSource.scala:85)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:296)
	at scala.Option.map(Option.scala:230)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:266)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

I gave it another try in 0.16.5-pre1. Can someone try that out?

I had the same issue described above, but upgrading to com.crealytics:spark-excel_2.12:3.1.2_0.16.5-pre1 resolved it, and it's now working for me. I have no other custom libraries installed on the Databricks cluster besides com.crealytics:spark-excel_2.12:3.1.2_0.16.5-pre1.

Can everybody try 0.16.5-pre2 and report back here please?

I just tried com.crealytics:spark-excel_2.12:3.2.1_0.16.5-pre2 on 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12) and get the following error:

val existing = spark.read.format("excel").option("header", "true").load("example.xlsx")
display(existing)

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.75.73.234 executor 0): java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.util.FailureSafeParser.<init>(Lscala/Function1;Lorg/apache/spark/sql/catalyst/util/ParseMode;Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)V
	at com.crealytics.spark.v2.excel.ExcelParser$.parseIterator(ExcelParser.scala:423)
	at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.readFile(ExcelPartitionReaderFactory.scala:75)
	at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.buildReader(ExcelPartitionReaderFactory.scala:61)
	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.$anonfun$createReader$1(FilePartitionReaderFactory.scala:30)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.getNextReader(FilePartitionReader.scala:99)
	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:94)
	at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:131)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:155)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.Task.run(Task.scala:95)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:825)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1644)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:828)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:683)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2984)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2931)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2925)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2925)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1345)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1345)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1345)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3193)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3134)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3122)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1107)
	at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2561)
	at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:266)
	at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:276)
	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:81)
	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:87)
	at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:75)
	at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:62)
	at org.apache.spark.sql.execution.ResultCacheManager.collectResult$1(ResultCacheManager.scala:587)
	at org.apache.spark.sql.execution.ResultCacheManager.computeResult(ResultCacheManager.scala:596)
	at org.apache.spark.sql.execution.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:542)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:541)
	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:438)
	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:417)
	at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:422)
	at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3132)
	at org.apache.spark.sql.Dataset.$anonfun$collectResult$1(Dataset.scala:3123)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3930)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$6(SQLExecution.scala:195)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:342)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:153)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:852)
	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:115)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:292)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3928)
	at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3122)
	at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:268)
	at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:102)
	at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$getResultBufferInternal$3(ScalaDriverLocal.scala:345)
	at scala.Option.map(Option.scala:230)
	at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$getResultBufferInternal$1(ScalaDriverLocal.scala:325)
	at scala.Option.map(Option.scala:230)
	at com.databricks.backend.daemon.driver.ScalaDriverLocal.getResultBufferInternal(ScalaDriverLocal.scala:289)
	at com.databricks.backend.daemon.driver.DriverLocal.getResultBuffer(DriverLocal.scala:712)
	at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:267)
	at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$11(DriverLocal.scala:602)
	at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$2(UsageLogging.scala:232)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:94)
	at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:230)
	at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:212)
	at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:60)
	at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:276)
	at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:261)
	at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:60)
	at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:579)
	at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:615)
	at scala.util.Try$.apply(Try.scala:213)
	at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:607)
	at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:526)
	at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:561)
	at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:431)
	at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:374)
	at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:225)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.util.FailureSafeParser.<init>(Lscala/Function1;Lorg/apache/spark/sql/catalyst/util/ParseMode;Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)V
	at com.crealytics.spark.v2.excel.ExcelParser$.parseIterator(ExcelParser.scala:423)
	at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.readFile(ExcelPartitionReaderFactory.scala:75)
	at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.buildReader(ExcelPartitionReaderFactory.scala:61)
	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.$anonfun$createReader$1(FilePartitionReaderFactory.scala:30)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.getNextReader(FilePartitionReader.scala:99)
	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:94)
	at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:131)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:155)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.Task.run(Task.scala:95)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:825)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1644)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:828)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:683)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more

@alexjbush strange, that looks like the Spark version doesn't match the one from spark-excel, but from your description it clearly should...
Maybe there is something wrong with our cross-publishing build.sbt.

Adding spark-excel as a Maven dependency instead of a JAR resolved my issue.

[screenshot]

Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.

This worked for me too. Thanks, man.
Cluster spec: DBR 10.4 LTS | Spark 3.2.1 | Scala 2.12

Issue seems to be resolved with com.crealytics:spark-excel_2.12:3.2.1_0.17.1

Cluster specs:
Apache Spark 3.2.1
Scala 2.12

@akshaysangma does it work with newer versions as well?

Issue seems to be resolved with com.crealytics:spark-excel_2.12:3.2.1_0.17.1

Cluster specs: Apache Spark 3.2.1 Scala 2.12

I'm so appreciative of your support

This version may resolve the issue: com.crealytics:spark-excel_2.12:3.2.1_0.17.1

I'm also facing the same issue in GCP Databricks. My requirement is to read an XLS-format file through PySpark.

I have tried all the JAR versions below, but none of them seems to work:

  1. com.crealytics:spark-excel-2.12.17-3.1.2_2.12:3.1.2_0.18.1
  2. com.crealytics:spark-excel_2.12:3.5.0_0.20.3
  3. com.crealytics:spark-excel_2.12:3.2.4_0.20.3
  4. com.crealytics:spark-excel_2.12:3.2.4_0.19.0

Bing Chat / Copilot suggested the solution below.
[screenshot]

I was able to find the issue with my XLS file: the file was exported from a website in HTML format with an XLS extension.
Because of that, I was able to open it in MS Excel but was unable to parse it with this library, which is why it throws this error.
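
In case it helps anyone else hitting the same thing, a quick way to check what a suspicious .xls actually contains (a sketch; the path is only an example):

# Sketch: peek at the first bytes of the file (the path is an example).
# Real .xls files start with the OLE2 magic bytes d0 cf 11 e0, .xlsx files with "PK",
# while an HTML export saved with an .xls extension typically starts with "<".
with open("/dbfs/FileStore/tables/users.xls", "rb") as f:
    print(f.read(8))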

Is this issue resolved? Also, is there a failfast mode like CSV supports?
