AbsaOSS / cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

spark3 | scala 2.12 | jackson ScalaObjectMapper | NoSuchMethodError

kennydataml opened this issue · comments

commented

Describe the bug

Loading a .dat file produces a NoSuchMethodError.

To Reproduce

sc.addFile(copybook)
sc.addFile(dat)
df = spark.read.format("cobol")\
  .option("copybook", SparkFiles.get(copybook))\
  .load(SparkFiles.get(dat))

Expected behaviour

Should load successfully

Screenshots

20/04/28 19:09:53 INFO DefaultSource: Cobrix 'spark-cobol' build 2.0.7 (2020-04-14T12:00:03)
Traceback (most recent call last):
  File "/tmp/spark-0a0a39a6-a906-4b6c-af92-2ece5fcb2776/k8s_spark_cobol_adls.py", line 77, in <module>
    .load(SparkFiles.get(datafile1))
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 160, in load
  File "/opt/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1286, in __call__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 98, in deco
  File "/opt/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o58.load.
: java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper.$init$(Lcom/fasterxml/jackson/module/scala/experimental/ScalaObjectMapper;)V
        at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersParser$$anon$1.<init>(CobolParametersParser.scala:499)
        at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersParser$.getOccursMappings(CobolParametersParser.scala:499)
        at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersParser$.parse(CobolParametersParser.scala:215)
        at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:56)
        at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:48)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:240)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:229)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:229)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

20/04/28 19:09:53 INFO SparkContext: Invoking stop() from shutdown hook

Additional context

Using Spark 3 (PySpark) on Kubernetes. We have to use sc.addFile and SparkFiles.get due to limitations of Kubernetes with ADLS Gen2.

commented

The error is due to conflicting Jackson versions with Spark 3. I had to pull this Git repo and build the jar with sbt clean package, after changing Dependencies.scala to match Spark 3.
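
For reference, the change amounts to building against Scala 2.12 / Spark 3 and pinning Jackson to the version Spark ships. A rough sketch only; the value names and exact versions below are illustrative, not the repo's actual Dependencies.scala:

// project/Dependencies.scala -- illustrative sketch, not the actual file
import sbt._

object Dependencies {
  // Build against Spark 3 / Scala 2.12 (scalaVersion is set in build.sbt)
  val sparkVersion   = "3.0.0"
  // Jackson version that ships with Spark 3.0.0-preview2
  val jacksonVersion = "2.10.0"

  val sparkCobolDependencies: Seq[ModuleID] = Seq(
    "org.apache.spark"             %% "spark-sql"            % sparkVersion % "provided",
    "com.fasterxml.jackson.module" %% "jackson-module-scala" % jacksonVersion
  )
}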

Hi,

Thanks for the bug report!

I'm going to reopen it since spark-cobol should support Spark 3 out of the box.

@kenny-bui-slalom, which version did you use to make it work?

commented

Spark 3.0.0-preview2 currently uses Jackson version 2.10.0.

Yeah, I tried with 3.0.0-preview2 and had similar issues.
Just changing the Scala version to 2.12 fixed it for me, even with Jackson version 2.10.3.

I'll add a PR to specifically list the dependency as an override.
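
Something along these lines (a sketch only; the exact coordinates and versions in the PR may differ):

// build.sbt (sbt 1.x) -- force a single Jackson version so jackson-module-scala
// matches the jackson-databind that Spark provides at runtime
dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core"   %  "jackson-databind"      % "2.10.0",
  "com.fasterxml.jackson.module" %% "jackson-module-scala"  % "2.10.0"
)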

@tr11 , Thanks for the fix!

@tr11 , after the explicit Jackson dependency is added, spark-cobol fails to build.

  java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.JsonMappingException.<init>(Ljava/io/Closeable;Ljava/lang/String;)V
  at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:61)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:17)
  at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:718)
  at za.co.absa.cobrix.spark.cobol.parameters.CobolParametersParser$.getOccursMappings(CobolParametersParser.scala:498)
  at za.co.absa.cobrix.spark.cobol.parameters.CobolParametersParser$.parse(CobolParametersParser.scala:213)

I've played with shading our own version of Jackson in cobol-parser. It works well, but cobol-parser becomes a 3 MB artifact (up from the original 700 KB), and shading disrupts Maven's dependency management somewhat. So I think I'm just going to write a tiny JSON parser for the OCCURS parameters; it will be tied to the specific structure of the parameters JSON.
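
The idea would be something like the sketch below: purely illustrative, assuming the OCCURS mappings JSON is a flat two-level map of field name to case-value map, and not the actual implementation:

// Illustrative, structure-specific reader for {"FIELD": {"CASE": 1, ...}, ...}
// that avoids Jackson entirely. Not Cobrix's real code.
object OccursMappingsReader {
  private val outerPattern = """"([^"]+)"\s*:\s*\{([^}]*)\}""".r
  private val innerPattern = """"([^"]+)"\s*:\s*(\d+)""".r

  def parse(json: String): Map[String, Map[String, Int]] =
    outerPattern.findAllMatchIn(json).map { field =>
      // field.group(1) = field name, field.group(2) = inner object body
      val cases = innerPattern.findAllMatchIn(field.group(2)).map { c =>
        c.group(1) -> c.group(2).toInt
      }.toMap
      field.group(1) -> cases
    }.toMap
}

// OccursMappingsReader.parse("""{"DETAIL": {"SEG1": 1, "SEG2": 2}}""")
// => Map(DETAIL -> Map(SEG1 -> 1, SEG2 -> 2))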

Makes sense. Since we already import ANTLR, we can use the grammar at https://github.com/antlr/grammars-v4/blob/master/json/JSON.g4. I can very easily build the visitor for that.

Cool, thanks for the link!

I am getting this same error in Spark 3.0.0 with spark-cobol_2.12-2.0.7.

java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper.$init$(Lcom/fasterxml/jackson/module/scala/experimental/ScalaObjectMapper;)V

I see this bug is marked as fixed; am I missing anything?

spark-cobol 2.0.7 does not have this fix. Try spark-cobol_2.12-2.1.0.
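
For an sbt build that release can be pulled in as below; for PySpark the same artifact should be usable via --packages za.co.absa.cobrix:spark-cobol_2.12:2.1.0 (coordinates inferred from the artifact name above and the package names in the stack trace):

// build.sbt -- use the release that contains the Jackson fix
libraryDependencies += "za.co.absa.cobrix" %% "spark-cobol" % "2.1.0"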

@yruslan Thank you, it worked.