Facing error when parsing YAML using Scala
VIKCT001 opened this issue
I am facing the issue below while running Scala code on a Dataproc cluster. The code runs fine locally.

```
Exception in thread "main" java.lang.NoSuchMethodError: org.yaml.snakeyaml.Yaml.<init>(Lorg/yaml/snakeyaml/LoaderOptions;)V
```
```scala
object mytestmain {
  def main(args: Array[String]): Unit = {
    println("In main function")
    println("reading from gcs bucket")
    // val storage = StorageOptions.getDefaultInstance.getService
    // val my_blob = storage.get(BlobId.of("test-bucket", "job-configs/test.yml"))
    // val filecontent = new String(my_blob.getContent(), StandardCharsets.UTF_8)
    val config =
      """file_location: test-file
        |big_query_dataset: test-dataset
        |big_query_tablename: test-table
        |""".stripMargin
    val classobj = new IngestionData()
    classobj.printYamlfiledata(config)
  }
}
```
```scala
package com.test.processing.jobs

import net.jcazevedo.moultingyaml._
import com.test.processing.conf.DatasetConfiguration

object ReadYamlConfiguration extends DefaultYamlProtocol {
  implicit object datasetConfFormat extends YamlFormat[DatasetConfiguration] {
    def write(obj: DatasetConfiguration) = YamlObject(
      YamlString("file_location") -> YamlString(obj.file_location),
      YamlString("big_query_dataset") -> YamlString(obj.big_query_dataset),
      YamlString("big_query_tablename") -> YamlString(obj.big_query_tablename)
    )

    println("I am in read datasetConfFormat object ")

    def read(value: YamlValue) = {
      value.asYamlObject.getFields(
        YamlString("file_location"),
        YamlString("big_query_dataset"),
        YamlString("big_query_tablename")) match {
        case Seq(
            YamlString(file_location),
            YamlString(big_query_dataset),
            YamlString(big_query_tablename)) =>
          new DatasetConfiguration(file_location, big_query_dataset, big_query_tablename)
        case _ => deserializationError("Data configs expected")
      }
    }
  }

  // Note: a second implicit such as `yamlFormat3(DatasetConfiguration)` at this
  // scope would be ambiguous with datasetConfFormat above; keep only one of them.
  // implicit val YamlDatasetConfigurationfFormat = yamlFormat3(DatasetConfiguration)
}
```
```scala
import net.jcazevedo.moultingyaml._
import com.test.processing.jobs.ReadYamlConfiguration._
import com.test.processing.conf.DatasetConfiguration

class IngestionData {
  def printYamlfiledata(filedata: String) = {
    println("I am in readYamlfiledata method")
    val myObj = filedata.parseYaml.convertTo[DatasetConfiguration]
    println("file name is: " + myObj.file_location)
    println("dataset name is: " + myObj.big_query_dataset)
    println("Table name is: " + myObj.big_query_tablename)
  }
}
```
```scala
package com.test.processing.conf

case class DatasetConfiguration(
  file_location: String,
  big_query_dataset: String,
  big_query_tablename: String
)
```
It fails both when I read the YAML file from the bucket and when I hardcode the file content as input, as above. It runs fine locally.
@VIKCT001 I am facing the exact same error. I have a project with Scala 2.11 and Spark 2.4; after implementing YAML parsing, everything worked flawlessly via `sbt test`, but after building a fat jar with `sbt assembly` and running `spark-submit` locally, I get the method-not-found exception. After unpacking the jar, I can confirm that `org.yaml.snakeyaml.Yaml.<init>(Lorg/yaml/snakeyaml/LoaderOptions;)V` is there.

My hunch is that something bad is happening in `assemblyMergeStrategy`, as it excludes SnakeYAML's `pom.properties` and `pom.xml` because they are under the `META-INF` folder, which is expected behavior. @VIKCT001 could you share your `build.sbt`?
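For reference, the kind of merge strategy I mean typically looks like this (a sketch of the relevant part of my `build.sbt`, not necessarily the one in question):

```scala
// Discard duplicate META-INF entries (this is what drops SnakeYAML's
// pom.properties/pom.xml, which is harmless by itself) and keep the
// first copy of everything else.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", _*) => MergeStrategy.discard
  case _                        => MergeStrategy.first
}
```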
@jcazevedo Do you have any hunches? Have you encountered such a scenario?
The reason this issue occurs is that Apache Spark (2.4 in my case) ships with SnakeYAML 1.15, which gets picked up first by the class loader when running the project with `spark-submit`, while SnakeYAML 1.26, the version used by moultingyaml, gets ignored. SnakeYAML 1.15 predates the `Yaml(LoaderOptions)` constructor, hence the `NoSuchMethodError`.
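A quick way to confirm which copy wins at runtime (a diagnostic sketch, not from the original report) is to print where the class loader actually found the `Yaml` class:

```scala
// Prints the jar that org.yaml.snakeyaml.Yaml was loaded from.
// On a Spark 2.4 cluster this typically points at Spark's bundled
// snakeyaml-1.15.jar rather than the copy inside the fat jar.
println(classOf[org.yaml.snakeyaml.Yaml].getProtectionDomain.getCodeSource.getLocation)
```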
The solution is to shade SnakeYAML in your `build.sbt`, like this:
```scala
assemblyShadeRules in assembly := Seq(
  // Fixes the problem where an older SnakeYAML bundled with Spark is
  // picked up when running via spark-submit.
  ShadeRule.rename("org.yaml.snakeyaml.**" -> "org.yaml.snakeyamlShaded@1").inAll
)
```
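The rename rule rewrites every reference to `org.yaml.snakeyaml.*` in the fat jar's bytecode to the relocated package (`@1` substitutes whatever the `**` wildcard matched), so at runtime the job loads its own shaded copy of SnakeYAML instead of the older one on Spark's classpath. Rebuild with `sbt assembly` and re-run `spark-submit` after adding the rule.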