jcazevedo / moultingyaml

Scala wrapper for SnakeYAML


Facing error when parsing YAML using Scala

VIKCT001 opened this issue

I am facing the issue below when running Scala code on a Dataproc cluster. The same code runs fine locally.

```
Exception in thread "main" java.lang.NoSuchMethodError: org.yaml.snakeyaml.Yaml.<init>(Lorg/yaml/snakeyaml/LoaderOptions;)V
```

```scala
object mytestmain {

  def main(args: Array[String]): Unit = {
    println("In main function")
    println("reading from gcs bucket")

    // Reading the config from the GCS bucket (disabled while testing with a
    // hardcoded string):
    // val storage = StorageOptions.getDefaultInstance.getService
    // val my_blob = storage.get(BlobId.of("test-bucket", "job-configs/test.yml"))
    // val filecontent = new String(my_blob.getContent(), StandardCharsets.UTF_8)

    val config =
      """file_location: test-file
        |big_query_dataset: test-dataset
        |big_query_tablename: test-table
        |""".stripMargin

    val classobj = new IngestionData()
    classobj.printYamlfiledata(config)
  }
}
```

```scala
package com.test.processing.jobs

import net.jcazevedo.moultingyaml._
import com.test.processing.conf.DatasetConfiguration

object ReadYamlConfiguration extends DefaultYamlProtocol {

  implicit object datasetConfFormat extends YamlFormat[DatasetConfiguration] {

    def write(obj: DatasetConfiguration) = YamlObject(
      YamlString("file_location") -> YamlString(obj.file_location),
      YamlString("big_query_dataset") -> YamlString(obj.big_query_dataset),
      YamlString("big_query_tablename") -> YamlString(obj.big_query_tablename)
    )

    def read(value: YamlValue) = {
      println("I am in read datasetConfFormat object")
      value.asYamlObject.getFields(
        YamlString("file_location"),
        YamlString("big_query_dataset"),
        YamlString("big_query_tablename")) match {
        case Seq(
            YamlString(file_location),
            YamlString(big_query_dataset),
            YamlString(big_query_tablename)) =>
          DatasetConfiguration(file_location, big_query_dataset, big_query_tablename)
        case _ => deserializationError("Data configs expected")
      }
    }
  }
}
```

```scala
import net.jcazevedo.moultingyaml._
import com.test.processing.conf.DatasetConfiguration
import com.test.processing.jobs.ReadYamlConfiguration._

class IngestionData {

  def printYamlfiledata(filedata: String) = {
    println("I am in readYamlfiledata method")

    // parseYaml comes from moultingyaml; convertTo uses the implicit format
    // imported from ReadYamlConfiguration.
    val myObj = filedata.parseYaml.convertTo[DatasetConfiguration]
    println("file name is: " + myObj.file_location)
    println("dataset name is: " + myObj.big_query_dataset)
    println("Table name is: " + myObj.big_query_tablename)
  }
}
```

```scala
package com.test.processing.conf

case class DatasetConfiguration(
  file_location: String,
  big_query_dataset: String,
  big_query_tablename: String
)
```
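(Side note: since `DatasetConfiguration` has three plain `String` fields, the hand-written `YamlFormat` above is optional; `DefaultYamlProtocol` can derive it. A minimal sketch, assuming the same packages:)

```scala
package com.test.processing.jobs

import net.jcazevedo.moultingyaml._
import com.test.processing.conf.DatasetConfiguration

object ReadYamlConfiguration extends DefaultYamlProtocol {
  // yamlFormat3 derives read/write for a 3-field case class.
  implicit val datasetConfFormat = yamlFormat3(DatasetConfiguration)
}
```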

It fails when I read the YAML file from the bucket, and even when I hardcode the file content as input. It runs fine locally.

@VIKCT001 I am facing the exact same error. I have a project with Scala 2.11 and Spark 2.4; after implementing the YAML parsing, everything worked flawlessly via `sbt test`, but after building a fat JAR with `sbt assembly` and running `spark-submit` locally, I get the method-not-found exception. After unpacking the JAR, I can confirm that `org.yaml.snakeyaml.Yaml.<init>(Lorg/yaml/snakeyaml/LoaderOptions;)` is there.

My hunch is that something bad is happening in `assemblyMergeStrategy`, since it excludes SnakeYAML's pom.properties and pom.xml because they live under the META-INF folder, which is expected behavior. @VIKCT001, could you share your build.sbt?
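For reference, a typical `assemblyMergeStrategy` along those lines looks like the following (a sketch of the common sbt-assembly pattern, not the actual build.sbt in question):

```scala
// Common sbt-assembly pattern (sketch): everything under META-INF, including
// SnakeYAML's pom.properties and pom.xml, is discarded during the merge.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```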

@jcazevedo Do you have any hunches? Have you encountered such a scenario?

The reason this issue occurs is that Apache Spark (2.4 in my case) ships with SnakeYAML 1.15, which gets picked up first by the class loader when running the project with `spark-submit`, while SnakeYAML 1.26, the version moultingyaml depends on, is ignored.
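You can confirm which copy wins at runtime by asking the JVM where the `Yaml` class was loaded from (a standard classpath diagnostic, nothing specific to this library):

```scala
// Prints the JAR that org.yaml.snakeyaml.Yaml was actually loaded from; under
// spark-submit this should point at Spark's bundled SnakeYAML 1.15.
val source = classOf[org.yaml.snakeyaml.Yaml]
  .getProtectionDomain.getCodeSource.getLocation
println(s"SnakeYAML loaded from: $source")
```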

The solution is to shade SnakeYAML in your build.sbt like this:

```scala
assemblyShadeRules in assembly := Seq(
  // fixes the problem where spark-submit picks up an older SnakeYAML version
  ShadeRule.rename("org.yaml.snakeyaml.**" -> "org.yaml.snakeyamlShaded@1").inAll
)
```
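The `@1` in the rename pattern substitutes whatever the `**` wildcard matched, so every class under `org.yaml.snakeyaml` is relocated to `org.yaml.snakeyamlShaded` inside the fat JAR, and `.inAll` rewrites the references in all dependencies (including moultingyaml itself). Your code then uses the bundled SnakeYAML 1.26 classes regardless of the 1.15 copy Spark puts on the classpath.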