sryza / aas

Code to accompany Advanced Analytics with Spark from O'Reilly Media

[ch06-lsa] spark.sparkContext.newAPIHadoopFile error type mismatch

jhhan45 opened this issue

Thanks to the kind replies, I was able to solve the external jar import problem in the Toree notebook.
Unfortunately, I have now run into another problem.
When I try to load the Wikipedia dump XML file, I get this error message:

val path = "Wikipedia-Megafauna.xml"
@transient val conf = new Configuration()
conf.set(XMLInputFormat.START_TAG_KEY, "<page>")
conf.set(XMLInputFormat.END_TAG_KEY, "</page>")
val kvs = spark.sparkContext.newAPIHadoopFile(path, classOf[XMLInputFormat],
  classOf[LongWritable], classOf[Text], conf)
val rawXmls = kvs.map(_._2.toString).toDS()
Name: Unknown Error
Message: <console>:58: error: type mismatch;
 found   : org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.Configuration
 required: org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.Configuration
       classOf[LongWritable], classOf[Text], conf)
                                             ^
StackTrace: 

I don't understand this error, because the required type and the found type are identical.
The Wikipedia-Megafauna.xml file exists in the same directory as the notebook.

Thank you!

I found the solution.
I was missing this import: import spark.implicits._
I am closing this issue.
Thanks!
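
For anyone landing here later, this is a minimal sketch of the snippet with the missing import in place. It assumes a Toree or spark-shell session where spark (a SparkSession) is already defined. It also assumes XMLInputFormat comes from edu.umd.cloud9.collection; the package is not stated in this thread, so adjust it to whatever jar you loaded. The toDS() call on an RDD is provided through the implicit conversions and encoders in spark.implicits._, which is why that import is needed for the last line.

```scala
// Assumes a Toree or spark-shell session where `spark` (a SparkSession) is predefined.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
// Assumption: the package of XMLInputFormat is not given in this thread.
import edu.umd.cloud9.collection.XMLInputFormat
// Brings the RDD-to-Dataset conversion and the String encoder into scope;
// toDS() on an RDD[String] needs both.
import spark.implicits._

val path = "Wikipedia-Megafauna.xml"
@transient val conf = new Configuration()
conf.set(XMLInputFormat.START_TAG_KEY, "<page>")
conf.set(XMLInputFormat.END_TAG_KEY, "</page>")

// Each record's value is the text of one <page> element from the dump.
val kvs = spark.sparkContext.newAPIHadoopFile(path, classOf[XMLInputFormat],
  classOf[LongWritable], classOf[Text], conf)

// Keep only the XML text and convert the RDD[String] to a Dataset[String].
val rawXmls = kvs.map(_._2.toString).toDS()
```

In a notebook it may help to put all imports in the first cell and run it before the cell that calls toDS().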