[ch06-lsa] spark.sparkContext.newAPIHadoopFile error: type mismatch

Question

jhhan45 opened this issue 7 years ago

Thanks to the kind replies, I was able to solve the external jar import problem in the Toree notebook.
Unfortunately, I have now run into another problem.
When I try to load the Wikipedia dump XML file, I get this error message:

val path = "Wikipedia-Megafauna.xml"
@transient val conf = new Configuration()
conf.set(XMLInputFormat.START_TAG_KEY, "<page>")
conf.set(XMLInputFormat.END_TAG_KEY, "</page>")
val kvs = spark.sparkContext.newAPIHadoopFile(path, classOf[XMLInputFormat],
classOf[LongWritable], classOf[Text], conf)
val rawXmls = kvs.map(_._2.toString).toDS()

Name: Unknown Error
Message: <console>:58: error: type mismatch;
 found   : org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.Configuration
 required: org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.Configuration
       classOf[LongWritable], classOf[Text], conf)
                                             ^
StackTrace:

I don't understand this error, because the required type and the found type look identical.
The Wikipedia-Megafauna.xml file is in the same directory.

Thank you!

jhhan45 · Answer 1 · Thu Oct 12 2017 15:25:03 GMT+0800 (China Standard Time)

I found the solution.
I had left out import spark.implicits._
I am closing this issue.
Thanks~!
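
For anyone who lands here with the same message, here is a minimal sketch of what the corrected cell might look like, based only on the fix reported above. It assumes a Toree or spark-shell session where spark is already defined as a SparkSession, and it assumes XMLInputFormat is the class from the Cloud9 jar (edu.umd.cloud9.collection); adjust that import if your XMLInputFormat comes from somewhere else. What is certain is that toDS() on an RDD needs the implicits from spark.implicits._ in scope. The thread does not explain why the compiler reported a Configuration type mismatch instead.

```scala
// Sketch of the corrected notebook cell (not the book's official listing).
// Assumes `spark` is an existing SparkSession, as in Toree or spark-shell.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
// Assumption: XMLInputFormat is provided by the Cloud9 jar on the classpath.
import edu.umd.cloud9.collection.XMLInputFormat
// The import reported as missing; it brings the encoders and conversions
// that make toDS() available on an RDD[String].
import spark.implicits._

val path = "Wikipedia-Megafauna.xml"

// Tell the input format which tags delimit one record.
@transient val conf = new Configuration()
conf.set(XMLInputFormat.START_TAG_KEY, "<page>")
conf.set(XMLInputFormat.END_TAG_KEY, "</page>")

// Each record is a (LongWritable, Text) pair; the Text value holds one
// <page>...</page> block.
val kvs = spark.sparkContext.newAPIHadoopFile(
  path, classOf[XMLInputFormat], classOf[LongWritable], classOf[Text], conf)

// Drop the key and keep the page XML as a Dataset[String].
val rawXmls = kvs.map(_._2.toString).toDS()
```

Nothing is read until an action runs, so something like rawXmls.count() is a quick way to confirm that the file actually loads.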