[ch06-lsa] spark.sparkContext.newAPIHadoopFile error: type mismatch
jhhan45 opened this issue
jhhan45 commented
Thanks to the kind replies, I was able to solve the external JAR import problem in the Toree notebook.
Unfortunately, I have now run into another problem.
When I try to load the Wikipedia dump XML file, I get the error message below. The code I ran was:
val path = "Wikipedia-Megafauna.xml"
@transient val conf = new Configuration()
conf.set(XMLInputFormat.START_TAG_KEY, "<page>")
conf.set(XMLInputFormat.END_TAG_KEY, "</page>")
val kvs = spark.sparkContext.newAPIHadoopFile(path, classOf[XMLInputFormat],
classOf[LongWritable], classOf[Text], conf)
val rawXmls = kvs.map(_._2.toString).toDS()
Name: Unknown Error
Message: <console>:58: error: type mismatch;
found : org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.Configuration
required: org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.org.apache.hadoop.conf.Configuration
classOf[LongWritable], classOf[Text], conf)
^
StackTrace:
I don't understand this error, because the required type and the found type are identical.
The Wikipedia-Megafauna.xml file exists in the same directory as the notebook.
Thank you!
jhhan45 commented
I found the solution.
I had left out import spark.implicits._, which is needed for the toDS() call on the last line.
I'm closing this issue.
Thanks~!
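For anyone who lands here with the same symptom, the role of the missing import can be sketched in isolation. This is a minimal illustration, not the book's code: it assumes a local SparkSession, and the `pages` RDD with its two hard-coded strings is a hypothetical stand-in for `kvs.map(_._2.toString)`, so that the example does not depend on XMLInputFormat or the Wikipedia dump. Why the notebook reported the problem as a Configuration type mismatch is not explained in this thread; the sketch only shows that `toDS()` on an RDD relies on the implicits being in scope.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// In a Toree notebook or spark-shell a session already exists;
// getOrCreate() reuses it instead of starting a second one.
val spark = SparkSession.builder()
  .master("local[*]")          // assumption: local mode, for illustration only
  .appName("toDS-sketch")      // hypothetical application name
  .getOrCreate()

// Brings the RDD-to-Dataset conversion and the String encoder into scope.
// Without this line, the toDS() call below does not compile.
import spark.implicits._

// Hypothetical stand-in for kvs.map(_._2.toString): an RDD[String] of pages.
val pages = spark.sparkContext.parallelize(Seq("<page>a</page>", "<page>b</page>"))

// The same conversion as on the last line of the original snippet.
val rawXmls: Dataset[String] = pages.toDS()
```

In the original snippet, the import would go anywhere after `spark` is defined and before the `toDS()` line.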