Create DataSet using SparkSession

Question


santhoshtangudu opened this issue 7 years ago

Hi,
We have 4mc-format files in our Hadoop cluster. We are trying to read these files and create a Dataset directly (instead of creating an RDD and then converting it to a Dataset) in Spark 2.0. Can you please help us do this?

Carlo Medas · Answer 1 · Tue Jun 27 2017 02:11:38 GMT+0800 (China Standard Time)

There are certainly several ways to achieve this, and to be honest I'm not sure I'm giving you the best solution. Unfortunately I don't have time to dig deeper right now, but:
what about this: load the RDD (optionally pre-filtering it), then create a bean class that a SQLContext can use to map the RDD quickly to a Dataset by leveraging convention over configuration. For example:
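// devicesRDD is a JavaRDD<DeviceEntry>; DeviceEntry is a JavaBean (public no-arg constructor, getters/setters)
Dataset<Row> devices = sqlContext.createDataFrame(devicesRDD, DeviceEntry.class);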
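The "convention over configuration" part is the JavaBeans convention: `createDataFrame(JavaRDD<?>, Class<?>)` infers the schema from the bean's properties instead of an explicit schema. As a rough, Spark-free sketch (an assumption-labeled illustration, not Spark's actual code), the JDK's `java.beans.Introspector` can list the getter/setter properties of a hypothetical `DeviceEntry` bean. As far as I know, Spark 2.x's bean inference is built on the same introspection, so these names are roughly the columns to expect, but please check the docs for your exact Spark version.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

// Spark-free illustration of the JavaBeans conventions that bean-based
// schema inference relies on. DeviceEntry and its fields are hypothetical.
class BeanColumns {

    // Hypothetical bean: public no-arg constructor plus getter/setter pairs.
    public static class DeviceEntry {
        private String deviceId;
        private long lastSeen;

        public DeviceEntry() { }

        public String getDeviceId() { return deviceId; }
        public void setDeviceId(String deviceId) { this.deviceId = deviceId; }

        public long getLastSeen() { return lastSeen; }
        public void setLastSeen(long lastSeen) { this.lastSeen = lastSeen; }
    }

    // Returns the names of properties that have both a getter and a setter.
    // The implicit "class" property (from Object.getClass) has no setter,
    // so it is excluded by the same check.
    static List<String> columnsOf(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException("Not a usable bean: " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        // Print the property names a bean-based schema would be built from.
        System.out.println(columnsOf(DeviceEntry.class));
    }
}
```

With Spark on the classpath, the `Dataset<Row>` returned by `createDataFrame` can then be turned into a typed `Dataset<DeviceEntry>` with `devices.as(Encoders.bean(DeviceEntry.class))`; in Spark 2.0 the same `createDataFrame(JavaRDD<?>, Class<?>)` overload is also available directly on `SparkSession`.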