Kotlin / kotlin-spark-api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.

RDD support

Jolanrensen opened this issue

For now we are limited to using JavaRDD and JavaSparkContext to get easy-to-work-with RDDs in Kotlin.
My suggestion is to provide a Kotlin wrapper (or a set of extensions) for the Scala RDD class so that we can work with those as well. The same wrapper could also make converting between RDDs and Datasets more accessible.
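
To make the suggestion more concrete, here is a rough sketch of what such a set of extensions could look like. The helper names `asJava`, `toDataset`, and `asJavaRdd` are hypothetical and are not part of kotlin-spark-api. The sketch assumes only Spark's own `RDD.toJavaRDD()`, `JavaRDD.rdd()`, `Dataset.toJavaRDD()`, and `SparkSession.createDataset(RDD, Encoder)`, and that the caller supplies an `Encoder` explicitly.

```kotlin
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.SparkSession

// Hypothetical helpers for illustration only; not part of kotlin-spark-api.

/** View a Scala RDD through Spark's Java-friendly wrapper. */
fun <T> RDD<T>.asJava(): JavaRDD<T> = toJavaRDD()

/** Convert a JavaRDD to a Dataset. The encoder is passed in explicitly. */
fun <T> JavaRDD<T>.toDataset(spark: SparkSession, encoder: Encoder<T>): Dataset<T> =
    spark.createDataset(rdd(), encoder)

/** Convert a Dataset back to a JavaRDD. */
fun <T> Dataset<T>.asJavaRdd(): JavaRDD<T> = toJavaRDD()

fun main() {
    // Local session purely for demonstration.
    val spark = SparkSession.builder()
        .master("local[*]")
        .appName("rdd-extension-sketch")
        .getOrCreate()
    val sc = JavaSparkContext(spark.sparkContext())

    // RDD -> Dataset -> RDD round trip.
    val rdd: JavaRDD<Int> = sc.parallelize(listOf(1, 2, 3))
    val ds: Dataset<Int> = rdd.toDataset(spark, Encoders.INT())
    println(ds.asJavaRdd().map { it * 2 }.collect())

    spark.stop()
}
```

A real wrapper would probably want to derive the encoder from the element type instead of asking for it, which is where most of the extra work would go.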

@khud could you please look into it? I remember that you had some support for RDDs. Is it possible to incorporate your work into our current solution?

A couple of years ago I tried to do so. It's not so easy, because it needs a lot of boilerplate code. I believe JavaRDD works well with Kotlin:

// sc is a JavaSparkContext
val rdd = sc.parallelize(listOf(1, 2, 3))
rdd.map { it * it }.reduce { x, y -> x + y }

I'm sure that some problems exist, but it would be great to know which specific improvements you would like to see.
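
For anyone who wants to try the snippet above outside an existing job, a self-contained version might look like the following. This is a minimal sketch that assumes a local master; the function name `sumOfSquares` and the app name are made up for the example.

```kotlin
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext

// Hypothetical helper wrapping the two-line example: squares each element, then sums.
fun sumOfSquares(sc: JavaSparkContext, xs: List<Int>): Int =
    sc.parallelize(xs)
        .map { it * it }              // Kotlin lambda converted to Spark's Function
        .reduce { x, y -> x + y }     // Kotlin lambda converted to Spark's Function2

fun main() {
    val conf = SparkConf().setMaster("local[*]").setAppName("java-rdd-from-kotlin")
    // JavaSparkContext is Closeable, so `use` stops it when the block exits.
    JavaSparkContext(conf).use { sc ->
        println(sumOfSquares(sc, listOf(1, 2, 3))) // 1 + 4 + 9 = 14
    }
}
```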

@Jolanrensen do you have any further questions about RDD support?

@asm0dey sorry for the late response. No, it's clear. I do think it might be a bit confusing for users to use the Java RDD and Java SparkContext in Kotlin, but @khud is right that a wrapper would need a large amount of boilerplate code. It would be ideal if Apache incorporated Kotlin into the Spark API like they did with Java, but for now, using the Java classes together with this API is the best we've got.

@Jolanrensen there is an issue about it in Apache Spark's Jira: https://issues.apache.org/jira/browse/SPARK-32530
You can vote and even comment there. We're interested in being included upstream too.

Closing this for now as "won't fix" due to lack of demand. We'll reopen it if demand grows.