JerryLead / SparkInternals

Notes talking about the design and implementation of Apache Spark

A question about the partitioner

leo-987 opened this issue · comments

I came across the following passage in Learning Spark:

Finally, for binary operations, which partitioner is set on the output depends on the parent RDDs’ partitioners. By default, it is a hash partitioner, with the number of partitions set to the level of parallelism of the operation. However, if one of the parents has a partitioner set, it will be that partitioner; and if both parents have a partitioner set, it will be the partitioner of the first parent.

The child RDD's partitioner should therefore be determined by the parent RDDs' partitioners. But in Chapter 2 of SparkInternals, the parent and child RDDs have different partitioners — how can that be? If one of the two parent RDDs has a hash partitioner, shouldn't the child RDD also have a hash partitioner?
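For reference, the rule quoted from Learning Spark can be sketched as follows. This is a simplified illustration of the selection logic described in the quote, not Spark's actual source (Spark's real `Partitioner.defaultPartitioner` also considers the number of partitions); `HashPartitioner` here is a stand-in class, and all names are hypothetical:

```python
class HashPartitioner:
    """Stand-in for Spark's HashPartitioner; only records the partition count."""
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

def output_partitioner(parent_a, parent_b, default_parallelism):
    """Pick the child RDD's partitioner for a binary operation.

    parent_a / parent_b are the parents' partitioners, or None if a
    parent has no partitioner set.
    """
    # If the first parent has a partitioner set, the child inherits it.
    if parent_a is not None:
        return parent_a
    # Otherwise, fall back to the second parent's partitioner, if any.
    if parent_b is not None:
        return parent_b
    # Neither parent has a partitioner: default to a hash partitioner
    # with the number of partitions set to the operation's parallelism.
    return HashPartitioner(default_parallelism)
```

So, per the quoted rule, a child whose parent has a partitioner set should indeed inherit that partitioner; the question is why the SparkInternals example does not match this.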