JerryLead / SparkInternals

Notes talking about the design and implementation of Apache Spark

A question about the partitioner

leo-987 opened this issue · comments

I came across the following passage in Learning Spark:

Finally, for binary operations, which partitioner is set on the output depends on the parent RDDs’ partitioners. By default, it is a hash partitioner, with the number of partitions set to the level of parallelism of the operation. However, if one of the parents has a partitioner set, it will be that partitioner; and if both parents have a partitioner set, it will be the partitioner of the first parent.

The child RDD's partitioner should therefore be determined by the parent RDDs' partitioners. But in Chapter 2 of SparkInternals, the parent and child RDDs have different partitioners — how can that be? If one of the two parent RDDs has a hash partitioner, shouldn't the child RDD also have a hash partitioner?
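For reference, the rule quoted from Learning Spark can be sketched as follows. This is a simplified illustration of the selection logic described in the quote, not Spark's actual source (Spark's real `Partitioner.defaultPartitioner` also considers the number of partitions); `HashPartitioner` here is a stand-in class, and all names are hypothetical:

```python
class HashPartitioner:
    """Stand-in for Spark's HashPartitioner; only records the partition count."""
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

def output_partitioner(parent_a, parent_b, default_parallelism):
    """Pick the child RDD's partitioner for a binary operation.

    parent_a / parent_b are the parents' partitioners, or None if a
    parent has no partitioner set.
    """
    # If the first parent has a partitioner set, the child inherits it.
    if parent_a is not None:
        return parent_a
    # Otherwise, fall back to the second parent's partitioner, if any.
    if parent_b is not None:
        return parent_b
    # Neither parent has a partitioner: default to a hash partitioner
    # with the number of partitions set to the operation's parallelism.
    return HashPartitioner(default_parallelism)
```

So, per the quoted rule, a child whose parent has a partitioner set should indeed inherit that partitioner; the question is why the SparkInternals example does not match this.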