JerryLead / SparkInternals

Notes talking about the design and implementation of Apache Spark

Should mention `RangePartitioner` in explaining `sortByKey`

darkjh opened this issue · comments

In the explanation of `sortByKey`:

https://github.com/JerryLead/SparkInternals/blob/master/markdown/2-JobLogicalPlan.md

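`sortByKey` actually uses `RangePartitioner`. Sorting the records within each partition is not enough on its own: the partitions themselves must also be ordered relative to one another (every key in partition i no greater than every key in partition i+1) before the final output is globally sorted.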

Code:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala#L62
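To illustrate the point, here is a minimal plain-Python sketch of the idea, not Spark's actual implementation. The function names (`range_bounds`, `partition_of`, `sort_by_key`) and the `sample_size` and `seed` parameters are made up for this example. It samples the keys to pick split points, routes each record to the partition whose key range contains it, and then sorts each partition locally. Because the ranges themselves are ordered, reading the partitions in index order gives a globally sorted result.

```python
import bisect
import random


def range_bounds(keys, num_partitions, sample_size=100, seed=0):
    """Pick up to num_partitions - 1 ascending split points from a sample of the keys."""
    if not keys or num_partitions <= 1:
        return []
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced sample elements act as upper bounds of the first n-1 ranges.
    return [sample[len(sample) * i // num_partitions] for i in range(1, num_partitions)]


def partition_of(key, bounds):
    """Index of the first range whose upper bound is >= key (last range otherwise)."""
    return bisect.bisect_left(bounds, key)


def sort_by_key(pairs, num_partitions):
    """Range-partition (key, value) pairs, then sort each partition by key."""
    bounds = range_bounds([k for k, _ in pairs], num_partitions)
    partitions = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        # Stand-in for the shuffle: route each record by key range, not by hash.
        partitions[partition_of(k, bounds)].append((k, v))
    # Local sort only; no merge across partitions is needed afterwards.
    return [sorted(p, key=lambda kv: kv[0]) for p in partitions]


if __name__ == "__main__":
    data = [(k, str(k)) for k in [5, 3, 9, 1, 7, 2, 8, 6, 4, 0]]
    for i, part in enumerate(sort_by_key(data, 3)):
        print(i, [k for k, _ in part])
```

With a hash partitioner, each partition would still be sorted internally, but concatenating the partitions would not yield sorted output. That is why the range partitioner deserves a mention in the `sortByKey` explanation.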

Thanks for pointing this out. I've been busy with a paper recently; I'll fix it in the next review pass.