初级问题，就是文章里是有很多RDD的依赖关系图，但是我找不到这些plan怎么在worker执行的相关代码？多谢！

Question

guotong1988 opened this issue 9 years ago · comments

比如这个testcase

test("sortByKey") {

val pairs = sc.parallelize(Array((1, 0), (2, 0), (0, 0), (3, 0)), 2)

assert(pairs.sortByKey().collect() === Array((0, 0), (1, 0), (2, 0), (3, 0)))

}

collect()之前都是一个一个new新的RDD，好像没有实际计算，之后runJob里面又debug不到，

全部的归并结果的代码应该在RDD.scala的Array.concat(results: _*)

def collect(): Array[T] = withScope {

val results = sc.runJob(this, (iter: Iterator[T]) => iter.toArray)

Array.concat(results: _*)

}

您的shuffle那章有sortByKey的转化图，可是这些计算过程在哪呢？

Lijie Xu · Answer 1 · Wed Aug 19 2015 20:55:40 GMT+0800 (China Standard Time)

计算逻辑在RDD.compute()，计算过程是pipeline的，你可以通过finalRDD.debug()看到RDD的依赖图，建议你仔细看下LogicalPlan和PhysicalPlan那两章就明白了