Why the definition of dependencies is different from RDD paper?
endersuu opened this issue · comments
From the paper Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- narrow dependencies, where each partition of the parent RDD is used by at most one partition of the child RDD
- wide dependencies, where multiple child partitions may depend on it
However, The definition of dependencies from the chapter JobLogicalPlan is different :
-
NarrowDependency, Each partition of the child RDD fully depends on a small number of partitions of its parent RDD. Fully depends (i.e., FullDependency) means that a child partition depends the entire parent partition.
-
ShuffleDependency, Multiple child partitions partially depends on a parent partition. Partially depends (i.e., PartialDependency) means that each child partition depends a part of the parent partition.
This makes me really confused. Are ShuffleDependency and wide dependency the same thing?