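"When the framework execution space runs short, it can borrow from the data cache space, but the data cache space must still be guaranteed roughly 50% of the space"?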
ycli12 opened this issue · comments
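In the middle of P236 there is this sentence: "When the framework execution space runs short, it can borrow from the data cache space, but the data cache space must still be guaranteed roughly 50% of the space." This reads as if the framework execution space can occupy at most 50% of the space. In practice, though, when the cache space uses less than 50%, say 20%, the framework execution memory can still use 80%, right? My understanding is that the framework execution space is limited to 50% only when the data cache space really does use 50%.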
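@ycli12 Thanks for pointing this out. Your understanding is correct. I will add a sentence: "When the cache space uses less than 50%, say 20%, the framework execution memory can still use 80%. However, if more data then needs to be cached, the cache space falls short because the framework execution memory has taken it over, and the data that was supposed to be cached can only be dropped."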
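There is another sentence later on: "Space borrowed during framework execution is not returned to the data cache space, because this is hard to implement in code." At 11:21 of an official Spark conference talk I found online, the speaker gives a second reason: execution memory that is spilled to disk will definitely be loaded back into memory (during the final aggregation/sort), whereas cached data will not necessarily be used again. So the execution space does not give the memory back because, in general, using the memory for execution is the more "cost-effective" choice. The book could add this reason as well.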
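I recently read the source code and found that the execution space can borrow memory from the cache space freely, with no 50% limit. If I have misunderstood, please let me know.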
maybeGrowExecutionPool
/**
 * Grow the execution pool by evicting cached blocks, thereby shrinking the storage pool.
 *
 * When acquiring memory for a task, the execution pool may need to make multiple
 * attempts. Each attempt must be able to evict storage in case another task jumps in
 * and caches a large block between the attempts. This is called once per attempt.
 */
// Reclaim as much Storage memory as possible for Execution
def maybeGrowExecutionPool(extraMemoryNeeded: Long): Unit = {
  if (extraMemoryNeeded > 0) {
    // There is not enough free memory in the execution pool, so try to reclaim memory from
    // storage. We can reclaim any free memory from the storage pool. If the storage pool
    // has grown to become larger than `storageRegionSize`, we can evict blocks and reclaim
    // the memory that storage has borrowed from execution.
    // storagePool.poolSize - storageRegionSize is the memory Storage has borrowed from
    // Execution (<= 0 when Storage has borrowed nothing); in that case only
    // storagePool.memoryFree can be reclaimed
    val memoryReclaimableFromStorage = math.max(
      storagePool.memoryFree,
      storagePool.poolSize - storageRegionSize)
    if (memoryReclaimableFromStorage > 0) {
      // Only reclaim as much space as is necessary and available:
      // take whatever can be freed, even if it is less than requested
      val spaceToReclaim = storagePool.freeSpaceToShrinkPool(
        math.min(extraMemoryNeeded, memoryReclaimableFromStorage))
      // Move the reclaimed space from the storage pool to the execution pool
      storagePool.decrementPoolSize(spaceToReclaim)
      executionPool.incrementPoolSize(spaceToReclaim)
    }
  }
}
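
To make the `math.max(storagePool.memoryFree, storagePool.poolSize - storageRegionSize)` expression above easier to reason about, here is a minimal sketch of the same arithmetic on plain numbers. `PoolState` and `ReclaimSketch` are hypothetical names made up for illustration, not Spark classes. The units are percent of unified memory, and the storage region is assumed to be 50 (the default `spark.memory.storageFraction` of 0.5). The sketch ignores per-task accounting and the fact that `freeSpaceToShrinkPool` may free less than requested.

```scala
// Simplified model of the reclaim arithmetic. PoolState and ReclaimSketch are
// hypothetical names for illustration only; they do not exist in Spark.
final case class PoolState(storagePoolSize: Long, storageUsed: Long, storageRegionSize: Long) {
  // Unused memory currently sitting in the storage pool
  def storageFree: Long = storagePoolSize - storageUsed
}

object ReclaimSketch {
  // Same expression as memoryReclaimableFromStorage, clamped at zero
  // (the original code checks `> 0` instead of clamping).
  def reclaimable(s: PoolState): Long =
    math.max(0L, math.max(s.storageFree, s.storagePoolSize - s.storageRegionSize))

  def main(args: Array[String]): Unit = {
    // Cache uses 20 of its 50: the 30 free units can move to execution.
    val lightlyCached = PoolState(storagePoolSize = 50L, storageUsed = 20L, storageRegionSize = 50L)
    // Cache fills its whole region: nothing is free and nothing lies beyond the region.
    val regionFull = PoolState(storagePoolSize = 50L, storageUsed = 50L, storageRegionSize = 50L)
    // Cache has grown to 70 by borrowing from execution: the 20 beyond the region can be evicted.
    val overRegion = PoolState(storagePoolSize = 70L, storageUsed = 70L, storageRegionSize = 50L)

    println(reclaimable(lightlyCached)) // 30
    println(reclaimable(regionFull))    // 0
    println(reclaimable(overRegion))    // 20
  }
}
```

Under these assumptions, the first case lets execution grow from 50 to 80, which matches the 20%/80% example discussed above. In the second case the cached blocks inside the storage region are not evicted, so execution stays at 50. In the third case only the 20 units that storage holds beyond its region are candidates for eviction.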