How does Salus use an existing lane?
jasperzhong opened this issue · comments
Given that a lane has already been assigned to a DL job, it seems that other DL jobs cannot share this lane according to the following code:
std::unique_ptr<LaneHolder> GpuLane::tryFit(size_t persistent, size_t peak)
{
    auto g = sstl::with_guard(m_mu);
    auto maxPeak = peak;
    if (!m_maxPeak.empty()) {
        maxPeak = std::max(maxPeak, *m_maxPeak.cbegin());
    }
    if ((persistent + maxPeak) <= m_availableMemory) { // 1
        addHoldUnsafe(persistent, peak);
        return std::make_unique<LaneHolder>(sstl::add_ref(this), persistent, peak);
    }
    return {};
}
Actually, maxPeak will equal m_availableMemory at the line marked // 1 when a second DL job that wants to share this lane invokes this function. Since persistent should be greater than 0, the condition fails and the if-block is skipped.
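The arithmetic in this comment can be reproduced with a small stand-alone model. The sketch below is not Salus code: ToyLane, addHold, and secondJobFits are hypothetical names, and it assumes that addHoldUnsafe (not shown in the thread) subtracts the persistent part from the available memory and records the peak in a container sorted largest-first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>

// Toy stand-in for GpuLane (hypothetical, NOT the Salus class).
// It only mirrors the admission check quoted from tryFit.
struct ToyLane {
    std::size_t available;                               // plays the role of m_availableMemory
    std::multiset<std::size_t, std::greater<>> maxPeak;  // plays the role of m_maxPeak, largest first

    explicit ToyLane(std::size_t capacity) : available(capacity) {}

    // ASSUMPTION: the real addHoldUnsafe reserves the persistent part
    // and remembers the peak; its body is not shown in the issue.
    void addHold(std::size_t persistent, std::size_t peak) {
        assert(persistent <= available);
        available -= persistent;
        maxPeak.insert(peak);
    }

    // Same condition as the quoted tryFit, returning a bool instead of a holder.
    bool tryFit(std::size_t persistent, std::size_t peak) {
        auto mp = peak;
        if (!maxPeak.empty()) {
            mp = std::max(mp, *maxPeak.cbegin());
        }
        if (persistent + mp <= available) {
            addHold(persistent, peak);
            return true;
        }
        return false;
    }
};

// The first job sizes the lane exactly (persistent 4 + peak 6 = 10);
// a much smaller second job then asks to share it.
inline bool secondJobFits() {
    ToyLane lane(4 + 6);
    bool first = lane.tryFit(4, 6);  // 4 + 6 <= 10, accepted; available drops to 6
    assert(first);
    return lane.tryFit(1, 1);        // 1 + max(1, 6) = 7 > 6, rejected
}
```

Under these assumptions, after the first job is admitted the largest recorded peak (6) equals the remaining available memory (6), so any second job with a non-zero persistent size fails the check.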
Hmm, you are right. The logic seems to be incorrect there. The idea is that a lane keeps the size it was given when first created, which equals persistent + peak of the first workload in the lane. A new workload will fit if its persistent and peak are smaller than what is available.
I see. So just remove the following code?
if (!m_maxPeak.empty()) {
    maxPeak = std::max(maxPeak, *m_maxPeak.cbegin());
}
I'm away from my PC, so I can't judge at the moment. You should be careful removing it, though. I probably added it for a reason. The lane should always maintain its largest peak, given that workloads can come and go. I think m_maxPeak is related to this logic, so check that before you proceed.
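One way to check whether the m_maxPeak clause matters is to compare the check without it against the lane's total demand. The sketch below is only an illustration under stated assumptions: Job, laneDemand, and fitsWithoutMaxPeak are hypothetical names, and the sharing model (persistent memory of all workloads is summed, while peak memory is shared so only the largest peak counts) is an assumption, not something confirmed in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one workload's memory needs.
struct Job {
    std::size_t persistent;
    std::size_t peak;
};

// ASSUMED sharing model: persistent parts add up, peaks are shared,
// so the lane needs sum(persistent) + max(peak).
inline std::size_t laneDemand(const std::vector<Job>& jobs) {
    std::size_t persistentSum = 0;
    std::size_t largestPeak = 0;
    for (const auto& j : jobs) {
        persistentSum += j.persistent;
        largestPeak = std::max(largestPeak, j.peak);
    }
    return persistentSum + largestPeak;
}

// Admission check with the m_maxPeak clause removed: the newcomer's own
// persistent + peak is compared against what is left after the earlier
// persistent reservations (ASSUMED meaning of m_availableMemory).
inline bool fitsWithoutMaxPeak(const std::vector<Job>& resident, Job incoming,
                               std::size_t capacity) {
    std::size_t reserved = 0;
    for (const auto& j : resident) {
        reserved += j.persistent;
    }
    if (reserved > capacity) {
        return false;  // guard against unsigned underflow below
    }
    return incoming.persistent + incoming.peak <= capacity - reserved;
}
```

With a lane of capacity 10 holding a job with persistent 4 and peak 6, a newcomer with persistent 1 and peak 1 passes fitsWithoutMaxPeak (2 <= 6), yet laneDemand for both jobs is 5 + 6 = 11, which exceeds the lane. If the assumed model holds, simply deleting the clause would trade the rejection problem for over-commitment.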
OK. Another question: can a lane change its size on the fly?
No, it can't.