Use a second stream for de-allocating memory

Question

coreylowman opened this issue 2 years ago

Currently all operations, including de-allocations, happen on the default stream. In dfdx, after a long forward pass with many operations (e.g. 100 operations, each producing 1+ gradients), all gradients are captured in a Gradients object. After the forward pass is done, the Gradients object is dropped, which means ALL temporary gradients are de-allocated at once.

At the moment this blocks the default stream: because the de-allocations are queued on it, they all have to finish before any later work on that stream can run.

Instead, we should put de-allocations on a second stream that is synchronized with the default stream with events:

1. Call cuEventCreate
2. Call cuEventRecord with the default stream
3. Call cuStreamWaitEvent with the event and the de-allocation stream
4. Call free_async with the de-allocation stream

This should free up the default stream to continue working.
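
To make the ordering of those four calls concrete, here is a minimal sketch in Rust. It does not talk to a GPU: the `Driver` trait, the `RecordingDriver` mock, and `free_on_dealloc_stream` are all hypothetical names made up for illustration (none of them come from dfdx or a CUDA binding), and each trait method only stands in for the driver call named in its comment. It also assumes one event per batch of frees, which is just one possible answer to how many events are needed.

```rust
// Hypothetical stand-ins for the driver calls listed above. In real code these
// would go through the CUDA driver; here they only log what the host queued.

/// The two streams involved.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stream {
    Default,
    Dealloc,
}

/// One entry per call, in the order the host issued it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Call {
    EventCreate { event: u32 },
    EventRecord { event: u32, stream: Stream },
    StreamWaitEvent { stream: Stream, event: u32 },
    FreeAsync { ptr: u64, stream: Stream },
}

/// Hypothetical trait; not part of dfdx or any CUDA binding.
pub trait Driver {
    fn event_create(&mut self) -> u32; // stands in for cuEventCreate
    fn event_record(&mut self, event: u32, stream: Stream); // cuEventRecord
    fn stream_wait_event(&mut self, stream: Stream, event: u32); // cuStreamWaitEvent
    fn free_async(&mut self, ptr: u64, stream: Stream); // free_async
}

/// Mock driver that records every call instead of touching a device.
#[derive(Debug, Default)]
pub struct RecordingDriver {
    pub calls: Vec<Call>,
    next_event: u32,
}

impl Driver for RecordingDriver {
    fn event_create(&mut self) -> u32 {
        let event = self.next_event;
        self.next_event += 1;
        self.calls.push(Call::EventCreate { event });
        event
    }

    fn event_record(&mut self, event: u32, stream: Stream) {
        self.calls.push(Call::EventRecord { event, stream });
    }

    fn stream_wait_event(&mut self, stream: Stream, event: u32) {
        self.calls.push(Call::StreamWaitEvent { stream, event });
    }

    fn free_async(&mut self, ptr: u64, stream: Stream) {
        self.calls.push(Call::FreeAsync { ptr, stream });
    }
}

/// Queues the frees on the de-allocation stream, ordered after everything
/// already queued on the default stream at the time of the call.
pub fn free_on_dealloc_stream<D: Driver>(driver: &mut D, ptrs: &[u64]) {
    // Steps 1 and 2: create an event and record it on the default stream.
    let event = driver.event_create();
    driver.event_record(event, Stream::Default);
    // Step 3: the de-allocation stream waits for that point in the default stream.
    driver.stream_wait_event(Stream::Dealloc, event);
    // Step 4: the frees go on the de-allocation stream, not the default one.
    for &ptr in ptrs {
        driver.free_async(ptr, Stream::Dealloc);
    }
}
```

Calling `free_on_dealloc_stream(&mut driver, &[0x1000, 0x2000])` on a fresh `RecordingDriver` logs one create, one record on the default stream, one wait on the de-allocation stream, and then the two frees. The only thing ever queued on the default stream is the event record.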

Corey Lowman · Answer 1 · Fri Feb 24 2023 08:06:47 GMT+0800 (China Standard Time)

Questions:

Do we create a new event for each new de-allocation that happens? Or is it possible to have 1 event that the device holds that we use to synchronize?
How do we free events with cuEventDestroy? And how does cuEventDestroy interact with cuStreamWaitEvent?

The CUDA docs say this about cuEventDestroy:

An event may be destroyed before it is complete (i.e., while cuEventQuery() would return CUDA_ERROR_NOT_READY). In this case, the call does not block on completion of the event, and any associated resources will automatically be released asynchronously at completion.

Does this mean we can just call cuEventDestroy right after we create the event, and the stream will still synchronize?

Corey Lowman · Answer 2 · Sat Feb 25 2023 04:52:09 GMT+0800 (China Standard Time)

Okay, from the CUDA docs:

Other APIs such as cuStreamWaitEvent() use the most recently captured state at the time of the API call, and are not affected by later calls to cuEventRecord().

This implies to me that we can allocate a single event and reuse it!
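
A toy model of that reading, as a sketch only. `Device`, `DeallocOp`, `launch`, and this `free_async` method are hypothetical names invented here, and the counters merely imitate the quoted semantics (a record overwrites the event's captured state, a wait copies the state as of the call). Nothing below calls the CUDA driver.

```rust
/// What the de-allocation stream has queued, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeallocOp {
    /// Wait until the default stream has finished its first `n` operations.
    WaitForDefault(u64),
    /// Free the buffer at this (made-up) address.
    Free(u64),
}

/// Toy device that owns ONE long-lived event. Hypothetical; not dfdx's type.
#[derive(Debug, Default)]
pub struct Device {
    /// Number of operations queued on the default stream so far.
    default_queued: u64,
    /// State captured by the most recent record of the single event.
    event_captured: u64,
    /// Everything queued on the de-allocation stream.
    pub dealloc_queue: Vec<DeallocOp>,
}

impl Device {
    /// Queue one unit of work on the default stream.
    pub fn launch(&mut self) {
        self.default_queued += 1;
    }

    /// Queue a free on the de-allocation stream, reusing the single event.
    pub fn free_async(&mut self, ptr: u64) {
        // Models cuEventRecord on the default stream: overwrites the
        // event's captured state with the current end of that stream.
        self.event_captured = self.default_queued;
        // Models cuStreamWaitEvent on the de-allocation stream: copies the
        // captured state as of this call, so a later record cannot change
        // what this particular wait is waiting for.
        self.dealloc_queue
            .push(DeallocOp::WaitForDefault(self.event_captured));
        self.dealloc_queue.push(DeallocOp::Free(ptr));
    }
}
```

In this model, launching three operations, freeing one buffer, launching two more, and freeing a second buffer leaves the first wait pinned at 3 and the second at 5, even though both went through the same event. Whether the real driver behaves this way in every case is exactly what the quoted sentence suggests, not something this sketch proves.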