nnstreamer / nntrainer

NNTrainer is a software framework for training neural network models on devices.

restructure the multi-head attention layer

jijoongmoon opened this issue · comments

We can optimize the memory consumption of the multi-head attention layer by composing it from existing layers. By doing this, we could reduce peak memory usage further.

  1. Compute the attention heads one by one.
  2. Re-implement the multi-head attention layer as a backbone (sub-graph) layer.
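The memory saving from option 1 can be sketched as follows: instead of materializing the attention score tensors for all heads at once, each head's scores and context are computed in turn, so only one head's `(seq, seq)` score matrix needs to be live at a time. This is an illustrative NumPy sketch, not nntrainer code; all names and shapes here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention for a single head
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

seq, d_model, n_heads = 8, 16, 4
d_head = d_model // n_heads
rng = np.random.default_rng(0)
x = rng.standard_normal((seq, d_model))

# per-head projection weights (hypothetical; a real layer would learn these)
wq = rng.standard_normal((n_heads, d_model, d_head))
wk = rng.standard_normal((n_heads, d_model, d_head))
wv = rng.standard_normal((n_heads, d_model, d_head))

# Option 1: compute heads one by one; only one (seq, seq) score
# matrix is alive per iteration instead of n_heads of them.
outs = []
for h in range(n_heads):
    outs.append(attention(x @ wq[h], x @ wk[h], x @ wv[h]))
y_loop = np.concatenate(outs, axis=-1)          # (seq, d_model)

# Fused reference: all heads at once, (n_heads, seq, seq) scores live.
q_all = np.einsum('sd,hde->hse', x, wq)
k_all = np.einsum('sd,hde->hse', x, wk)
v_all = np.einsum('sd,hde->hse', x, wv)
scores_all = q_all @ k_all.transpose(0, 2, 1) / np.sqrt(d_head)
y_fused = (softmax(scores_all) @ v_all).transpose(1, 0, 2).reshape(seq, d_model)
```

Both paths produce the same output; the loop variant trades a little latency for a smaller peak activation footprint, which is the comparison the task list below asks us to measure.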

:octocat: cibot: Thank you for posting issue #1998. The person in charge will reply soon.

To-do list for option 1:

  • Enhance the split layer to split its input by a given number (the number of heads). #2025
  • Replace the multi-head attention layer with a sub-graph of existing layers.
  • Compare peak memory consumption and latency before and after the change.
  • Compare peak memory consumption and latency before and after enabling the swap feature.
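The split-layer enhancement in the first item amounts to slicing the projected tensor into equal per-head chunks along the feature axis. A minimal sketch of that behavior, assuming a hypothetical `split_by_heads` helper rather than the actual nntrainer layer API:

```python
import numpy as np

def split_by_heads(x, n_heads, axis=-1):
    # Split the projected tensor into n_heads equal chunks along `axis`,
    # mirroring "split input by given number (number of heads)".
    if x.shape[axis] % n_heads != 0:
        raise ValueError("feature dimension must be divisible by number of heads")
    return np.split(x, n_heads, axis=axis)

x = np.arange(24, dtype=float).reshape(2, 12)   # (seq, d_model)
heads = split_by_heads(x, n_heads=4)            # four (2, 3) chunks
```

Concatenating the chunks back along the same axis restores the input, which makes the split layer a natural entry point for the head-by-head sub-graph.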