ohmydroid / ShareNet

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ShareNet

Two convolutional layer in the bottleneck block share parameters, which can be formulated as:
$$Y=W^{T}A(WX)$$

Such pattern can be extended into FFN or MOE FFN of large language models to reduce parameters and accelerate model traning and inference.

About

License:MIT License


Languages

Language:Python 100.0%