Mascerade / scale-transformer-encoder

A Transformer Encoder where the embedding size can be down-sized.

Scaling Transformer Encoder

An implementation (kind of) of a Transformer Encoder that can down-scale d_model, so the embedding dimension is smaller for later stages of a model. Pretty simple.

Example

import torch
from scale_transformer_encoder import ScalingLayer

# Random input of shape (batch, seq_len, d_model)
x = torch.randn(16, 40, 256)

scale = ScalingLayer(in_features=256,           # incoming d_model
                     out_features=512,          # d_model of the output
                     pwff_inner_features=1028,  # inner size of the position-wise feed-forward
                     heads=8,                   # number of attention heads
                     multihead_scale=False,
                     head_scale=False,
                     return_attn=True)          # also return the attention weights

out, attn = scale(x)
print("Input size: {}".format(x.size()))
print("Output size: {}".format(out.size()))
print("Attention size: {}".format(attn.size()))

Output

Input size: torch.Size([16, 40, 256])
Output size: torch.Size([16, 40, 512])
Attention size: torch.Size([16, 8, 40, 40])
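
Note that the example above actually scales d_model up (256 to 512). Since the headline use case is down-scaling, here is a minimal sketch of that direction. It assumes the same ScalingLayer constructor and return signature shown above; out_features=128 and pwff_inner_features=512 are illustrative choices, not values taken from the repo.

import torch
from scale_transformer_encoder import ScalingLayer

# (batch, seq_len, d_model)
x = torch.randn(16, 40, 256)

# Down-scale d_model from 256 to 128 before a narrower later stage of a model.
down = ScalingLayer(in_features=256,
                    out_features=128,          # smaller d_model (illustrative)
                    pwff_inner_features=512,   # illustrative inner size
                    heads=8,
                    multihead_scale=False,
                    head_scale=False,
                    return_attn=True)

out, attn = down(x)
print("Output size: {}".format(out.size()))  # assumed: torch.Size([16, 40, 128])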


Languages

Python: 51.9%
Jupyter Notebook: 48.1%