DrishtiShrrrma / llama2-7b-rlhf

The Problem with Padding

When training large language models, it's common to batch multiple sequences together for efficiency. However, sequences in a batch often have varying lengths. To give every sequence in the batch the same length, shorter sequences are "padded" with filler tokens (typically a dedicated padding token, often with ID 0) to match the length of the longest sequence.

While this method works, it's not optimal. Here's why:

  1. Computational Overhead: The model has to process these padding tokens, which don't contribute any meaningful information. This increases computational costs without direct benefit.

  2. Wasted Token Capacity: Especially when many sequences are much shorter than the model's maximum context size, a large share of the model's processing capacity is spent on padding tokens, as the sketch below illustrates.
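
To make the waste concrete, here is a minimal, self-contained sketch of right-padding a batch of token-ID lists. The token values and the pad ID of 0 are illustrative assumptions, not taken from this repository's code.

```python
PAD_ID = 0  # assumed padding token ID for illustration

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

batch = [
    [12, 48, 7, 99, 3, 15, 8, 2],  # 8 tokens
    [5, 31, 2],                    # 3 tokens
    [64, 7, 21, 9, 2],             # 5 tokens
]

padded = pad_batch(batch)
total = sum(len(seq) for seq in padded)   # 24 positions the model must process
useful = sum(len(seq) for seq in batch)   # only 16 of them carry real content
print(f"{total - useful} of {total} positions are padding "
      f"({(total - useful) / total:.0%} wasted)")
# -> 8 of 24 positions are padding (33% wasted)
```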

Packing to the Rescue

The "packing" technique aims to address these inefficiencies. Here's how it works:

  1. Concatenation: Multiple text sequences are concatenated end-to-end, with each sequence separated by a special token, typically the "End Of Sequence" (EOS) token. This creates one long, continuous stream of meaningful tokens with no gaps or padding.

  2. Dividing into Chunks: This long sequence is then divided into fixed-length chunks that match the model's context size. For instance, if the model's context size is 2048 tokens, then the concatenated sequence is split every 2048 tokens.

  3. No Padding Needed: Since each chunk is exactly the size of the model's context, there's no need for padding. Every token is meaningful and contributes to training.
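
The sketch below shows the idea with plain Python lists of token IDs. The EOS ID of 2 (the value used by the Llama tokenizer) and the toy context size are illustrative assumptions; a real pipeline would work with tokenizer output and the model's actual context length.

```python
EOS_ID = 2        # end-of-sequence token ID (2 for the Llama tokenizer)
CONTEXT_SIZE = 8  # toy value; a real model would use e.g. 2048 or 4096

def pack(sequences, context_size=CONTEXT_SIZE, eos_id=EOS_ID):
    """Concatenate sequences separated by EOS, then cut into fixed-size chunks."""
    stream = []
    for seq in sequences:
        stream.extend(seq)
        stream.append(eos_id)  # EOS marks the boundary between examples
    # Keep only full chunks; any trailing remainder is dropped here
    n_chunks = len(stream) // context_size
    return [stream[i * context_size:(i + 1) * context_size] for i in range(n_chunks)]

batch = [
    [12, 48, 7, 99, 3, 15, 8],
    [5, 31],
    [64, 7, 21, 9],
]

for chunk in pack(batch):
    print(chunk)  # every chunk is exactly CONTEXT_SIZE tokens long, no padding
# -> [12, 48, 7, 99, 3, 15, 8, 2]
# -> [5, 31, 2, 64, 7, 21, 9, 2]
```

In practice this step is usually handled by the training library rather than written by hand; Hugging Face TRL, for instance, exposes a packing option in its SFT trainer that builds such constant-length chunks on the fly.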

Benefits

  1. Efficiency: The model spends no compute on padding tokens, making training faster and more computationally efficient.

  2. Better Utilization: The model's token processing capacity is better utilized since it's focusing only on meaningful tokens and not wasting computations on padding.

  3. Consistent Sequence Length: Every chunk fed to the model during training has the same fixed length, which simplifies batching and memory allocation in the training pipeline.
