kaiokendev / cutoff-len-is-context-len

Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit