Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Why can't the generate function be used twice?

WyGongya opened this issue · comments

When I invoke the generate function twice with a different `idx`, it fails with: "RuntimeError: The expanded size of the tensor (181) must match the existing size (168) at non-singleton dimension 3. Target sizes: [1, 32, 68, 181]. Tensor sizes: [1, 1, 68, 168]"

Should I reset the model's arguments or reload the model?
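The error pattern suggests a stale KV cache: the cached key/value tensors (and the attention mask built over them) are sized for the sequence length of the first call (168 tokens here), so a second, longer prompt (181 tokens) no longer fits. A toy sketch of that failure mode, using a hypothetical `KVCache` class rather than actual lit-llama code (in lit-llama the analogous fix would be clearing the model's cached key/value tensors between calls, e.g. via a reset method if the model exposes one, rather than reloading the model):

```python
# Toy illustration (hypothetical class, not lit-llama's API): a KV cache
# that is lazily sized on the first forward pass and must be cleared
# before a prompt of a different length can be processed.

class KVCache:
    def __init__(self):
        self.max_len = None  # set lazily on the first forward pass

    def forward(self, seq_len):
        if self.max_len is None:
            self.max_len = seq_len          # cache sized for the FIRST call
        if seq_len > self.max_len:
            # mirrors the "expanded size must match existing size" failure
            raise RuntimeError(
                f"The expanded size of the tensor ({seq_len}) must match "
                f"the existing size ({self.max_len})"
            )
        return seq_len

    def reset(self):
        self.max_len = None                 # clear the cache between prompts


cache = KVCache()
cache.forward(168)        # first generate() call: cache sized to 168 tokens
cache.reset()             # without this, forward(181) raises RuntimeError
out = cache.forward(181)  # second, longer prompt now succeeds
```

So reloading the model also works, but it is the heavyweight version of the same fix: what actually needs resetting is the cached attention state, not the weights.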