Is `GenerationConfig.repetitionPenalty` used during generation?
joneavila opened this issue
I am testing the code using the Core ML version of Llama 2. Setting `GenerationConfig.maxLength` to something larger than the default, e.g., 64, produces the correct number of output tokens, but the model tends to repeat tokens towards the end of generation. Adjusting `repetitionPenalty` doesn't seem to have any effect.
Looking into `Generation.swift`, I see the code references `maxLength`, `eosTokenId`, `temperature`, and others, but not `repetitionPenalty`. Does this explain the repetitive output?