huggingface / swift-transformers

Swift Package to implement a transformers-like API in Swift

Repository from GitHub: https://github.com/huggingface/swift-transformers

Is GenerationConfig.repetitionPenalty used during generation?

joneavila opened this issue

I am testing the code using the Core ML version of Llama 2.

Setting GenerationConfig.maxLength to something larger than the default (e.g., 64) produces the correct number of output tokens, but the output tends to repeat tokens toward the end of generation. Adjusting repetitionPenalty doesn't seem to have any effect.
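For reference, this is roughly how I am setting up generation. The property names come from GenerationConfig, but the initializer shape and module name here are my assumptions and may not match the package exactly:

```swift
import Generation  // assumed: the swift-transformers target that defines GenerationConfig

// Sketch of the configuration I am testing with; only maxLength, temperature,
// and repetitionPenalty are relevant to this issue.
var config = GenerationConfig(maxLength: 64)
config.temperature = 0.7
config.repetitionPenalty = 1.3  // changing this value has no observable effect on the output
```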

Looking into Generation.swift, I see the code references maxLength, eosTokenId, temperature and others, but not repetitionPenalty. Does this explain the repetitive output?
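For context, below is the standard transformers-style repetition penalty I would expect to be applied to the logits at each step. This is not code from Generation.swift, just a minimal sketch with assumed names and types to show what using the parameter would look like:

```swift
// Standard repetition penalty (as in the Python transformers library):
// tokens that have already been generated are made less likely by dividing
// their positive logits by the penalty and multiplying negative logits by it.
// All names and types here are assumptions for illustration only.
func applyRepetitionPenalty(
    logits: inout [Float],
    previousTokens: [Int],
    penalty: Float
) {
    guard penalty != 1.0 else { return }  // 1.0 means "no penalty"
    for token in Set(previousTokens) {
        if logits[token] > 0 {
            logits[token] /= penalty
        } else {
            logits[token] *= penalty
        }
    }
}
```

If nothing like this runs over the logits before sampling, that would be consistent with the repetitive output I am seeing regardless of the repetitionPenalty value.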