ml-explore / mlx-swift-examples

Examples using MLX Swift


Repeated results (in a loop)

krzyzanowskim opened this issue · comments

I noticed that for the same model and prompt, I get different output.

The issues I noticed:

  1. The Swift version starts to repeat itself until it fills the maximum number of tokens (it's hard to guess the minimal number of tokens needed to get the full reply).
  2. The Python output skips the first line, while the Swift output prints the full completion (which seems more appropriate for the Instruct model and FILL_ME token use).

Model: mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX
Prompt: func sortArray(_ array: [Int]) -> String { <FILL_ME> }

python -m mlx_lm.generate --model mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX --prompt "func sortArray(_ array: [Int]) -> String { <FILL_ME> }"
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 56552.41it/s]
==========
Prompt: func sortArray(_ array: [Int]) -> String { <FILL_ME> }

    let sorted = array.sorted()
    return sorted.map { String($0) }.joined(separator: ",")
}

sortArray([3, 2, 1]) // "1,2,3"
sortArray([1, 1, 1]) // "1,1,1"
sortArray([]) // ""
sortArray([-1, 1]) // "-1,1"
sortArray
==========
Prompt: 16.219 tokens-per-sec
Generation: 4.817 tokens-per-sec
mll-tool --prompt "func sortArray(_ array: [Int]) -> String { <FILL_ME> }" --model mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX
Starting generation ...
func sortArray(_ array: [Int]) -> String { <FILL_ME> }

func sortArray(_ array: [Int]) -> String {
    return array.sorted().map { String($0) }.joined(separator: ",")
}

func sortArray(_ array: [Int]) -> String {
    return array.sorted().map { String($0) }.joined(separator: ",")
}

func sortArray(_ array: [Int]) -> String {
    return array.sorted().map
------
Prompt Tokens per second:     17,566774
Generation tokens per second: 4,815439
Program ended with exit code: 0

It looks like most of this is down to three factors:

  • different seeds
  • the tokenizer is completely different
  • the eos_token is not exposed

I noticed that the python code can get into a repetitive loop as well, but sometimes it breaks out when it sees the eos_token. Often, though, the eos_token never showed up and it would keep going and going!
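The fix for the "fills the max number of tokens" symptom is to check each sampled token against the EOS id and stop early. A minimal sketch of that loop (the sampler callback and token ids here are hypothetical stand-ins, not the actual mlx_lm API):

```python
EOS_TOKEN_ID = 2  # assumption: llama-family tokenizers typically use 2 for </s>

def generate(sample_next_token, prompt_ids, max_tokens):
    """Append sampled tokens until EOS or the token budget runs out.

    sample_next_token: callable taking the token list so far and
    returning the next token id (stands in for the real sampler).
    Returns only the newly generated tokens.
    """
    out = list(prompt_ids)
    for _ in range(max_tokens):
        tok = sample_next_token(out)
        if tok == EOS_TOKEN_ID:
            break  # stop early instead of padding with repetition
        out.append(tok)
    return out[len(prompt_ids):]
```

Without the EOS check, the same loop would run all `max_tokens` iterations, which is exactly the repeated-output behavior above.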

The tokenizer is probably the biggest problem. On the python side it is CodeLlamaTokenizerFast and returns:

[1, 32007, 3653, 2656, 2588, 7373, 1409, 29901, 518, 2928, 2314, 1599, 1714, 426, 29871, 32008, 500, 32009]

Which is this (run back through the tokenizer):

<s> <PRE> func sortArray(_ array: [Int]) -> String {  <SUF> } <MID>

for the prompt, while the generic one on the swift side returns:

[1, 3653, 2656, 2588, 7373, 1409, 29901, 518, 2928, 2314, 1599, 1714, 426, 529, 3738, 2208, 29918, 2303, 29958, 500]

It looks like the BPETokenizer (from swift-tokenizers) doesn't have any of the special handling.
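For reference, the special handling amounts to rewriting a <FILL_ME> prompt into CodeLlama's infill layout before tokenizing. A sketch of that preprocessing (the special tokens are shown as literal strings for illustration; in the real tokenizer they are single token ids, 32007/32008/32009 above, and the leading <s> is added as BOS):

```python
def build_infill_prompt(prompt, fill_token="<FILL_ME>"):
    """Rewrite a <FILL_ME> prompt into CodeLlama's infill layout:
    <PRE> {prefix} <SUF>{suffix} <MID>
    The model then generates the middle span and, ideally, an EOS/EOT."""
    if fill_token not in prompt:
        return prompt  # plain prompt, no infill rewriting needed
    prefix, suffix = prompt.split(fill_token, maxsplit=1)
    return "<PRE> " + prefix + " <SUF>" + suffix + " <MID>"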

For reference: https://github.com/huggingface/transformers/blob/main/src/transformers/models/codegen/tokenization_codegen_fast.py

thank you for checking that for me!

the eos_token is not exposed

that missing eos_token seems to be the culprit then. <s> <PRE> func sortArray(_ array: [Int]) -> String { <SUF> } <MID> gives the same result (for that simple prompt) so that's less of an issue I guess.

I had to wrap the tokenizer but it now has an eosTokenId -- see if that works better for you. 3f02fcc

OK to close?

thank you. For that very prompt and model I don't see a difference; I can't judge if it's the model's fault or something is still missing:

mll-tool --prompt "func sortArray(_ array: [Int]) -> String { <FILL_ME> }" --model mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX --max-tokens 500
Starting generation ...
<s> <PRE> func sortArray(_ array: [Int]) -> String {  <SUF> } <MID>let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0)
------
Prompt Tokens per second:     18,183551
Generation tokens per second: 4,402603
Program ended with exit code: 0

It seems hit or miss whether it actually generates the <eos>. On the python side I found that varying the seed or the prompt was enough to change the behavior wildly (e.g. generating repeating output over and over). It may be the same thing we are seeing here.

For example this one on the python side repeats over and over:

python -m mlx_lm.generate --model ~/Documents/huggingface/models/mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX --prompt 'func sortArray(_ array: [Int]) -> String { <FILL_ME> }' --seed 7 -m 1000

...
// MARK: -

// MARK: -

// MARK: -
...

Seed 2 generates an EOS on the python side. It generates similar text on the swift side and hits the EOS if we use the same token array as python (from above).
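That seed sensitivity is what you'd expect from temperature sampling: the next token is drawn from a distribution, so the seed fully determines the trajectory, and different seeds can land in (or avoid) a repetition loop. A toy sketch of that property using plain `random` rather than the mlx sampler:

```python
import random

def sample_sequence(seed, steps=10, vocab_size=32000):
    """Draw a toy token sequence; the trajectory is fully determined
    by the seed, analogous to seeding the sampler on the mlx side."""
    rng = random.Random(seed)
    return [rng.randrange(vocab_size) for _ in range(steps)]
```

Re-running with the same seed reproduces the sequence exactly, which is why `--seed 7` repeats and `--seed 2` hits EOS every time.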

OK, so it seems you already did what's possible. thank you