ml-explore / mlx-swift-examples

Examples using MLX Swift


Repeated results (in a loop)

krzyzanowskim opened this issue · comments

I noticed that for the same model and prompt, I get different output.

The issues I noticed:

  1. The Swift version starts to repeat itself until it fills the maximum number of tokens (it's hard to guess the minimal number of tokens needed to get the full reply).
  2. The Python output skips the first line, while the Swift output prints the full completion (which seems more appropriate for the Instruct model and FILL_ME token use).

Model: mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX
Prompt: func sortArray(_ array: [Int]) -> String { <FILL_ME> }

python -m mlx_lm.generate --model mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX --prompt "func sortArray(_ array: [Int]) -> String { <FILL_ME> }"
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 56552.41it/s]
==========
Prompt: func sortArray(_ array: [Int]) -> String { <FILL_ME> }

    let sorted = array.sorted()
    return sorted.map { String($0) }.joined(separator: ",")
}

sortArray([3, 2, 1]) // "1,2,3"
sortArray([1, 1, 1]) // "1,1,1"
sortArray([]) // ""
sortArray([-1, 1]) // "-1,1"
sortArray
==========
Prompt: 16.219 tokens-per-sec
Generation: 4.817 tokens-per-sec
mll-tool --prompt "func sortArray(_ array: [Int]) -> String { <FILL_ME> }" --model mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX
Starting generation ...
func sortArray(_ array: [Int]) -> String { <FILL_ME> }

func sortArray(_ array: [Int]) -> String {
    return array.sorted().map { String($0) }.joined(separator: ",")
}

func sortArray(_ array: [Int]) -> String {
    return array.sorted().map { String($0) }.joined(separator: ",")
}

func sortArray(_ array: [Int]) -> String {
    return array.sorted().map
------
Prompt Tokens per second:     17,566774
Generation tokens per second: 4,815439
Program ended with exit code: 0

It looks like most of this is down to three factors:

  • different seeds
  • the tokenizer is completely different
  • the eos_token is not exposed

I noticed that the python code can get into a repetitive loop as well, but sometimes it breaks out when it sees the eos_token. Often, though, the eos_token never showed up and it would keep going and going!
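The fix for the "fills the max number of tokens" symptom is to check each sampled token against the EOS id and stop early. A minimal sketch of that loop (the sampler callback and token ids here are hypothetical stand-ins, not the actual mlx_lm API):

```python
EOS_TOKEN_ID = 2  # assumption: llama-family tokenizers typically use 2 for </s>

def generate(sample_next_token, prompt_ids, max_tokens):
    """Append sampled tokens until EOS or the token budget runs out.

    sample_next_token: callable taking the token list so far and
    returning the next token id (stands in for the real sampler).
    Returns only the newly generated tokens.
    """
    out = list(prompt_ids)
    for _ in range(max_tokens):
        tok = sample_next_token(out)
        if tok == EOS_TOKEN_ID:
            break  # stop early instead of padding with repetition
        out.append(tok)
    return out[len(prompt_ids):]
```

Without the EOS check, the same loop would run all `max_tokens` iterations, which is exactly the repeated-output behavior above.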

The tokenizer is probably the biggest problem. On the python side it is CodeLlamaTokenizerFast and returns:

[1, 32007, 3653, 2656, 2588, 7373, 1409, 29901, 518, 2928, 2314, 1599, 1714, 426, 29871, 32008, 500, 32009]

Which is this (run back through the tokenizer):

<s> <PRE> func sortArray(_ array: [Int]) -> String {  <SUF> } <MID>

for the prompt, while the generic one on the swift side returns:

[1, 3653, 2656, 2588, 7373, 1409, 29901, 518, 2928, 2314, 1599, 1714, 426, 529, 3738, 2208, 29918, 2303, 29958, 500]

It looks like the BPETokenizer (from swift-tokenizers) doesn't have any of the special handling.
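For reference, the special handling amounts to rewriting a <FILL_ME> prompt into CodeLlama's infill layout before tokenizing. A sketch of that preprocessing (the special tokens are shown as literal strings for illustration; in the real tokenizer they are single token ids, 32007/32008/32009 above, and the leading <s> is added as BOS):

```python
def build_infill_prompt(prompt, fill_token="<FILL_ME>"):
    """Rewrite a <FILL_ME> prompt into CodeLlama's infill layout:
    <PRE> {prefix} <SUF>{suffix} <MID>
    The model then generates the middle span and, ideally, an EOS/EOT."""
    if fill_token not in prompt:
        return prompt  # plain prompt, no infill rewriting needed
    prefix, suffix = prompt.split(fill_token, maxsplit=1)
    return "<PRE> " + prefix + " <SUF>" + suffix + " <MID>"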

For reference: https://github.com/huggingface/transformers/blob/main/src/transformers/models/codegen/tokenization_codegen_fast.py

thank you for checking that for me!

the eos_token is not exposed

that missing eos_token seems to be the culprit then. <s> <PRE> func sortArray(_ array: [Int]) -> String { <SUF> } <MID> gives the same result (for that simple prompt) so that's less of an issue I guess.

I had to wrap the tokenizer but it now has an eosTokenId -- see if that works better for you. 3f02fcc

OK to close?

thank you. For that very prompt and model I don't see a difference; I can't judge if it's the model's fault or something is still missing:

mll-tool --prompt "func sortArray(_ array: [Int]) -> String { <FILL_ME> }" --model mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX --max-tokens 500
Starting generation ...
<s> <PRE> func sortArray(_ array: [Int]) -> String {  <SUF> } <MID>let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0) }.joined(separator: " ")
}

func sortArray(_ array: [Int]) -> String {
    let sortedArray = array.sorted()
    return sortedArray.map { String($0)
------
Prompt Tokens per second:     18,183551
Generation tokens per second: 4,402603
Program ended with exit code: 0

It seems hit or miss whether it actually generates the <eos>. On the python side I found that varying the seed or the prompt was enough to change the behavior wildly (e.g. generating repeating output over and over). It may be the same thing we are seeing here.

For example this one on the python side repeats over and over:

python -m mlx_lm.generate --model ~/Documents/huggingface/models/mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX --prompt 'func sortArray(_ array: [Int]) -> String { <FILL_ME> }' --seed 7 -m 1000

...
// MARK: -

// MARK: -

// MARK: -
...

Seed 2 generates an EOS on the python side. It generates similar text on the swift side and hits the EOS if we use the same token array as python (from above).
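That seed sensitivity is what you'd expect from temperature sampling: the next token is drawn from a distribution, so the seed fully determines the trajectory, and different seeds can land in (or avoid) a repetition loop. A toy sketch of that property using plain `random` rather than the mlx sampler:

```python
import random

def sample_sequence(seed, steps=10, vocab_size=32000):
    """Draw a toy token sequence; the trajectory is fully determined
    by the seed, analogous to seeding the sampler on the mlx side."""
    rng = random.Random(seed)
    return [rng.randrange(vocab_size) for _ in range(steps)]
```

Re-running with the same seed reproduces the sequence exactly, which is why `--seed 7` repeats and `--seed 2` hits EOS every time.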

OK, so it seems you already did what's possible. thank you