ml-explore / mlx-swift-examples

Examples using MLX Swift

llm-eval: not responding to 'what is your name?' or 'what is the difference between star wars and star trek?'

CharlieTLe opened this issue

On my Mac, I see this error:

CLIENT ERROR: TUINSRemoteViewController does not override -viewServiceDidTerminateWithError: and thus cannot react to catastrophic errors beyond logging them

It does respond fine to 'compare python and swift', though.

That actually looks "right":

python -m mlx_lm.generate --model ~/Documents/huggingface/models/mlx-community/phi-2-hf-4bit-mlx --prompt 'Instruct: what is your name?. Output: '
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
==========
Prompt: Instruct: what is your name?. Output: 


==========
Prompt: 32.359 tokens-per-sec
Generation: 0.000 tokens-per-sec

The problem seems to be in the prompt template:

        "Instruct: \(prompt). Output: "

it should be:

        "Instruct: \(prompt)\nOutput: "

That fix gives a much better response, though (perhaps) in Chinese?

Still nothing from 'what is the difference between star wars and star trek?', but the python version doesn't answer it either.

It looks like phi2 can't answer that prompt -- maybe its training data doesn't cover that info, or maybe it is too small? mistral7B4bit aka mlx-community/Mistral-7B-v0.1-hf-4bit-mlx seems to do an OK job, though sometimes a bit silly.
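
For comparison, the same check can be run against Mistral with the python CLI, along the lines of the command above (use whatever local path or hub id you have):

python -m mlx_lm.generate --model mlx-community/Mistral-7B-v0.1-hf-4bit-mlx --prompt 'what is the difference between star wars and star trek?'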

Three changes were made, and I think they fix or greatly improve the responses here:

  • the prompt for Phi was adjusted to fit the format better -- it is sensitive to the exact wording
  • the temperature was set to 0.6 to match the python code
  • a new random seed is generated each time you generate -- so you can explore a little (see the sketch after this list)
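
A minimal sketch of the last two changes; the GenerateParameters shape here is an assumption (the real type in the LLM library may differ), and it uses MLXRandom.seed from mlx-swift:

    import Foundation
    import MLXRandom

    // Assumed parameter type -- the real one in the LLM library
    // may have a different shape.
    struct GenerateParameters {
        var temperature: Float = 0.6  // matches the python code
    }

    // Reseed before each generation so repeated runs of the same
    // prompt can produce different completions.
    func reseedForGeneration() {
        MLXRandom.seed(UInt64(Date.timeIntervalSinceReferenceDate))
    }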

You may need to switch to a larger model like Mistral 7B to see more interesting responses for a wider range of inputs.