Troyanovsky / Local-LLM-Comparison-Colab-UI

Compare the performance of different LLMs that can be deployed locally on consumer hardware. Run them yourself with the Colab WebUI.

make coding scores based on unit tests

rmminusrslash opened this issue

Hey,

great initiative to track local LLMs!

Would you be open to talking about how the scores are created?

  • I used GPT-4 to generate scores in a past project and found them not reliable enough: they fluctuated across input sentences with the same meaning, felt somewhat arbitrary, and changed from day to day for the same input. At the very least, you should pin the GPT-4 version so you have better control when updates to GPT-4 are rolled out.

  • For code, one could add unit tests to check the generated functions.
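
To make the unit-test idea concrete, here is a rough sketch of how a coding score could be computed as the fraction of test cases a generated function passes. This is not how the repo currently scores anything; `score_generated_code`, the `add` task, and the test cases are all hypothetical names made up for illustration. It also runs the model output with a plain `exec`, which is only reasonable inside a sandbox such as a throwaway Colab runtime.

```python
from typing import Dict, List, Tuple


def score_generated_code(
    source: str,
    func_name: str,
    cases: List[Tuple[tuple, object]],
) -> float:
    """Return the fraction of (args, expected) cases the generated function passes.

    Hypothetical helper for illustration only, not part of the repo.
    """
    namespace: Dict[str, object] = {}
    try:
        # WARNING: this executes untrusted model output; use a sandbox.
        exec(source, namespace)
    except Exception:
        # Code that does not even load (e.g. a syntax error) scores zero.
        return 0.0

    func = namespace.get(func_name)
    if not callable(func):
        # The model did not define the requested function.
        return 0.0

    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            # A crash on one case counts as a failure for that case only.
            pass
    return passed / len(cases) if cases else 0.0


# Hypothetical task: "write a function add(a, b) that returns the sum".
cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
generated = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"

print(score_generated_code(generated, "add", cases))
print(score_generated_code(buggy, "add", cases))
```

Unlike a GPT-4 grade, a score like this is deterministic for a given output, so the same answer always gets the same number. It only measures functional correctness on the chosen cases, though, not readability or style.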