Coding / HumanEval

excellent-ai opened this issue 8 months ago

Hello,

I am reaching out about the BlueLM-7B-Chat model's current HumanEval score of 21.3, which, as you may agree, is not particularly high for coding tasks. Other 7B models score above 45 on this benchmark. Are there any plans to improve its code synthesis capabilities with a fine-tuned version in the near future? Thank you!

JoeyHeisenberg · Answer 1 · Thu Nov 09 2023 15:52:18 GMT+0800 (China Standard Time)

Thank you for your interest in our BlueLM-7B-Chat model. We appreciate your feedback regarding its current performance on HumanEval and agree that there is room for improvement.

We do have plans to enhance its code synthesis capabilities. We are constantly working to improve the performance of our models, and we hope the next updates will meet your expectations.

Could you please provide more information about the specific use case you have in mind? This information would be invaluable in helping us tailor our upcoming updates to best meet our users' needs.

excellent-ai · Answer 2 · Fri Nov 10 2023 02:47:27 GMT+0800 (China Standard Time)

Thank you for the response. We're focused on Python coding assistance, as measured by HumanEval.

Will you (1) update the base model or (2) release a fine-tuned model?

What's the timeline for this?

JoeyHeisenberg · Answer 3 · Fri Nov 10 2023 14:34:12 GMT+0800 (China Standard Time)

We do plan to release new base and fine-tuned models in the future, but at the moment we don't have a specific timeline. Please stay tuned for updates and announcements on our GitHub page.