ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks
📖 Paper • 🚀 Github Page • 📊 Data
GPT Calling
You can use the following script to reproduce GPT's performance on this task:
sh script/GPT/run.sh
You need to change parameter settings in script/GPT/run.sh
:
-
type: Choose from quarter or full.
-
model: Model name
-
input_file: File path of dataset
-
answer_file: Original answer json format from GPT.
-
parsing_file: Post-process the output of GPT in jsonl format to obtain executable code segments.
-
readme_type: Choose from oracle_segment and readme
# oracle_segment: The code paragraph in the readme that is most relevant to the task
# readme: The entire text of the readme in the repository where the task is located
-
engine_name: Choose from gpt-35-turbo-16k and gpt-4-32.
-
n_turn: GPT returns the number of executable codes (5 times in the paper experiment).
-
openai_key: Your key.
CodeLlama-7b Fine-tuning
Please refer to CodeLlama-7b for details.
Tools
Get BM25 result
Run python script/tools/bm25.py
to generate BM25 results for the instructions and readme. Ensure to update the original dataset path
and output path
which includes the BM25 results.
Crawl README files from github repository
Run python script/tools/crawl.py
to fetch readme files from a specific GitHub repository. You'll need to modify the url
within the code to retrieve the desired readme files.
Cite Us
This project is inspired by some related projects. We would like to thank the authors for their contributions. If you find this project or dataset useful, please cite it:
@article{liu2023mlbench,
title={ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks},
author={Yuliang Liu and Xiangru Tang and Zefan Cai and Junjie Lu and Yichi Zhang and Yanjun Shao and Zexuan Deng and Helan Hu and Zengxian Yang and Kaikai An and Ruijun Huang and Shuzheng Si and Sheng Chen and Haozhe Zhao and Zhengliang Li and Liang Chen and Yiming Zong and Yan Wang and Tianyu Liu and Zhiwei Jiang and Baobao Chang and Yujia Qin and Wangchunshu Zhou and Yilun Zhao and Arman Cohan and Mark Gerstein},
year={2023},
journal={arXiv preprint arXiv:2311.09835},
}