This repo provides the source code of our paper: MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents. [PDF][Twitter][Demo]

If you discuss or use MLR-Copilot in your research, please cite us!
    @misc{li2024mlrcopilotautonomousmachinelearning,
      title={MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents},
      author={Ruochen Li and Teerth Patel and Qingyun Wang and Xinya Du},
      year={2024},
      eprint={2408.14033},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2408.14033},
    }
MLR-Copilot is a framework where LLMs mimic researchers’ thought processes, designed to enhance the productivity of machine learning research by automating the generation and implementation of research ideas.
Starting from a research paper, it autonomously generates and validates research ideas, incorporating human feedback along the way to reach executable research outcomes.
Demo (Link): `demo_rec_compress.mov`
MLR-Copilot operates in three integrated phases:
- Research Idea Generation: LLM-powered agents generate research hypotheses and experimental plans based on existing research papers.
- Experiment Implementation: Translates experimental plans into executable experiments using retrieved prototype code and models.
- Implementation Execution: Runs the experiments with mechanisms for human feedback and iterative debugging.

Figure 1: The autonomous machine learning research task. We take the research paper as input and output the research idea (i.e., research hypothesis and experiment plan) with execution results.

Figure 2: Our MLR-Copilot framework. The LLM IdeaAgent (leftmost grey component) performs research idea generation, including hypothesis and experimental design (Stage 1). The ExperimentAgent implements and executes the experiments.
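The three phases above can be viewed as a simple loop. The following is a minimal sketch only; every name in it is an illustrative placeholder, not the repository's actual API:

```python
# Illustrative sketch of MLR-Copilot's three phases. All names below
# (generate_idea, implement_experiment, execute_with_feedback) are
# hypothetical placeholders, not the repository's actual API.

def generate_idea(paper_text: str) -> dict:
    # Stage 1: an LLM IdeaAgent reads the paper and proposes a research idea
    # (a hypothesis plus an experiment plan).
    return {"hypothesis": f"extend: {paper_text[:40]}", "plan": ["baseline", "variant"]}

def implement_experiment(idea: dict) -> str:
    # Stage 2: an ExperimentAgent turns the plan into something executable.
    return "run " + " then ".join(idea["plan"])

def execute_with_feedback(command: str, runner, max_retries: int = 3) -> str:
    # Stage 3: execute; on failure, revise the command and retry.
    for _ in range(max_retries):
        ok, output = runner(command)
        if ok:
            return output
        command += " --debugged"  # placeholder for an LLM/human debugging step
    raise RuntimeError("experiment failed after retries")
```

Here `runner` stands in for actual experiment execution (which the framework performs inside a Docker container), and the real system interleaves human feedback at each stage rather than a fixed retry rule.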
Begin by cloning this repository.
- Place the following keys in a `.env` file at the root of this project: `CLAUDE_API_KEY`, `OPENAI_API_KEY`
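A `.env` file is plain `KEY=value` lines. As a minimal sketch of how such a file can be read (the project may well use a library such as `python-dotenv` instead; this loader is purely illustrative):

```python
def load_dotenv_file(path: str) -> dict:
    """Parse simple KEY=value lines from a .env file (illustrative loader)."""
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# Example: expose the parsed keys to the process environment.
# import os
# for k, v in load_dotenv_file(".env").items():
#     os.environ.setdefault(k, v)
```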
- Configure your Hugging Face token as needed so that `huggingface_hub.login()` works if you intend to use Llama.
- Install requirements: `pip install -r requirements.txt`
- Obtain the Docker image `tortcode/nlp-coresearcher`:
  - Build it: `docker build . -t 'tortcode/nlp-coresearcher'`
  - Or pull it from Docker Hub: `docker pull 'tortcode/nlp-coresearcher'`
- Run `bash container.sh` to start the container.
- Place the research idea in the file `problems/<task_name>`.
- Run any preparation scripts as needed.
- Place all starter code in the directory `workspaces/<task_name>`.
- To run the agent with a specific task and LLM (Claude, GPT-4, or Llama), execute `bash run_demo.sh <task_name> <llm_name>`.
  - You must have access to the Meta Llama 3.1 models on Hugging Face to run Llama.
- To suppress error logging, redirect stderr to `/dev/null`: `bash run_demo.sh <task_name> <llm_name> 2>/dev/null`
- Full logs are under `logs/<task_name>/<start_timestamp>/agent_log/full_log.jsonl`.
- Other logs are under `logs/<task_name>/<start_timestamp>/env_log/`.
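Since `full_log.jsonl` is in JSON Lines format (one JSON object per line), it can be inspected with a few lines of Python. The field names inside each record depend on the agent run and are not assumed here:

```python
import json

def read_jsonl(path: str) -> list:
    """Load a .jsonl file: one JSON object per non-empty line."""
    records = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Example: count the log records and peek at the first one.
# logs = read_jsonl("logs/<task_name>/<start_timestamp>/agent_log/full_log.jsonl")
# print(len(logs), logs[0] if logs else None)
```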
MLR-Copilot is adapted from MLAgentBench, under the MIT License.
Some components are adapted from Prompt2Model, under the Apache License 2.0. Files utilizing API calls have been modified.