We are the ByteDance Seed team.
You can get to know us better through the following channels 👇
This repository contains the predictions, execution logs, trajectories, and results for model inference and evaluation runs on the Multi-SWE-bench task.
If you are interested in submitting your model to the Multi-SWE-bench Leaderboard, please do the following:
- Fork this repository.
- Clone the repository. Due to this repository's large diff history, consider using git clone --depth 1 if cloning takes too long.
- Under the split that you evaluate on (e.g. evaluation/java/verified/), create a new folder named with the submission date and the model name (e.g. 20250329_Agentless_Claude-3.7-Sonnet).
- Within the folder (evaluation/<split>/<date + model>), include the following required assets:
  - all_preds.jsonl: Model predictions.
  - results/: Multi-SWE-bench evaluation artifacts dump, containing:
    - results.json: Summary of evaluation outcomes.
  - logs/: Multi-SWE-bench evaluation artifacts dump, which stores the contents of the language folder generated in the workdir after the evaluation. The folder structure is as follows:

        logs/
        ├── [org]/[repo]/                  # One repository
        │   ├── evals/                     # Files related to the evaluation process
        │   │   ├── pr-[id]/               # Files for one instance's evaluation process
        │   │   │   ├── fix.patch          # The model's generated prediction
        │   │   │   ├── fix-patch-run.log  # A log of evaluation steps
        │   │   │   └── report.json        # Summary of evaluation outcomes for this instance
        │   │   └── ...                    # Other instance evaluation process files
        │   └── images/                    # (Optional) Files related to the image build process
        └── ...                            # Other repositories

  - metadata.yaml: Metadata for how the result is shown on the website. Include the following fields:
    - name: The name of your leaderboard entry.
    - orgIcon (optional): URL/link to an icon representing your organization.
    - oss: true if your system is open-source.
    - site: URL/link to more information about your system.
    - verified: false (see below for results verification).
  - trajs/: Reasoning traces reflecting how your system solved each problem.
    - Submit one reasoning trace per task instance. The trace should show all of the steps your system took while solving the task. If your system outputs thoughts or comments during operation, include them as well.
    - The reasoning trace can be represented in any text-based file format (e.g. md, json, yaml).
    - Ensure the task instance ID is in the name of the corresponding reasoning trace file.
    - For an example, see Agentless + Claude-3.7-Sonnet.
- Create a pull request to this repository with the new folder. The leaderboard will update automatically once the PR is merged.
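The checklist above can be turned into a quick pre-submission check. The following is a minimal Python sketch, not an official tool of this repository: check_submission and its messages are hypothetical, metadata.yaml is scanned naively for top-level keys to avoid a YAML dependency, and instance IDs must be supplied by you because the instructions above do not prescribe their exact format.

```python
import json
from pathlib import Path

# Required assets from the submission instructions (orgIcon is optional, so it is not checked).
REQUIRED_FILES = ["all_preds.jsonl", "results/results.json", "metadata.yaml"]
REQUIRED_DIRS = ["logs", "trajs"]
REQUIRED_METADATA_KEYS = ["name", "oss", "site", "verified"]


def check_submission(folder, instance_ids=()):
    """Return a list of problems in a submission folder; an empty list means none found."""
    root = Path(folder)
    problems = []

    for rel in REQUIRED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")
    for rel in REQUIRED_DIRS:
        if not (root / rel).is_dir():
            problems.append(f"missing directory: {rel}/")

    # Every non-blank line of all_preds.jsonl must parse as JSON.
    preds = root / "all_preds.jsonl"
    if preds.is_file():
        lines = preds.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"all_preds.jsonl line {lineno}: invalid JSON")

    # Naive scan of top-level "key:" lines, so the sketch needs no YAML library.
    meta = root / "metadata.yaml"
    if meta.is_file():
        keys = {
            line.split(":", 1)[0].strip()
            for line in meta.read_text(encoding="utf-8").splitlines()
            if ":" in line and not line.startswith((" ", "\t", "#"))
        }
        for key in REQUIRED_METADATA_KEYS:
            if key not in keys:
                problems.append(f"metadata.yaml: missing field '{key}'")

    # Each given instance ID must appear in the name of some file under trajs/.
    trajs = root / "trajs"
    if trajs.is_dir():
        names = [p.name for p in trajs.rglob("*") if p.is_file()]
        for iid in instance_ids:
            if not any(iid in name for name in names):
                problems.append(f"trajs/: no trajectory file named with '{iid}'")

    return problems
```

For example, check_submission("evaluation/java/verified/20250329_Agentless_Claude-3.7-Sonnet") returns an empty list when every required asset is present. The metadata scan only confirms that the keys exist; it does not validate their values.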
You can refer to this tutorial for a quick overview of how to evaluate your model on Multi-SWE-bench.
The Verified check ✅ indicates that we (the Multi-SWE-bench team) received access to the model and were able to reproduce the patch generations.
If you are interested in receiving the "verified" checkmark ✅ on your submission, please do the following:
- Create an issue.
- In the issue, provide us with instructions on how to run your model on Multi-SWE-bench.
- We will run your model on a random subset of Multi-SWE-bench and verify the results.
We host all model trajectories and execution logs on Hugging Face at Multi-SWE-bench_trajs.
You can download and inspect them locally for detailed analysis.
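As a starting point for local analysis, here is a minimal Python sketch that indexes the per-instance report.json files of one submission. It assumes you have already downloaded the submission and that its logs/ folder keeps the layout given in the submission instructions (logs/[org]/[repo]/evals/pr-[id]/report.json). The function names collect_reports and count_by_repo and the "org/repo#id" key format are hypothetical; the fields inside report.json are not documented here, so each report is kept as a raw dict.

```python
import json
from collections import Counter
from pathlib import Path


def collect_reports(logs_dir):
    """Map "org/repo#id" to the parsed report.json of every evaluated instance.

    Assumes the layout logs/[org]/[repo]/evals/pr-[id]/report.json.
    """
    reports = {}
    for report in sorted(Path(logs_dir).glob("*/*/evals/pr-*/report.json")):
        pr_dir = report.parent            # pr-[id]
        repo_dir = pr_dir.parent.parent   # [repo], skipping evals/
        org = repo_dir.parent.name        # [org]
        number = pr_dir.name[len("pr-"):]
        key = f"{org}/{repo_dir.name}#{number}"
        # report.json is kept as a raw dict; its fields are not interpreted here.
        reports[key] = json.loads(report.read_text(encoding="utf-8"))
    return reports


def count_by_repo(reports):
    """Count evaluated instances per org/repo."""
    return Counter(key.split("#", 1)[0] for key in reports)
```

For example, count_by_repo(collect_reports("logs")).most_common(5) lists the five repositories with the most evaluated instances. The optional images/ folders are ignored because the glob only matches paths under evals/.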
We express our deepest gratitude to the creators of the SWE-bench dataset. This project is an adapted version of their original experiments repository.
If you found Multi-SWE-bench helpful for your work, please cite as follows:
@misc{zan2025multiswebench,
title={Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving},
author={Daoguang Zan and Zhirong Huang and Wei Liu and Hanwu Chen and Linhao Zhang and Shulin Xin and Lu Chen and Qi Liu and Xiaojian Zhong and Aoyan Li and Siyao Liu and Yongsheng Xiao and Liangqiang Chen and Yuyu Zhang and Jing Su and Tianyu Liu and Rui Long and Kai Shen and Liang Xiang},
year={2025},
eprint={2504.02605},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2504.02605},
}
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
📢 About ByteDance Seed Team
Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
