bingghost / bindiffmatch

BinaryAI BindiffMatch algorithm

This repo contains an implementation of the BinaryAI file comparison algorithm, along with datasets and metric scripts.

Project

The binaryai_bindiffmatch directory contains the BinaryAI BindiffMatch algorithm. It does not include the BAI-2.0 model or the embedding implementation.

The data directory contains the metric datasets. (You can download them from the release assets.)

data/files contains unstripped and stripped files.
We use binaries from the coreutils, diffutils and findutils libraries as testcases. These binaries are the experiment data of the DeepBinDiff project; get them from the original project.
We manually built some versions of the openssl project and chose two files as the example case. The sources are openssl-1.1.1u and openssl-3.1.1.

data/labeleds contains pre-generated information about the functions in each binary file. The basicinfo, pseudocode, callees and name data are produced by Ghidra, and the feature embedding vectors are produced by the BinaryAI BAI-2.0 model. The scripts that generate these files are not included in this project.

data/matchresults contains pre-generated match results for the testcases and the example case, produced by the BinaryAI BindiffMatch algorithm and by Diaphora, as well as the groundtruth results.
BinaryAI BindiffMatch results can be generated by running python -m binaryai_bindiffmatch <file1_labeled_doc> <file2_labeled_doc> -o <matchresult> on each pair of files.
Diaphora results are generated by first applying a patch on this commit, then using IDA headless mode to export a .sqlite database. After that, we run the offline Diaphora script to generate .diaphora results (with relaxed_ratio set to True and all other options left at their defaults), and finally convert them to JSON in the same format as the BinaryAI results. The scripts for these steps are not included in this project.
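
Because the matcher has to be run once per pair of files, a small driver script can help. The following is a minimal sketch, not part of this repo: it only wraps the python -m binaryai_bindiffmatch command documented above. The helper names build_match_command and run_pairs, and the file names in the usage lines, are hypothetical placeholders; the actual layout under data/labeleds is not described here.

```python
import subprocess
import sys
from pathlib import Path


def build_match_command(doc1: Path, doc2: Path, result: Path) -> list[str]:
    """Build the documented CLI call for one pair of labeled docs.

    Hypothetical helper: it only assembles
    `python -m binaryai_bindiffmatch <doc1> <doc2> -o <result>`.
    """
    return [
        sys.executable, "-m", "binaryai_bindiffmatch",
        str(doc1), str(doc2), "-o", str(result),
    ]


def run_pairs(pairs, dry_run: bool = True) -> list[list[str]]:
    """Print (dry_run=True) or execute the matcher for each (doc1, doc2, result)."""
    commands = [build_match_command(*pair) for pair in pairs]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if the matcher exits non-zero
            subprocess.run(cmd, check=True)
    return commands


# Placeholder file names; substitute real labeled docs from data/labeleds.
example_pairs = [
    (Path("file1_labeled_doc"), Path("file2_labeled_doc"), Path("matchresult")),
]
run_pairs(example_pairs, dry_run=True)
```

With dry_run=True the script only prints the commands, which is a cheap way to check the pairing before starting a long run; pass dry_run=False to actually invoke the matcher.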

Install

Requires Python >= 3.10.
Run pip install .[lowmem] to install this package and its dependencies.

Metric

python scripts/metrics.py testcases binaryai: get the metric results on the full testcases for the BinaryAI BindiffMatch algorithm
python scripts/metrics.py testcases diaphora: get the metric results on the full testcases for Diaphora
python scripts/metrics.py example binaryai: get the metric results on the example case for the BinaryAI BindiffMatch algorithm
python scripts/metrics.py example diaphora: get the metric results on the example case for Diaphora
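
The four commands above differ only in their two arguments, so they can be generated in one loop. This is a minimal sketch that assumes it is run from the repository root, so that scripts/metrics.py resolves; the helper names metric_commands and run_all are hypothetical and not part of this repo.

```python
import subprocess
import sys
from itertools import product

# The two argument values documented for scripts/metrics.py.
DATASETS = ("testcases", "example")
TOOLS = ("binaryai", "diaphora")


def metric_commands() -> list[list[str]]:
    """Return one command per (dataset, tool) combination, in README order."""
    return [
        [sys.executable, "scripts/metrics.py", dataset, tool]
        for dataset, tool in product(DATASETS, TOOLS)
    ]


def run_all(dry_run: bool = True) -> None:
    """Print (dry_run=True) or execute every metric command in turn."""
    for cmd in metric_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            # Assumes the current working directory is the repository root.
            subprocess.run(cmd, check=True)


run_all(dry_run=True)
```

Call run_all(dry_run=False) to execute the commands instead of printing them.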
