posgnu / innereye

A cross-lingual basic block embedding model using LSTM

InnerEye

This repository implements InnerEye, an LSTM-based tool for generating cross-platform binary code embeddings, described in the following paper.

@inproceedings{zuo2019neural,
  title={Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs},
  author={Zuo, Fei and Li, Xiaopeng and Young, Patrick and Luo, Lannan and Zeng, Qiang and Zhang, Zhexin},
  booktitle={Proceedings of the 2019 Network and Distributed Systems Security Symposium (NDSS)},
  year={2019}
}

The main purpose of this implementation is to provide a baseline for cross-platform binary code embedding research and for the experimental results reported in Improving Cross-Platform Binary Analysis using Representation Learning via Graph Alignment.

Getting Started

Our implementation is largely based on the paper authors' official implementation, but we apply the model to our own cross-platform datasets, which cover a broad range of software disciplines: SQLite3 (database), OpenSSL (network), cURL (file transfer), Httpd (web server), libcrypto (crypto library), and glibc (standard library). The data is preprocessed following the scheme described in the original paper and stored in the data directory, which is structured in the same way as in XBA. The other components are organized as follows.

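.
├── README
├── Pipfile                 # Manages a Python virtualenv.
├── Pipfile.lock            # Manages a Python virtualenv (Do not touch).
├── extract.py              # Extracts basic block embeddings using a trained model.
├── train.py                # Trains the i2v embeddings and the Siamese-LSTM.
├── utils.py
├── validation.py           # Tests the trained model.
├── data                    # Preprocessed datasets.
├── embeddings
└── weights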

Install

Prerequisites

Python 3.8 or later is required. To install the Python dependencies, first install pipenv.

$ pip3 install pipenv

Use pipenv shell

Install dependencies

$ pipenv install

Activate pipenv shell

$ pipenv shell

Use your own python virtual environment

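Extract requirements.txt (on recent pipenv releases, where lock -r has been removed, run pipenv requirements > requirements.txt instead)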

$ pipenv lock -r > requirements.txt

Install dependencies

$ pip install -r requirements.txt

How to run

Several commonly used command sequences are defined in the Makefile.

Train the Instruction2vec (i2v) embeddings and the Siamese-LSTM on the data in /revos/data/done/${programs}/innereye.csv

$ pipenv run -- python train.py --targets={programs}
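To check which input files a given --targets value would need before starting a training run, the path pattern above can be expanded with a small helper. This is only a sketch: the function name training_csv_paths, the comma separator, and the example program names are assumptions, since this README does not say how train.py parses --targets.

```python
from pathlib import PurePosixPath
from typing import List

# Root taken from the path quoted in this README; adjust it for your setup.
DATA_ROOT = PurePosixPath("/revos/data/done")


def training_csv_paths(targets: str) -> List[PurePosixPath]:
    """Expand a --targets value into the per-program innereye.csv paths.

    Assumption: several programs are joined with commas. The README does not
    state the separator, so check train.py before relying on this.
    """
    programs = [name.strip() for name in targets.split(",") if name.strip()]
    return [DATA_ROOT / program / "innereye.csv" for program in programs]


if __name__ == "__main__":
    # "curl" and "openssl" are hypothetical directory names for illustration.
    for path in training_csv_paths("curl,openssl"):
        print(path)
```

Listing the expected paths first makes a missing or misnamed dataset directory easy to spot before the training job starts.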

Test the trained model

$ pipenv run -- python validation.py

Extract basic block embeddings using a model trained on {programs}

$ pipenv run -- python extract.py --targets={programs}
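Once embeddings are extracted, two basic blocks can be compared by scoring their vectors. The sketch below uses the Manhattan-distance similarity exp(-||a - b||_1) that is commonly paired with Siamese LSTMs. That choice is an assumption here, as are the function name and the sample vectors: this README states neither the metric nor the on-disk format written by extract.py, so loading the vectors is left out.

```python
import math
from typing import Sequence


def manhattan_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return exp(-||a - b||_1), a score in (0, 1]; 1.0 means identical vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    # L1 (Manhattan) distance between the two embedding vectors.
    l1_distance = sum(abs(x - y) for x, y in zip(a, b))
    return math.exp(-l1_distance)


if __name__ == "__main__":
    # Made-up 3-dimensional vectors; real embeddings come from extract.py.
    x86_block = [0.12, -0.40, 0.33]
    arm_block = [0.10, -0.38, 0.35]
    print(manhattan_similarity(x86_block, arm_block))
```

Scores near 1 indicate that the two blocks lie close together in the embedding space, while scores near 0 indicate dissimilar blocks.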
