PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation

This repository contains benchmark contracts, human-written & LLM-written formal properties and experiment results in support of the CCS2024 submission for PropertyGPT.

Description

├── README.md
├── RQ1
│   ├── PropertyGPT sub-project level evaluation.md
│   └── assets
├── RQ2 cve
│   ├── CVEs
│   ├── CVEs_basic_scan_result.xlsx
│   └── CVEs_comparision.xlsx
├── RQ2 smartInv
│   ├── assets
│   ├── smartInvRQ2.md
│   └── smartInv_RQ1
├── RQ3
│   ├── asset
│   └── readme.md
├── RQ3 failure info
│   ├── asset
│   └── readme.md
├── RQ4
│   ├── RQ4.md
│   └── RQ4.xlsx
├── certora_projects
│   ├── aave_l2_bridge
│   ├── aave_proof_of_reserve
│   ├── aave_rescue_mission
│   ├── aave_staked_token
│   ├── aave_starknet_bridge
│   ├── aave_static_token
│   ├── aave_v2
│   ├── aave_v3
│   ├── aave_vault
│   ├── celo_governance
│   ├── combined_output_train_all.csv
│   ├── compound_moneymarket
│   ├── contract_extractor.py
│   ├── extractor.py
│   ├── furucombo
│   ├── gho-core
│   ├── keep_fully
│   ├── lido_v2
│   ├── notional_finance_v2
│   ├── openzepplin
│   ├── opyn_gamma_protocol
│   ├── ousd
│   ├── popsicle_v3_optimizer
│   ├── radicle_drips
│   ├── reports_info.md
│   ├── rocket_joe
│   └── sushi_benttobox
├── rule_classification
│   ├── all_rules.csv
│   ├── assets
│   ├── category_of_spec
│   └── rule_classification.md
└── weight_calcu
    ├── linear_analysis.py
    └── path_to_save_results.xlsx

RQ1 - Property Generation

We include sampled property dataset and property generation results in RQ1 and all the crawled Certora audit projects are available at Certora_projects.
RQ1 includes detailed links to associated project files, offering specific counts of properties for each evaluated project.

RQ2 - Vulnerability Detection

CVE: We detail the collection and evaluation process of CVEs, including comparisons with results from gptscan.
Curated SmartInv benchmark: We include the selection process of this benchmark, property generation results and vulnerability detection results.

RQ3 - Ablation Study

Property Fix: We document the success rates and number of repair times during property generation driven by LLM.
Top-K setting: We show precision, recall, and F1-score for various Top-K settings for selecting appropriate properties as candidates.

RQ4 - Zero-Day Bug Finding

We list the results of zero-day bugs found by PropertyGPT and detail the corresponding properties generated by PropertyGPT.

monperrus / PropertyGPT