ngmarchant / rsdel-randomized-deletion-malware

RS-Del: Robustness Certificates for Sequence Classifiers via Randomized Deletion

This repository hosts the implementation of our submission to NeurIPS 2023 titled "RS-Del: Edit Distance Robustness Certificates for Sequence Classifiers via Randomized Deletion".


📂 Directory Structure

.
├── configs
│   ├── certify-exp                   # Configs for evaluation step
│   ├── models                        # Configs for malware detection models
│   └── repeat-forward-exp            # Configs for sampling step
├── data
│   ├── binaries                      # Executables for training and evaluation
│   └── {test,train,valid}.csv        # CSV files for data partitioning
├── docker                            # Docker deployment files
├── outputs                           # Directory for experimental outputs
├── run_scripts                       # Shell scripts for running experiment steps
└── src                               # Source code directory
    ├── torchmalware                  # Python package with core implementations
    ├── train.py                      # Script for training models
    ├── repeat_forward_exp.py         # Script for sampling perturbed inputs
    ├── fp_curve-repeat_forward.py    # Script for computing FPR curve
    └── certify_exp-repeat_forward.py # Script for computing certified radius

🚀 Getting Started

1. Model Training

  • Train the smoothed model using data augmentation via src/train.py.
  • Example: See run_scripts/task1-train.sh.
python src/train.py --conf configs/models/sample_config.yaml
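
To make the augmentation idea concrete, here is a minimal sketch, assuming the augmentation is the randomized deletion named in the title: each byte of an input is dropped independently with probability p_del. The function name `random_deletion` and the parameter `p_del` are illustrative and are not taken from src/train.py.

```python
import random


def random_deletion(seq: bytes, p_del: float, rng: random.Random) -> bytes:
    """Return a copy of seq in which each byte is deleted independently
    with probability p_del (hypothetical helper, not the repo's API)."""
    if not 0.0 <= p_del <= 1.0:
        raise ValueError("p_del must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so a byte survives with
    # probability 1 - p_del.
    return bytes(b for b in seq if rng.random() >= p_del)


rng = random.Random(0)
original = bytes(range(256)) * 4
augmented = random_deletion(original, p_del=0.5, rng=rng)
print(len(original), len(augmented))
```

During training, a perturbation of this kind would be applied to each input before it reaches the model, so that the model sees the same distribution of inputs at training time as at certification time.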

2. Prediction, Certification & Calibration Sampling

  • Save base model confidence scores via src/repeat_forward_exp.py.
  • Example: See run_scripts/task2-repeat_forward.sh.
python src/repeat_forward_exp.py --conf configs/repeat-forward-exp/sample_config.yaml
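
The sampling step can be pictured as repeated forward passes of the base model on independently perturbed copies of one input. The sketch below assumes a base model exposed as a plain callable returning a confidence score; `repeat_forward`, `delete_at_random`, and `toy_model` are hypothetical names, and the toy model only stands in for a real malware detector.

```python
import random
from typing import Callable, List


def delete_at_random(seq: bytes, p_del: float, rng: random.Random) -> bytes:
    """Delete each byte independently with probability p_del."""
    return bytes(b for b in seq if rng.random() >= p_del)


def repeat_forward(
    base_model: Callable[[bytes], float],
    seq: bytes,
    p_del: float,
    n_samples: int,
    seed: int = 0,
) -> List[float]:
    """Score n_samples perturbed copies of seq with the base model."""
    rng = random.Random(seed)
    return [base_model(delete_at_random(seq, p_del, rng)) for _ in range(n_samples)]


def toy_model(seq: bytes) -> float:
    """Hypothetical stand-in: fraction of 0xFF bytes as a 'malicious' score."""
    return seq.count(0xFF) / len(seq) if seq else 0.0


scores = repeat_forward(toy_model, b"\xff\x00" * 50, p_del=0.5, n_samples=100)
print(len(scores), min(scores), max(scores))
```

Saving the raw scores, rather than only the final decision, lets the later calibration and certification steps reuse the same samples.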

3. False-Positive Rate Calibration (Optional)

  • Vary the decision threshold and compute the FPR via src/fp_curve-repeat_forward.py.
  • Example: See run_scripts/task3-fp_curve.sh.
python src/fp_curve-repeat_forward.py --path model/checkpoint.pth --repeat-conf configs/repeat-forward-exp/sample_config.yaml
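
As a rough illustration of the calibration idea, the sketch below sweeps a decision threshold over scores of benign inputs and reports the false-positive rate at each threshold. It assumes one score per benign file and that a file is flagged when its score is at or above the threshold; `fpr_curve` and `pick_threshold` are hypothetical helpers, not functions from the repository.

```python
from typing import List, Optional, Sequence, Tuple


def fpr_curve(
    benign_scores: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (threshold, FPR) pairs, where FPR is the fraction of
    benign scores at or above the threshold."""
    if not benign_scores:
        raise ValueError("need at least one benign score")
    n = len(benign_scores)
    return [(t, sum(s >= t for s in benign_scores) / n) for t in thresholds]


def pick_threshold(
    curve: Sequence[Tuple[float, float]], target_fpr: float
) -> Optional[float]:
    """Smallest threshold on the curve whose FPR does not exceed target_fpr."""
    for t, fpr in sorted(curve):
        if fpr <= target_fpr:
            return t
    return None


curve = fpr_curve([0.1, 0.2, 0.4, 0.8], thresholds=[0.0, 0.25, 0.5, 0.9])
print(curve)
print(pick_threshold(curve, target_fpr=0.25))
```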

4. Certification

  • Compute the certified radius via src/certify_exp-repeat_forward.py.
  • Example: See run_scripts/task4-certify-repeat_forward.sh.
python src/certify_exp-repeat_forward.py --repeat-conf configs/repeat-forward-exp/sample_config.yaml --certify-conf configs/certify-exp/sample_config.yaml
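
For orientation, here is a hedged sketch of how a certified radius could be computed from the sampled scores. It assumes the radius has the form floor(log(1 + nu - mu) / log(p_del)), where mu is a lower confidence bound on the smoothed score for the predicted class, nu is the decision threshold, and p_del is the deletion probability. This form is an assumption made for the example and should be checked against the paper and src/certify_exp-repeat_forward.py. The name `certified_radius` and the choice to return 0 when mu does not exceed nu are illustrative.

```python
import math


def certified_radius(mu_lower: float, nu: float, p_del: float) -> int:
    """Edit-distance radius under the assumed formula
    floor(log(1 + nu - mu_lower) / log(p_del))."""
    if not 0.0 < p_del < 1.0:
        raise ValueError("p_del must lie strictly between 0 and 1")
    if not (0.0 <= mu_lower <= 1.0 and 0.0 < nu < 1.0):
        raise ValueError("mu_lower must lie in [0, 1] and nu in (0, 1)")
    if mu_lower <= nu:
        # Illustrative choice: the bound does not clear the threshold,
        # so no non-trivial radius is reported.
        return 0
    # Both logarithms are negative here, so the ratio is positive.
    return math.floor(math.log(1.0 + nu - mu_lower) / math.log(p_del))


print(certified_radius(mu_lower=0.99, nu=0.5, p_del=0.9))
```

Under this assumed form, the radius grows as p_del approaches 1 and as the lower bound mu moves further above the threshold nu.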

🐳 Docker Deployment

The steps above can also be run inside the provided Docker container:

git clone $REPO_NAME $DEST
cd $DEST/run_scripts
chmod +x ./run.sh
./run.sh -p $SH_PATH -m $MEM -c $NUM_CORES -g $GPU_ID
  • To run all four steps sequentially, use run_scripts/task-full.sh (not recommended because of the long running time).

📊 Reproducing Experiments

To reproduce the experiments on your own dataset, follow the instructions in data/README.md.


📄 License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Cite us as

@inproceedings{huang2023rsdel,
  author    = {Huang, Zhuoqun and Marchant, Neil and Lucas, Keane and Bauer, Lujo and Ohrimenko, Olya and Rubinstein, Benjamin I. P.},
  title     = {{RS-Del}: Edit Distance Robustness Certificates for Sequence Classifiers via Randomized Deletion},
  year      = {2023},
  booktitle = {Advances in Neural Information Processing Systems},
  series    = {NeurIPS},
}
