SPY Lab's repositories
rlhf_trojan_competition
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
rlhf-poisoning
Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
diffusion_denoised_smoothing
Certified robustness "for free" using off-the-shelf diffusion models and classifiers
satml-llm-ctf
Code used to run the platform for the LLM CTF colocated with SaTML 2024
realistic-adv-examples
Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024]
lm_memorization_data
Data for "Quantifying Memorization Across Neural Language Models"
lm-extraction-benchmark-data
Datasets for the SaTML 2023 competition on training data extraction
misleading-privacy-evals
Official code for "Evaluations of Machine Learning Privacy Defenses are Misleading" (https://arxiv.org/abs/2404.17399)
data-decay
Playing around with the CC3M data
privacy
Library for training machine learning models with privacy for training data