rzTian's starred repositories
alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
GflowNets_Tutorial
GFlowNets, MCMC, Metropolis-Hastings, Gibbs sampling, Metropolis-adjusted Langevin, Inverse Transform Sampling, Acceptance-Rejection Method, and Importance Sampling
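Of the sampling methods this tutorial covers, random-walk Metropolis-Hastings is the simplest to sketch. The snippet below is a minimal illustrative implementation (not code from the repository), assuming a symmetric Gaussian proposal so the proposal ratio cancels in the acceptance test; `log_prob` is any unnormalized log-density.

```python
import math
import random

def metropolis_hastings(log_prob, x0, n_samples, step=1.0, seed=0):
    """Draw samples from an unnormalized density via random-walk Metropolis-Hastings."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal: q(x'|x) = q(x|x'), so the ratio cancels.
        x_new = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(x_new) / p(x)), computed in log space.
        if math.log(rng.random() + 1e-300) < log_prob(x_new) - log_prob(x):
            x = x_new
        samples.append(x)
    return samples

# Example target: standard normal, log p(x) = -x^2 / 2 up to a constant.
draws = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
burned = draws[5000:]  # discard burn-in
mean = sum(burned) / len(burned)
```

With a well-tuned step size, the post-burn-in sample mean and variance should approximate those of the target (here roughly 0 and 1).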
llm_benchmarks
A collection of benchmarks and datasets for evaluating LLMs.
awesome-llm-security
A curation of awesome tools, documents and projects about LLM Security.
llm-alignment-survey
A curated reading list for large language model (LLM) alignment. Take a look at our new survey "Large Language Model Alignment: A Survey" for more details!
awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
llm-adaptive-attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024]
Awesome-GFlowNets
A curated list of resources about generative flow networks (GFlowNets).
robust-local-lipschitz
A Closer Look at Accuracy vs. Robustness
auto-attack
Code accompanying "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"
adaptive_attacks_paper
Code for "On Adaptive Attacks to Adversarial Example Defenses"
pytorch-cifar
95.47% on CIFAR10 with PyTorch
robustness
A library for experimenting with, training and evaluating neural networks, with a focus on adversarial robustness.
cleverhans
An adversarial example library for constructing attacks, building defenses, and benchmarking both
Pytorch-Adversarial-Training-CIFAR
This repository provides simple PyTorch implementations for adversarial training methods on CIFAR-10.
adversarially-robust-generalization
This repo contains the code for the experiments in "Rademacher Complexity for Adversarially Robust Generalization"
adversarial
Creating and defending against adversarial examples
adversarial-attacks-pytorch
PyTorch implementation of adversarial attacks [torchattacks]