HUJA9's starred repositories
DecodingTrust
A Comprehensive Assessment of Trustworthiness in GPT Models
awesome-instruction-dataset
A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
natural-instructions
Expanding natural instructions
OpenAttack
An Open-Source Package for Textual Adversarial Attack.
tdc2023-starter-kit
This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
decomp_attn_keras
Parikh et al., A Decomposable Attention Model for Natural Language Inference
InstructEval
[NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.
TruthfulQA
TruthfulQA: Measuring How Models Imitate Human Falsehoods
modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
instructor-embedding
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
TransformerLens
A library for mechanistic interpretability of GPT-style language models
honest_llama
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
promptsource
Toolkit for creating, sharing and using natural language prompts.
NLPer-Conferences-Journals-Survey
Survey of NLP+AI Conferences and Journals for NLPers
PyTorch-VAE
A Collection of Variational Autoencoders (VAE) in PyTorch.
pytorch-grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
lost-in-the-middle
Code and data for "Lost in the Middle: How Language Models Use Long Contexts"
sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".