Fabien Roger's repositories

Learning-From-Negative-Examples

Language: Python, 6 10

concistency-lenses

Language: Jupyter Notebook, License: MIT, 3 10

sandbagging

Language: Python, 2 10

Countergen

A framework for generating counterfactual datasets, evaluating NLP models, and editing models to reduce bias

Language: Python, License: MIT, 1 10

llm-attacks

Universal and Transferable Attacks on Aligned Language Models

Language: Python, 100

Password-Locked-LLM

Language: Python, 1 10

trackoai

Language: Python, 1 10

Distributed-Text-Search

Brute-force approximate match search, parallelized using MPI, OpenMP, and CUDA.

Language: C, 010

AI-Influence

Language: Python, 010

alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

Language: Jupyter Notebook, License: Apache-2.0, 000

circuit-breakers

Improving Alignment and Robustness with Circuit Breakers

Language: Jupyter Notebook, 000

codes

Language: Python, 000

control-evaluations

Language: Python, License: MIT, 000

control-poison

Language: Python, 010

Countergen-Website

Language: TypeScript, 010

double_ntp

Language: Python, 010

feature_benchmark

Language: Jupyter Notebook, 000

General-Induction-Heads

Language: Python, 010

keyloger

A keylogger that encrypts data and saves it locally

Language: Python, 010

lm-choice

Language: HTML, 010

lm-game-analysis-main

Language: Python, 000

model_organism_examples

Language: HTML, 000

monitoring-demo

Language: JavaScript, 010

ODIN-Extension

A simple and effective method for detecting out-of-distribution images in neural networks.

Language: Python, License: NOASSERTION, 000

othello_playground

Emergent world representations: Exploring a sequence model trained on a synthetic task

Language: Jupyter Notebook, License: MIT, 000

Quantization-Awareness

Language: Python, License: MIT, 010

RCT-Simulations

Language: Python, 010

wmdp

Language: Python, License: MIT, 000

word-diff

Language: HTML, 000

xriskcalculator

Language: TypeScript, License: MIT, 000