Simon Lermen's repositories
redteaming
Red-teaming a simple language model such as GPT-2, based on Anthropic's red-teaming paper
exploring_modelgraded_evaluation
Exploring model-graded evaluation
safety_benchmarks
Safety benchmarks such as Refusal Bench
SVDInterpretTransformer
Apply SVD to Transformer weights
al-folio
A beautiful, simple, clean, and responsive Jekyll theme for academics
PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
DecisionTransformerInterpretability
Interpreting how transformers simulate agents performing RL tasks
GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
langchain
⚡ Building applications with LLMs through composability ⚡
LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
Minigrid
Simple and easily configurable grid world environments for reinforcement learning
mlab
Machine Learning for Alignment Bootcamp
MLAB-Transformers-From-Scratch
Reimplementing transformers from scratch (from Redwood Research's Machine Learning for Alignment Bootcamp).
python-binance
Binance Exchange API python implementation for automated trading
reference_chatbot
AI21 Labs implementation of In-Context Retrieval-Augmented Language Models
refusal_direction
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
simple-llama-finetuner
Simple UI for LLaMA Model Finetuning
TextWorld
TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.
weblm
Drive a browser with a language model