Daniel Tan's repositories
feature_composition
Experiments on feature composition in toy models and SAEs
feature-lens
Visualizing SAE features in terms of their upstream and downstream features
steering-bench
Evaluation suite for steering vectors
auto-circuit
A library for efficient patching and automatic circuit discovery
belief-state-superposition
A repository for training transformers with belief states
eindex
My interpretation of what einops indexing would look like (created for my SERI MATS project)
factor-world
Controllable visual factors of variation for robot learning in Metaworld. Implemented in Gymnasium and pip-installable
Gymnasium-Robotics
A collection of robotics simulation environments for reinforcement learning
jam
Jam - JAX models
sae-eap
Edge attribution patching with SAEs
SAELens
Training Sparse Autoencoders on Language Models
stock-images
A collection of stock images for vision interpretability work
SycophancySteering
Modulating sycophancy in Llama 2 via activation steering
transcoders-slim
A minimal implementation of transcoders