Daniel Tan's repositories

rl_cbf

Code accompanying "Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory"

Language: Python | License: MIT | Stargazers: 12 | Issues: 4 | Issues: 1

sae-probe

Investigating the feasibility of using SAE features as a basis for sparse reconstructions of linear probes

Language: Python | Stargazers: 3 | Issues: 0 | Issues: 0

feature_composition

Experiments on feature composition in toy models and SAEs

Language: Python | Stargazers: 1 | Issues: 0 | Issues: 0

repepo

Codebase for comparing Representation Engineering vs baselines on a variety of tasks

Language: Jupyter Notebook | Stargazers: 1 | Issues: 0 | Issues: 0

feature-lens

Visualizing SAE features in terms of their upstream and downstream features

Language: HTML | Stargazers: 0 | Issues: 0 | Issues: 0

steering-bench

Evaluation suite for steering vectors

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

auto-circuit

A library for efficient patching and automatic circuit discovery.

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

belief-state-superposition

A repository for training transformers with belief states

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0
Language: HTML | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Jupyter Notebook | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

eindex

My interpretation of what einops indexing would look like (created to work on during my SERI MATS project).

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

factor-world

Controllable visual factors of variation for robot learning in Metaworld. Implemented in Gymnasium and pip-installable.

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 8
License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Gymnasium-Robotics

A collection of robotics simulation environments for reinforcement learning

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 1
Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0

jam

Jam - JAX models

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

sae-eap

Edge attribution patching with SAEs

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

SAELens

Training Sparse Autoencoders on Language Models

Language: HTML | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

stock-images

A collection of stock images for doing vision interp

Stargazers: 0 | Issues: 0 | Issues: 0

SycophancySteering

Modulating sycophancy in llama-2 via activation steering

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0

transcoders-slim

A minimal implementation of transcoders

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0