Beast code in Giters

Joseph Bloom's repositories

SAELens

Training Sparse Autoencoders on Language Models

Language: Jupyter Notebook | License: MIT | 378 8 92

DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks

Language: Jupyter Notebook | License: MIT | 63 4 73

alphabetical_probe

Experimental code that trains 26 linear probes to detect the presence of alphabetic letters in GPT-J token strings, given their embeddings. It also explores the resulting vector arithmetic and its impact on GPT-J's spelling abilities.

Language: Jupyter Notebook | 2 10

toy_model_interpretability

I'd like to start playing around with toy models to better understand results in recent papers.

Language: Python | 1 20

TransformerLens

Language: Python | License: MIT | 1 10

arena-v1

Language: Jupyter Notebook | 010

arena-v1-ldn

Language: Jupyter Notebook | 010

ARENA_2.0

I'm teaching ARENA 2.0 and providing students with direction on careers and personal development.

Language: Python | 010

ARENA_2.0-RLHF

Preparing content for the ARENA RLHF day.

Language: Jupyter Notebook | 030

SparseAutoencoderSuperposition

Language: Jupyter Notebook | License: MIT | 020

babyai

BabyAI platform. A testbed for training agents to understand and execute language commands.

Language: Python | License: BSD-3-Clause | 010

Backwards

Language: Python | 010

Exploring-2L-SAE

Language: HTML | License: MIT | 010

geom_median

Fast and differentiable geometric median, a multivariate median analogue. Install with `pip install geom-median`

Language: Python | License: NOASSERTION | 010

Minigrid

Simple and easily configurable grid world environments for reinforcement learning

Language: Python | License: NOASSERTION | 010

Module-1

Module 1 - Autodifferentiation

Language: Python | 010

post--memory-dt-features

Language: HTML | License: CC-BY-4.0 | 010

protein-inference

A Python package for protein inference in mass spectrometric data analysis.

Language: Python | License: MIT | 010

rust_cli_project

I'm teaching myself Rust.

Language: Rust | 020

rust_text_editor

Learning by doing with Rust. Following along with the Hecto tutorial: https://www.philippflenker.com/hecto/

Language: Rust | 020

sparse_autoencoder

Sparse Autoencoder for Mechanistic Interpretability

Language: Python | License: MIT | 010

SpellingSAEExperiment

Language: Python | 010

jbloomAus

SAEDashboard

alphabetical_probe