Joseph Bloom's repositories
DecisionTransformerInterpretability
Interpreting how transformers simulate agents performing RL tasks
alphabetical_probe
Experimental code that trains 26 linear probes to detect the presence of each letter of the alphabet in GPT-J token strings, given their embeddings. It also explores the resulting vector arithmetic and its impact on GPT-J's spelling abilities.
toy_model_interpretability
I'd like to start playing around with toy models to better understand results in recent papers.
ARENA_2.0-RLHF
Preparing content for the ARENA RLHF day.
geom_median
Fast and differentiable geometric median, a multivariate median analogue. Install with `pip install geom-median`
protein-inference
A Python package for protein inference in mass spectrometry data analysis.
rust_cli_project
I'm teaching myself Rust.
rust_text_editor
Learning Rust by doing, following along with the Hecto tutorial: https://www.philippflenker.com/hecto/
sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability