wesg52

Wes Gurnee's starred repositories

mamba

Mamba SSM architecture

Language: Python · Apache-2.0 · 12696 101 511

kedro

Kedro is a toolbox for production-ready data science. It applies software engineering best practices to help you build data engineering and data science pipelines that are reproducible, maintainable, and modular.

Language: Python · Apache-2.0 · 9866 108 1958

transformer-debugger

Language: Python · MIT · 4015 25 14

leafmap

A Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment.

Language: Python · MIT · 3184 59 259

anthropic-sdk-python

Language: Python · MIT · 1355 115 148

awesome-neural-geometry

A curated collection of resources and research related to the geometry of representations in the brain, deep networks, and beyond

912 29 1

representation-engineering

Representation Engineering: A Top-Down Approach to AI Transparency

Language: Jupyter Notebook · MIT · 699 28 46

pyvene

Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions

Language: Python · Apache-2.0 · 605 9 60

redun

Yet another redundant workflow engine

Language: Python · Apache-2.0 · 516 15 54

torchlens

Package for extracting and mapping the results of every single tensor operation in a PyTorch model in one line of code.

Language: Python · GPL-3.0 · 461 6 18

cola

Compositional Linear Algebra

Language: Python · Apache-2.0 · 401 4 43

SAELens

Training Sparse Autoencoders on Language Models

Language: Jupyter Notebook · MIT · 383 8 92

nnsight

The nnsight package enables interpreting and manipulating the internals of deep learned models.

Language: Jupyter Notebook · MIT · 367 4 74

inseq

Interpretability for sequence generation models 🐛 🔍

Language: Python · Apache-2.0 · 362 10 82

Awesome-Interpretability-in-Large-Language-Models

This repository collects all relevant resources about interpretability in LLMs

CC0-1.0 · 241 5 4

world-models

Extracting spatial and temporal world models from LLMs

Language: Jupyter Notebook · MIT · 233 6 4

Automatic-Circuit-Discovery

Language: Jupyter Notebook · MIT · 174 0 17

sparse_autoencoder

Sparse Autoencoder for Mechanistic Interpretability

Language: Python · MIT · 173 4 41

sae_vis

Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).

Language: HTML · MIT · 137 7 20

dictionary_learning

Language: Python · 115 5 5

Awesome-LLM-Interpretability

A curated list of LLM interpretability material: tutorials, libraries, surveys, papers, blogs, etc.

112 2 1

sleeper-agents-paper

Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".

81 2 1

devinterp

Tools for studying developmental interpretability in neural networks.

Language: Python · 70 9 29

modeldiff

ModelDiff: A Framework for Comparing Learning Algorithms

Language: Jupyter Notebook · MIT · 52 4 1

sparse-probing-paper

Full code for the sparse probing paper.

Language: Jupyter Notebook · MIT · 47 2 2

universal-neurons

Universal Neurons in GPT2 Language Models

Language: Jupyter Notebook · MIT · 25 3 2

elk-generalization

Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from easy questions to hard ones.

Language: Python · MIT · 24 2 1

edge-attribution-patching

Code for my NeurIPS 2023 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"

Language: Jupyter Notebook · 22 2 1

ActivationDirectionAnalysis

Language: Python · 8 1 1

llmICL

Language: Jupyter Notebook · 7 30