feature-steering

There is 1 repository under the feature-steering topic.

PaulPauls / llama3_interpretability_sae
A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
feature-extraction feature-steering llama3 llm-interpretability open-research pytorch sparse-autoencoder
Language: Python 622
