raoofnaushad / mechanistic_intepreteability

This repository explores the interpretability of language models using TransformerLens, a library for mechanistic interpretability. Working with models from Hugging Face Transformers, we aim to reverse engineer the algorithms these models learn during training and shed light on their inner workings.
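
To make the idea concrete, here is a minimal, library-free sketch of the basic move in this kind of work: run a model while recording every intermediate activation so each step can be inspected afterwards. The toy "model" below (a list of named functions), the helper `scale`, and the layer names are all hypothetical and chosen for illustration. The function name `run_with_cache` echoes the method of the same name on TransformerLens's `HookedTransformer`, but this is not the library's implementation or API.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def scale(factor: float) -> Layer:
    """Build a toy layer that multiplies every element by `factor`."""
    def layer(x: Vector) -> Vector:
        return [factor * v for v in x]
    return layer


def relu(x: Vector) -> Vector:
    """Toy nonlinearity: clamp negative values to zero."""
    return [max(0.0, v) for v in x]


def shift(x: Vector) -> Vector:
    """Toy layer that adds 1.0 to every element."""
    return [v + 1.0 for v in x]


def run_with_cache(
    layers: List[Tuple[str, Layer]], x: Vector
) -> Tuple[Vector, Dict[str, Vector]]:
    """Run the layers in order, storing each layer's output under its name."""
    cache: Dict[str, Vector] = {}
    for name, layer in layers:
        x = layer(x)
        cache[name] = list(x)  # copy so later layers cannot mutate the record
    return x, cache


# Hypothetical three-layer toy model (assumption: stands in for a real network).
toy_model = [("scale", scale(2.0)), ("relu", relu), ("shift", shift)]
output, cache = run_with_cache(toy_model, [1.0, -3.0])

for name, activation in cache.items():
    print(name, activation)
```

With a real transformer the cached values would be things like attention patterns and residual-stream activations rather than two-element lists, but the workflow is the same: run once, then study the stored intermediates.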