firstuserhere's repositories
gpt4Vadvanced
Testing GPT-4 Vision on Advanced examination questions (2023) across physics, chemistry, and mathematics
metalearning
Repository and GitHub Pages site for my work on the mechanistic analysis of out-of-context meta-learning in LLMs
awesome-mech-interp
An awesome curated list of resources dedicated to mechanistic interpretability
basic-scripts
A bunch of basic scripts, hacked together but working, that may be useful to me
firstuserhere.github.io
This is my website
multimodal-mechinterp
Basic mech interp analysis for some multimodal models
outofcontextnotes
This repository holds my notes and thoughts (always WIP) while doing work on the "out of context meta learning" project.
replications
My attempts at replicating results of papers
aisc_oocl_experiments
Experiments trying to elicit out-of-context learning when training a transformer on a simple task
ComPromptMized
ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications
GPU-Puzzles
Solve puzzles. Learn CUDA.
Improved-worldmodels
Critiques of the pre-print, suggestions for improvement, and testing on counterfactual examples
lit
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.
miras-sudoku-solution
Fork of a possible solution for testing
nanogenmo
National Novel Generation Month, 2023 edition.
sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
SPARta
LLM experiments done during SERI MATS, focusing on activation steering and interpreting activation spaces
transformer-debugger
My fork of the original Transformer Debugger library by OpenAI
transformerperspectives
Looking at data through the perspective of different components of a transformer model
visualize-SAE
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
ViT-Prisma
ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs).
Whisper-mechinterp
Mechanistic Interpretability for Whisper