Thomas Kwa's repositories
Sledgehammer
A code-golf language written in Mathematica
othello-gpt-ideas
Submission to Neel Nanda's 2022 SERI MATS stream.
sae-enhanced-cd
Replication of the paper "Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models" (https://arxiv.org/pdf/2405.12522)
algebraic_value_editing
Experiments testing the algebraic value-editing conjecture (AVEC) on GPT-2 models
catastrophic-goodhart
Plots and empirical results for Catastrophic Goodhart (https://www.lesswrong.com/s/6rhjdbnEXoek4YiH7)
exist-mood-import
Import scripts for existing mood tracking app data
iit
A replication and extension of the paper "Inducing Causal Structure for Interpretable Neural Networks" by Atticus Geiger et al.
katago_retarget
Retarget KataGo to output the worst move by flipping activations.
nonsurrounding-polyomino
Finding a polyomino that cannot surround a 1x1 square, using the OR-Tools SAT solver.
ShortcutBadger
An Android library that supports iOS-style badge notifications in Samsung, LG, Sony, and HTC launchers.
turntrout-plots
Data files from Alex Turner's experiments and posts