apartresearch

apartresearch's repositories

interpretability-starter

🧠 Starter templates for doing interpretability research

5300

specificityplus

👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"

Language: Python · License: NOASSERTION · 18 2 24

Neuron2Graph

Tools for exploring Transformer neuron behaviour, including input pruning and diversification.

Language: Jupyter Notebook · License: Apache-2.0 · 16 2 1

readingwhatwecan

📚📚📚📚📚📚📚📚📚 Reading everything

Language: CSS · 1200

Integer_Addition

✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks

Language: Jupyter Notebook · License: MIT · 10 3 1

aisafetyideas

💡 The web app CI/CD for aisafetyideas.com

Language: Svelte · 8035

deepdecipher

🦠 DeepDecipher: An open source API to MLP neurons

Language: Rust · License: MIT · 8 1 101

evaluations-starter

How to get started in evaluations and demonstrations research for dangerous capabilities

License: MIT · 5 3 2

ai-psychology-starter

Code templates to get started as an AI psychologist

Language: Jupyter Notebook · 4 10

mechanisticinterpretability

A repository for awesome resources in mechanistic interpretability

300

AIS-cost-effectiveness

Cost-effectiveness models, tools, and results for various AI safety field-building programs.

Language: Python · License: MIT · 200

othelloscope

Interpretability Hackathon 2.0 entry

Language: Jupyter Notebook · License: MIT · 203

scheduling-widget

📆 Showcases specific times in local time zones

Language: HTML · 2 10

blackbox-psych

Conducting psychology experiments on black box language models

Language: HTML · 1 10

empathetic-ai

🤖 A systematic review on how to create empathetic AI

Language: TeX · 1 20

ICML2024MI

🌍 Website for NeurIPS2023MI

Language: CSS · 1 20

n2g

Tools for exploring Transformer neuron behaviour, including input pruning and diversification.

Language: Jupyter Notebook · License: Apache-2.0 · 100

safety-timelines

📈 Research into when alignment is solved

Language: R · 1 10

scale-llm-24

🌍 Website for the Scaling Laws workshop

Language: CSS · 100

seqcont_circuits

✱ Interpreting how similar sequence continuation tasks share internal representations ✱

Language: Jupyter Notebook · License: MIT · 100

task-standard

🚨 METR Task Standard fork for the Code Red Hackathon

Language: TypeScript · 100

.github

000

Apart-Evals

000

GPT-4-Chat-UI

GPT-4 frontend with open source Next.js template.

Language: JavaScript · License: MIT · 000

hackathon-utils

😎 Code to run hackathons efficiently

License: MIT · 010

Interpreting-Reward-Models

✱ Interpreting implicit reward models learnt in RLHF using sparse autoencoders.

Language: Jupyter Notebook · License: MIT · 01 7

open

🌍 Repository to update our open data

License: MIT · 000

paper-website

🌍 Website template for academic papers

Language: JavaScript · License: MIT · 010

town_hall_avatar

Uses ChatGPT to simulate a townhall discussion between avatars

Language: Python · 000

Verified_addition

Language: Jupyter Notebook · 030