DevInterp

A Python Library for Developmental Interpretability Research

DevInterp is a Python library for conducting research on developmental interpretability, a novel AI safety research agenda rooted in Singular Learning Theory (SLT). The agenda proposes tools for detecting, locating, and ultimately controlling the development of structure over the course of training.

⚠️ This library is still in early development. Don't expect things to work on a first attempt. We are actively working on improving the library and adding new features. If you have any questions or suggestions, please feel free to open an issue or submit a pull request.

Installation

To install devinterp, simply run:

pip install devinterp

Requirements: Python 3.8 or higher.

Getting Started

To see DevInterp in action, check out our example notebooks:

Minimal Example

from devinterp.slt import estimate_learning_coeff, estimate_learning_coeff_with_summary
from devinterp.optim import SGLD

# Assuming you have a PyTorch Module and DataLoader
learning_coeff = estimate_learning_coeff(model, trainloader, ...)

# To also get the mean, standard deviation, and per-chain learning coefficient estimates
learning_coeff_summary = estimate_learning_coeff_with_summary(model, trainloader, ...)
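For orientation, here is the local learning coefficient estimator commonly used in the SLT literature. This assumes it is what `estimate_learning_coeff` approximates; the library's exact defaults, such as the inverse temperature β and the localization strength γ, may differ. Here n is the number of training samples, L_n the empirical loss, and w* the trained parameters. The expectation is taken over a tempered posterior localized around w*. In practice it is replaced by an average of L_n over draws from a sampler such as SGLD.

```latex
\hat{\lambda}(w^*) = n\beta \left( \mathbb{E}_{w \sim p_\beta(w \mid w^*)}\left[ L_n(w) \right] - L_n(w^*) \right),
\qquad \beta = \frac{1}{\log n},
\qquad p_\beta(w \mid w^*) \propto \exp\!\left( -n\beta \, L_n(w) - \frac{\gamma}{2} \lVert w - w^* \rVert^2 \right)
```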

Features

Estimate the learning coefficient.
- Supported optimizers:
  - SGLD
  - SGNHT
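The sketch below is a self-contained toy showing how a learning coefficient can be estimated with SGLD. It is not the library's implementation, which operates on PyTorch models. The approach runs several SGLD chains on a tempered posterior localized around a minimum w*, averages the loss over the draws, and returns nβ(mean loss − L(w*)) with β = 1/log n. The function `sgld_llc_estimate` and its parameters (`step_size`, `localization`, `num_chains`, `num_draws`, `num_burnin`) are hypothetical. The losses are deterministic functions of w that stand in for the empirical loss, and full gradients replace minibatch gradients. SGNHT is not shown.

```python
import math
import random
from typing import Callable, List, Sequence


def sgld_llc_estimate(
    loss: Callable[[Sequence[float]], float],
    grad: Callable[[Sequence[float]], List[float]],
    w_star: Sequence[float],
    n: int,
    step_size: float = 1e-3,
    localization: float = 1.0,
    num_chains: int = 4,
    num_draws: int = 20_000,
    num_burnin: int = 1_000,
    seed: int = 0,
) -> float:
    """Hypothetical toy estimator, not part of devinterp.

    Returns n * beta * (E[L(w)] - L(w*)) with beta = 1 / log(n), where the
    expectation is approximated by averaging L over SGLD draws.
    """
    rng = random.Random(seed)
    n_beta = n / math.log(n)  # n * beta with beta = 1 / log n
    noise_std = math.sqrt(step_size)  # SGLD injects N(0, step_size) noise
    loss_star = loss(w_star)
    chain_means = []
    for _ in range(num_chains):
        w = list(w_star)  # every chain starts at the minimum w*
        total = 0.0
        for t in range(num_burnin + num_draws):
            g = grad(w)
            for i in range(len(w)):
                # Drift = -(gradient of the tempered loss) minus a localization
                # term that keeps the chain near w*.
                drift = -n_beta * g[i] - localization * (w[i] - w_star[i])
                w[i] += 0.5 * step_size * drift + rng.gauss(0.0, noise_std)
            if t >= num_burnin:
                total += loss(w)
        chain_means.append(total / num_draws)
    mean_loss = sum(chain_means) / len(chain_means)
    return n_beta * (mean_loss - loss_star)


if __name__ == "__main__":
    # Regular minimum, L(w) = w^2 / 2: theoretical learning coefficient 1/2.
    quad = sgld_llc_estimate(lambda w: 0.5 * w[0] ** 2, lambda w: [w[0]], [0.0], n=1000)
    # Singular minimum, L(w) = w^4: theoretical learning coefficient 1/4.
    quart = sgld_llc_estimate(lambda w: w[0] ** 4, lambda w: [4.0 * w[0] ** 3], [0.0], n=1000)
    print(f"regular: {quad:.3f}, singular: {quart:.3f}")
```

Both toy losses have a single parameter. The estimates should still land near 1/2 and 1/4, up to Monte Carlo and discretization error, because the quartic minimum is degenerate. The learning coefficient registers this degeneracy, whereas a raw parameter count cannot.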

Contributing

See CONTRIBUTING.md for guidelines on how to contribute.

About

Tools for studying developmental interpretability in neural networks.
