`safe-debates`

A PyTorch implementation of AI safety via debate

This repo is currently unmaintained and far from optimized. The tracing files in profiles/ show why: the bottleneck is essentially a poor choice of data structure (Python lookups are very slow). There are also quite a few TODOs throughout the code.
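
As a rough illustration of the kind of cost described above, the sketch below sums the same pixel values stored two ways: in a dict keyed by (row, col) tuples, and in a flat row-major list. This is not the repo's actual data structure (the profiles in profiles/ are the place to look for that); the 28x28 size and every name here are assumptions made for the example.

```python
import timeit

SIZE = 28  # assumed image side length (MNIST-sized); not taken from the repo

# Hypothetical layout 1: one dict entry per pixel, keyed by (row, col).
pixels_dict = {(r, c): float(r * SIZE + c) for r in range(SIZE) for c in range(SIZE)}

# Hypothetical layout 2: the same values in a flat, row-major list.
pixels_flat = [float(i) for i in range(SIZE * SIZE)]


def sum_dict():
    """Sum all pixels via per-pixel tuple hashing and dict lookup."""
    total = 0.0
    for r in range(SIZE):
        for c in range(SIZE):
            total += pixels_dict[(r, c)]
    return total


def sum_flat():
    """Sum all pixels with a single pass over the flat list."""
    return sum(pixels_flat)


if __name__ == "__main__":
    t_dict = timeit.timeit(sum_dict, number=2000)
    t_flat = timeit.timeit(sum_flat, number=2000)
    print(f"dict lookups: {t_dict:.3f}s, flat list: {t_flat:.3f}s")
```

Both functions return the same total; only the access pattern differs. Timings vary by machine, so none are quoted here.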

If anything, this repo should be viewed as a set of design choices and abstractions for developing agents with debate. If you use it, please cite it.

Based on cle-mnist.

References

Experimental Results

Sparse classifier on random data
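
A minimal sketch of what "sparse" could mean here, assuming the judge only sees a fixed number of pixels chosen at random and receives them as a mask plus the revealed values. Sampling only nonzero pixels, the two-list output, and the name `sparse_view` are assumptions for illustration; the repo itself works with PyTorch tensors, while this sketch uses plain Python lists to stay dependency-free.

```python
import random


def sparse_view(image, num_pixels, rng):
    """Reveal `num_pixels` randomly chosen nonzero pixels of a flat image.

    Returns (mask, values): two lists the same length as `image`, where
    mask[i] is 1.0 for revealed pixels and values[i] holds the pixel value
    there (0.0 everywhere else).
    """
    nonzero = [i for i, v in enumerate(image) if v != 0]
    # If the image has fewer nonzero pixels than requested, reveal them all.
    chosen = rng.sample(nonzero, min(num_pixels, len(nonzero)))
    mask = [0.0] * len(image)
    values = [0.0] * len(image)
    for i in chosen:
        mask[i] = 1.0
        values[i] = image[i]
    return mask, values


# Toy 4x4 "image" flattened row-major; real inputs would be much larger.
image = [0.0, 0.2, 0.9, 0.0,
         0.0, 0.0, 1.0, 0.0,
         0.0, 0.7, 0.8, 0.0,
         0.0, 0.5, 0.0, 0.0]
rng = random.Random(4224)  # seed value borrowed from the commands in this README
mask, values = sparse_view(image, num_pixels=4, rng=rng)
print(sum(mask))  # number of revealed pixels
```

Passing an explicit `random.Random` instance keeps the sampling reproducible without touching global state.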

| Density | Accuracy | Average Cross-Entropy |
| ------- | -------- | --------------------- |
| 6px     | 57.4%    | 1.1948                |
| 4px     | 46.7%    | 1.4775                |
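
For reading the two metric columns, here is a hedged sketch of how accuracy and average cross-entropy are commonly computed from classifier scores. It assumes natural-log cross-entropy (the convention PyTorch's cross-entropy loss uses) and a 10-class problem; the function names are hypothetical and the repo's own evaluation code may differ.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of class scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def evaluate(batch_logits, labels):
    """Return (accuracy, mean cross-entropy in nats) over a batch."""
    correct = 0
    total_loss = 0.0
    for logits, label in zip(batch_logits, labels):
        log_probs = log_softmax(logits)
        total_loss -= log_probs[label]
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += int(predicted == label)
    n = len(labels)
    return correct / n, total_loss / n


# Sanity check: uniform scores over 10 classes give a loss of ln(10).
accuracy, loss = evaluate([[0.0] * 10], [3])
print(accuracy, loss, math.log(10))
```

Under those assumptions, a judge that guesses uniformly would score a cross-entropy of ln(10), roughly 2.30, so both rows in the table are well below the uninformed baseline.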

Commands to reproduce
python train_judge.py --pixels 6 --seed 4224 --checkpoint-filename 6px
python train_judge.py --pixels 4 --batches 50000 --seed 4224 --checkpoint-filename 4px
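
The flags used above can be pictured with the following argparse sketch. It is hypothetical: it covers only the four flags that appear in these commands, the help texts are guesses from the flag names, and every default is set to `None` because the real defaults in train_judge.py are not stated here.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the flags shown in the commands."""
    parser = argparse.ArgumentParser(description="Train the sparse judge (sketch).")
    parser.add_argument("--pixels", type=int, default=None,
                        help="number of pixels the judge may see (assumed meaning)")
    parser.add_argument("--batches", type=int, default=None,
                        help="number of training batches (real default unknown)")
    parser.add_argument("--seed", type=int, default=None,
                        help="random seed for reproducibility")
    parser.add_argument("--checkpoint-filename", type=str, default=None,
                        help="name under which the trained judge is saved (assumed meaning)")
    return parser


# Parse the second command's arguments from a list instead of sys.argv.
args = build_parser().parse_args(
    ["--pixels", "4", "--batches", "50000", "--seed", "4224",
     "--checkpoint-filename", "4px"]
)
print(args.pixels, args.batches, args.seed, args.checkpoint_filename)
```

Note that argparse exposes `--checkpoint-filename` as `args.checkpoint_filename`, with the hyphen converted to an underscore.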

Paper: https://arxiv.org/abs/1805.00899

License: MIT
