jvmncs / safe-debates

A PyTorch implementation of AI safety via debate

Home Page:https://arxiv.org/abs/1805.00899

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

safe-debates

A PyTorch implementation of AI safety via debate

This is currently not maintained and very unoptimized. You can view the tracing files in profiles/ to understand why -- it's essentially a poor choice of data structure (Python lookups are awfully slow). Also, there are quite a few TODOs throughout the code.

If anything, this repo should be viewed as a set of design choices and abstractions for developing agents with debate. If you use it, please reference it.

Based on cle-mnist.

References

Experimental Results

Sparse classifier on random data

Density Accuracy Average Cross-Entropy
6px 57.4% 1.1948
4px 46.7% 1.4775

Commands to reproduce
python train_judge.py --pixels 6 --seed 4224 --checkpoint-filename 6px
python train_judge.py --pixels 4 --batches 50000 --seed 4224 --checkpoint-filename 4px

About

A PyTorch implementation of AI safety via debate

https://arxiv.org/abs/1805.00899

License:MIT License


Languages

Language:Python 100.0%