minaek / reward-misspecification

Code for Reward Misspecification Experiments

This repository contains code for the paper The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models.

Instructions

Each experiment folder has its own installation requirements. We recommend setting up a separate virtual environment for each one and following the instructions in that folder's README. The code has been tested with Python 3.7 on machines running Ubuntu 18.04.
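
As a rough illustration of the one-environment-per-folder recommendation, the sketch below uses Python's standard `venv` module to create an isolated environment inside each experiment folder. The folder names come from this README; the `.venv` directory name and the helper functions `plan_envs` and `create_envs` are hypothetical and not part of this repository. Dependencies still need to be installed by following each folder's own README.

```python
from pathlib import Path
import venv

# Folder names are taken from this README; ".venv" is an assumed location.
EXPERIMENT_FOLDERS = ["flow", "pandemic", "glucose", "atari"]


def plan_envs(repo_root, folders=EXPERIMENT_FOLDERS, env_dirname=".venv"):
    """Map each experiment folder to the path of its own virtual environment."""
    root = Path(repo_root)
    return {name: root / name / env_dirname for name in folders}


def create_envs(repo_root, folders=EXPERIMENT_FOLDERS):
    """Create one isolated environment per experiment folder that exists."""
    created = []
    for name, env_path in plan_envs(repo_root, folders).items():
        if not env_path.parent.is_dir():
            continue  # skip folders that are missing from this checkout
        venv.create(env_path, with_pip=True)  # standard-library venv builder
        created.append(env_path)
    return created
```

Calling `create_envs(".")` from the repository root would create, for example, `flow/.venv`; activate it with `source flow/.venv/bin/activate` before installing that folder's requirements.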

Based off of code from

The flow, pandemic, glucose, and atari folders hold the code for the traffic, COVID, blood glucose monitoring, and Atari experiments, respectively. The flow_cfg folder holds the experiment setup for the traffic experiments.

Citation

If you use these environments in your own work, please cite us as:

@inproceedings{
    pan2022rewardhacking,
    title={The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models},
    author={Alexander Pan and Kush Bhatia and Jacob Steinhardt},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=JYtwGwIL7ye}
}

About

License: MIT License

Languages

Python 95.1%, C++ 2.7%, Shell 2.0%, Dockerfile 0.2%