Awesome AI Safety

A curated list of awesome AI safety papers, projects and communities.

Table of Contents

  1. Videos and Lectures
  2. Papers
  3. Safe Exploration
  4. Tutorials
  5. Researchers
  6. Websites
  7. Blogs
  8. Contributing
  9. License

Videos and Lectures

  1. Concrete Problems in AI Safety by Robert Miles
  2. Online course on AI safety
  3. Safe Reinforcement Learning by Philip Thomas
  4. Safe Reinforcement Learning by Mohammad Ghavamzadeh
  5. Safe RL for Robotics by Felix Berkenkamp
  6. Safe Artificial Intelligence by Victoria Krakovna

Papers

  1. Scalable agent alignment via reward modeling: a research direction
  2. AGI safety literature review
  3. Concrete Problems in AI Safety
  4. Preventing Side-effects in Gridworlds
  5. A Gym Gridworld Environment for the Treacherous Turn
  6. Preferences Implicit in the State of the World
  7. Conservative Agency via Attainable Utility Preservation
  8. Penalizing side effects using stepwise relative reachability
  9. Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
  10. Incorrigibility in the CIRL Framework
  11. The Off-Switch Game
  12. Corrigibility
  13. Learning the Preferences of Ignorant, Inconsistent Agents
  14. Cooperative inverse reinforcement learning
  15. Towards Interactive Inverse Reinforcement Learning
  16. Repeated Inverse Reinforcement Learning
  17. Should robots be obedient?
  18. Inverse Reward Design
  19. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
  20. Simplifying Reward Design through Divide-and-Conquer
  21. An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning
  22. Reward learning from human preferences and demonstrations in Atari
  23. Supervising strong learners by amplifying weak experts
  24. AI safety via debate
  25. Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
  26. Deep reinforcement learning from human preferences (a sketch of the underlying preference model follows this list)
  27. Agent-Agnostic Human-in-the-Loop Reinforcement Learning
  28. Avoiding Wireheading with Value Reinforcement Learning
  29. Reinforcement learning with a corrupted reward channel
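
Several of the papers above (items 22 and 26 in particular) learn a reward model from pairwise human comparisons of trajectory segments. Below is a minimal PyTorch sketch of the Bradley-Terry preference loss that this line of work builds on; the architecture, names, and shapes are illustrative assumptions, not code from any listed paper.

```python
import torch
import torch.nn as nn

# Sketch of reward learning from pairwise preferences. The reward model
# scores individual observations; a segment's predicted return is the
# sum of per-step scores. All shapes and names are illustrative.

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def segment_return(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, timesteps, obs_dim) -> predicted return (batch,)
        return self.net(segment).squeeze(-1).sum(dim=-1)

def preference_loss(model: RewardModel,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    prefs: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry model: P(A preferred over B) = sigmoid(R_A - R_B).
    # prefs[i] = 1.0 if the human preferred segment A in pair i, else 0.0.
    logits = model.segment_return(seg_a) - model.segment_return(seg_b)
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)
```

The published systems add details omitted here, such as an ensemble of reward predictors, reward normalization, and active selection of which segment pairs to show the human.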

Safe Exploration

  1. Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
  2. Safe Exploration for Interactive Machine Learning
  3. Stagewise Safe Bayesian Optimization with Gaussian Processes
  4. Safe Exploration in Continuous Action Spaces
  5. A Lyapunov-based Approach to Safe Reinforcement Learning
  6. Lyapunov-based Safe Policy Optimization for Continuous Control
  7. IPO: Interior-point Policy Optimization under Constraints
  8. CPO: Constrained Policy Optimization (see the constrained-RL sketch after this list)
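
The methods in this section largely target the same constrained-MDP objective: maximize expected return while keeping one or more expected discounted safety costs below a budget. As a generic point of reference (this is the common textbook formulation, not notation taken from any single listed paper):

```latex
\max_{\pi}\;
\mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} C_i(s_t, a_t)\right] \le d_i,
\qquad i = 1, \dots, m
```

Roughly, CPO enforces the cost constraints inside a trust-region policy update, IPO folds them into the objective as interior-point (log-barrier) penalties, and the Lyapunov-based papers convert the trajectory-level constraints into state-wise constraints that hold throughout training.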

Tutorials

Researchers

Websites

  1. https://80000hours.org/articles/ai-safety-syllabus/
  2. https://humancompatible.ai/bibliography
  3. http://aisafety.stanford.edu/
  4. https://intelligence.org/research/#publications
  5. https://ai-alignment.com/?gi=7c7707e4c512
  6. https://vkrakovna.wordpress.com/
  7. https://forum.effectivealtruism.org/

Blogs

  1. RAISE blog
  2. Towards safe reinforcement learning

Contributing

Have anything in mind that you think is awesome and would fit in this list? Feel free to send a pull request.


License

CC0

To the extent possible under law, Harshit Sikchi has waived all copyright and related or neighboring rights to this work.
