liweijiang / delphi

Can Machines Learn Morality? The Delphi Experiment

This is the official repository for our preprint:

Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jenny Liang, Jesse Dodge, Keisuke Sakaguchi, Maxwell Forbes, Jon Borchardt, Saadia Gabriel, Yulia Tsvetkov, Oren Etzioni, Maarten Sap, Regina Rini, Yejin Choi. Can Machines Learn Morality? The Delphi Experiment. 2022.

As AI systems become increasingly powerful and pervasive, there are growing concerns about machines’ morality, or lack thereof. Existing AI systems deployed to millions of users are already making decisions loaded with moral implications, yet moral questions are among the most intensely debated the world over. This poses a seemingly impossible challenge: teaching machines morality while humanity continues to grapple with it.

To explore this challenge, we introduce Delphi, an experimental framework based on deep neural networks and trained to predict human moral judgments. Empirical results offer novel insights into the promises and limits of machine ethics. Delphi demonstrates strong generalization capabilities in the face of novel ethical situations, while off-the-shelf neural network models exhibit markedly poor judgment, confirming the need for explicitly teaching machines a moral sense.

Yet, Delphi is not perfect, exhibiting susceptibility to pervasive biases and inconsistencies. Despite these shortcomings, we demonstrate positive use cases of Delphi, including using it as a component model within other imperfect AI systems. Importantly, we interpret the operationalization

Data and Model Access

You can access the Commonsense Norm Bank dataset by filling out this form.

To request access to the Delphi model checkpoints or the API, please contact Liwei Jiang at lwjiang@cs.washington.edu.

If you find our paper or data useful, please cite the paper:

@article{jiang2022machines,
      title={Can Machines Learn Morality? The Delphi Experiment}, 
      author={Liwei Jiang and Jena D. Hwang and Chandra Bhagavatula and Ronan Le Bras and Jenny Liang and Jesse Dodge and Keisuke Sakaguchi and Maxwell Forbes and Jon Borchardt and Saadia Gabriel and Yulia Tsvetkov and Oren Etzioni and Maarten Sap and Regina Rini and Yejin Choi},
      year={2022},
      eprint={2110.07574},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
