Goal-Oriented Chatbot trained with Deep Reinforcement Learning

Based on the TC-Bot code repo and the paper End-to-End Task-Completion Neural Dialogue Systems. This repo is a simplified version of TC-Bot. It performs at a similar level of accuracy, although the two are not directly comparable.

Details

This shows how to train a simple DQN agent with deep reinforcement learning as a goal-oriented chatbot using a simple user simulator. The code is a simplified version of TC-Bot by MiuLab with the main difference being that this code does not include NLG or NLU components but just trains the dialogue manager. NL components are not necessary to understand how a GO chatbot is trained with DRL and therefore are not implemented.

Here is a diagram from the TC-Bot paper. The dialogue flow in this project is similar, except that it has no LU or NLG components:

In addition to removing NL, there are changes to the success conditions, the DQN agent optimizer and a few other minor changes. Therefore, accuracy should not be compared directly between TC-Bot and this repo.

The database is of movie tickets, the same DB used in TC-Bot. Both the pickle and text versions of the data can be seen in the data directory.

Important Note

A 5-part tutorial series that walks through this code in detail can be found on Medium.

Dependencies

Python >= 3.5
Keras >= 2.2.4 (Earlier versions probably work)
numpy

How to Run

constants.json is the configuration file.

Train: python train.py

Test: python test.py

All the constants are fairly self-explanatory, except for "vanilla" under agent, which selects DQN (true) or Double DQN (false). The default is vanilla DQN.
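To make the difference behind the "vanilla" flag concrete, here is a minimal sketch of how the two variants compute their training targets. It is not taken from this repo: the function name td_targets, its arguments, and the default gamma are hypothetical, and it assumes the agent keeps a separate target network alongside the online network.

```python
def td_targets(rewards, dones, q_next_online, q_next_target, gamma=0.9, vanilla=True):
    """Compute one TD target per transition in a batch.

    q_next_online / q_next_target hold, for each transition, the list of
    Q-values of the next state from the online and target networks.
    All names here are illustrative, not the repo's API.
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_next_online, q_next_target):
        if done:
            # Terminal transition: no bootstrapped future value.
            targets.append(r)
            continue
        if vanilla:
            # DQN: the target network both selects and evaluates the action.
            next_value = max(q_tg)
        else:
            # Double DQN: the online network selects the action,
            # the target network evaluates it.
            best_action = max(range(len(q_on)), key=q_on.__getitem__)
            next_value = q_tg[best_action]
        targets.append(r + gamma * next_value)
    return targets
```

Separating action selection from evaluation is what lets Double DQN reduce the overestimation that the max operator introduces in vanilla DQN.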

Note: If you get an unpickling error during training or testing, run python pickle_converter.py, which should fix it.

Test (or Train) with an Actual User

You can test the agent by inputting your own actions as the user (instead of using a user sim) by setting "usersim" under run in constants.json to false. At every step of an episode/conversation, you enter an action and a success indicator in the console. The format for the action input is: intent/inform slots/request slots.
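The setting can be changed by editing constants.json by hand. As an alternative, the following sketch flips it from Python. The helper name set_usersim is hypothetical, and it assumes only what is stated above: that constants.json is a JSON object with a "run" section containing a "usersim" key.

```python
import json


def set_usersim(path, enabled):
    """Set run.usersim in the JSON config at `path` and write it back."""
    with open(path) as f:
        constants = json.load(f)
    # Create the "run" section if it is missing, then set the flag.
    constants.setdefault("run", {})["usersim"] = bool(enabled)
    with open(path, "w") as f:
        json.dump(constants, f, indent=2)
    return constants
```

Calling set_usersim("constants.json", False) before running python test.py would switch to console input. All other keys in the file are preserved, though the file is re-indented.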

Example action inputs:

request/moviename: room, date: friday/starttime, city, theater
inform/moviename: zootopia/
request//starttime
done//
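The examples above follow a simple grammar, and a parser for it can be sketched as below. This is an illustration of the input format only, not the repo's actual parsing code: the function name parse_user_action and the shape of the returned dictionary are assumptions.

```python
def parse_user_action(text):
    """Parse 'intent/inform slots/request slots' into a dictionary.

    Inform slots are comma-separated 'key: value' pairs; request slots are
    comma-separated slot names. Either slot section may be empty.
    """
    parts = text.strip().split("/")
    if len(parts) != 3:
        raise ValueError("expected exactly two '/' separators")
    intent, inform_part, request_part = (p.strip() for p in parts)
    if not intent:
        raise ValueError("intent must not be empty")

    inform_slots = {}
    for item in inform_part.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first ':' only, so values may themselves contain ':'.
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError("inform slot %r is missing ':'" % item)
        inform_slots[key.strip()] = value.strip()

    request_slots = [s.strip() for s in request_part.split(",") if s.strip()]
    return {
        "intent": intent,
        "inform_slots": inform_slots,
        "request_slots": request_slots,
    }
```

Because "/" and "," act as separators, this sketch cannot handle slot values that contain either character.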

In addition the console will ask for an indicator on whether the agent succeeded yet (other than after the initial action input of an episode). Allowed inputs are -1 for loss, 0 for no outcome yet, 1 for success.

My Data

Used hyperparameters from constants.json.

Table of episodes (every 2000 out of 40000) by max success rate of a period/train frequency (every 100 episodes) up to that episode:
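For readers who want to build a table like this from their own training run, the following is a minimal sketch of the computation. The function name table_rows and its parameters are hypothetical, and it assumes you have logged one success rate per period (every 100 episodes) in order.

```python
from itertools import accumulate


def table_rows(period_success_rates, period=100, row_every=2000):
    """Return (episode, best success rate so far) pairs for the table.

    period_success_rates: one success rate per training period, in order.
    period: number of episodes per period.
    row_every: number of episodes between table rows.
    """
    # Running maximum: best period success rate seen up to each period.
    running_max = list(accumulate(period_success_rates, max))
    step = row_every // period
    return [
        ((i + 1) * period, running_max[i])
        for i in range(step - 1, len(running_max), step)
    ]
```

With the defaults, 40000 episodes yield 400 periods and 20 table rows.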
