eashanadhikarla / seqGAN

SeqGAN models the data generator as a stochastic policy in reinforcement learning, sidestepping the difficulty of back-propagating through discrete outputs. The reinforcement learning reward signal comes from the GAN discriminator judged on a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search.


SeqGAN: An extended study

SeqGAN adapts the GAN framework to sequence generation. It regards the generator as a policy in reinforcement learning, and the discriminator is trained to provide the reward. To evaluate unfinished sequences, Monte Carlo search is applied to sample complete sequences.
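To make the Monte Carlo step concrete, below is a minimal sketch of how the reward for an unfinished sequence can be estimated: the partial sequence is completed several times with the generator's own policy, and the discriminator's scores on the completed sequences are averaged. The `next_token_probs` helper and the tensor shapes are assumptions for illustration, not the repo's actual API.

```python
import torch

def mc_rollout_reward(generator, discriminator, partial_seq, seq_len, n_rollouts=16):
    """Estimate the reward for a partial sequence by Monte Carlo search:
    complete it n_rollouts times with the generator's own policy and
    average the discriminator's scores on the finished sequences.
    (Illustrative sketch; the repo's actual interfaces may differ.)"""
    rewards = []
    for _ in range(n_rollouts):
        seq = partial_seq.clone()                      # (batch, t) tokens generated so far
        while seq.size(1) < seq_len:
            probs = generator.next_token_probs(seq)    # (batch, vocab); assumed helper
            nxt = torch.multinomial(probs, num_samples=1)
            seq = torch.cat([seq, nxt], dim=1)
        with torch.no_grad():
            rewards.append(discriminator(seq))         # P(sequence is real), shape (batch,)
    return torch.stack(rewards).mean(dim=0)            # averaged Q(s, a) estimate per example
```

The average over rollouts is what SeqGAN uses as the action-value signal for the intermediate time step, so that every generated token receives a reward even though the discriminator only judges complete sequences.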

This project is implemented by Eashan Adhikarla and reviewed by Prof. Xie Sihong.

Descriptions

This project includes an implementation of SeqGAN and of the comparison models proposed in the paper "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient" by Lantao Yu et al. of Shanghai Jiao Tong University and University College London.

Problem Statement

Because the generator produces discrete tokens, it is not differentiable end-to-end, so Generative Adversarial Networks have serious trouble passing the discriminator's gradient back to update the generator. SeqGAN addresses this problem by modeling the data generator as a stochastic policy in reinforcement learning.
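The gradient problem can be seen directly in code: sampling a discrete token yields an index tensor with no gradient path back to the generator's logits, so the discriminator's gradient cannot update the generator the way it does for continuous image outputs. A tiny PyTorch illustration with toy tensors (not taken from the repo):

```python
import torch

logits = torch.randn(1, 5000, requires_grad=True)   # generator output over a 5000-token vocabulary
probs = torch.softmax(logits, dim=-1)

token = torch.multinomial(probs, num_samples=1)      # discrete sampling: not differentiable
print(token.requires_grad)                           # False -> no gradient path back to logits

# In image GANs the generator output is continuous, so the discriminator's gradient
# flows back through it directly. SeqGAN instead treats each token choice as an RL
# action and uses the discriminator's judgment as the reward.
```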

Earlier RNN (Recurrent Neural Network) training methods have shown good results by maximizing the log predictive probability of each true token in the training sequence, given the previously observed tokens. However, this maximum-likelihood solution suffers from exposure bias during inference: the discrepancy between training and inference lets errors accumulate quickly along the generated sequence. Scheduled sampling (SS) was proposed to address this issue by feeding in some of the model's own samples during the learning phase, but SS was later shown to be an inconsistent training strategy.
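For reference, a hedged sketch of the maximum-likelihood objective (teacher forcing) and of scheduled sampling's modification: with probability `ss_prob`, the model's own previous prediction is fed back in place of the ground-truth token. The `generator(inp, state)` interface returning `(logits, state)` is hypothetical.

```python
import random
import torch
import torch.nn.functional as F

def mle_step(generator, true_seq, ss_prob=0.0):
    """One training step on a batch of sequences.
    ss_prob = 0.0 is pure teacher forcing: maximize log p(x_t | x_<t) on true tokens.
    ss_prob > 0.0 mixes in the model's own predictions (scheduled sampling).
    (Illustrative sketch; `generator` is a hypothetical interface.)"""
    state = None
    inp = true_seq[:, :1]                        # start token
    loss = 0.0
    for t in range(1, true_seq.size(1)):
        logits, state = generator(inp, state)    # logits: (batch, vocab)
        target = true_seq[:, t]
        loss = loss + F.cross_entropy(logits, target)
        if random.random() < ss_prob:            # feed the model's own sample next
            inp = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        else:                                    # feed the ground-truth token (teacher forcing)
            inp = target.unsqueeze(1)
    return loss / (true_seq.size(1) - 1)
```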

Another popular approach is to build the loss function on the entire generated sequence instead of on each transition. However, this approach does not give satisfactory results for complex real-life outputs such as music and dialogue.

In a Generative Adversarial Network (GAN), the discriminator is well trained at distinguishing real from artificial images, but GANs perform poorly with discrete tokens: the slight, continuous guidance from the discriminator's gradient does not correspond to any change in the limited dictionary space. The paper therefore suggests using reinforcement learning to train the generator portion of the GAN.
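A minimal sketch of the resulting policy-gradient (REINFORCE-style) generator update, assuming per-step rewards have already been estimated, for example with the Monte Carlo rollouts sketched above; all interfaces here are illustrative rather than the repo's actual ones.

```python
import torch

def policy_gradient_step(generator, optimizer, log_probs, rewards):
    """REINFORCE-style update for sequences sampled from the generator.
    log_probs[:, t] is log pi(token_t | state_t) for the token actually taken;
    rewards[:, t] is the MC-estimated Q(state_t, token_t) from the discriminator.
    Both tensors have shape (batch, seq_len). (Illustrative interfaces.)"""
    # Maximize E[ Q(s, a) * log pi(a | s) ]  ==  minimize its negative.
    loss = -(log_probs * rewards).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the reward multiplies the log-probability of each sampled token, the update raises the probability of tokens that lead to sequences the discriminator judges as real, without ever needing to differentiate through the discrete sampling step.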

Requirements

Datasets

  • A randomly initialized LSTM, used as an oracle to simulate a target distribution for the synthetic-data experiment (see the sketch after this list).
  • Obama Speech Dataset.
  • Chinese Poem Dataset.
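In the synthetic experiment, the randomly initialized LSTM acts as an "oracle": its samples serve as the real training data, and generator quality can then be measured exactly via the oracle's likelihood of generated sequences. A minimal sketch, in which the class name, dimensions, and start-token convention are illustrative assumptions:

```python
import torch
import torch.nn as nn

class OracleLSTM(nn.Module):
    """Randomly initialized LSTM whose samples define the target distribution."""
    def __init__(self, vocab_size=5000, emb_dim=32, hidden_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def sample(self, batch_size, seq_len):
        tokens = torch.zeros(batch_size, 1, dtype=torch.long)   # start token (id 0, assumed)
        state, seq = None, []
        for _ in range(seq_len):
            h, state = self.lstm(self.emb(tokens), state)
            probs = torch.softmax(self.out(h[:, -1]), dim=-1)
            tokens = torch.multinomial(probs, num_samples=1)
            seq.append(tokens)
        return torch.cat(seq, dim=1)                             # (batch, seq_len) token ids

oracle = OracleLSTM()
real_data = oracle.sample(batch_size=64, seq_len=20)             # "real" training sequences
```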

Related works

Yu, Lantao, et al. "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." AAAI. 2017.

About


License: GNU General Public License v3.0


Languages

Language: Python 100.0%