See Long Text Generation via Adversarial Training with Leaked Information.
Text generation using Generative Adversarial Networks (GAN) and Hierarchical Reinforcement Learning.
There is a problem in using straight GANs for text generation:
- GANs work by propagating gradients through the composition of Generator and Discriminator
- Text is normally generated with a final softmax layer over the token space, i.e. the network outputs a probability for each token in the vocabulary, from which the next token is sampled (a discrete stochastic unit).
These two things do not work well together on their own because you cannot propagate gradients through discrete stochastic units.
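To make the obstacle concrete, here is a minimal PyTorch sketch (the layer sizes and names are arbitrary, chosen only for illustration):

```python
import torch
import torch.nn.functional as F

vocab_size, hidden_size = 5000, 64
logits_layer = torch.nn.Linear(hidden_size, vocab_size)

hidden = torch.randn(1, hidden_size)             # stand-in for a generator state
probs = F.softmax(logits_layer(hidden), dim=-1)  # differentiable so far
token = torch.multinomial(probs, num_samples=1)  # discrete sample: the gradient stops here

print(probs.requires_grad)  # True  -- gradients can reach the softmax
print(token.requires_grad)  # False -- the sampled index is not differentiable,
                            # so a discriminator loss on `token` cannot update the generator
```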
There are two main approaches to deal with this:
- REINFORCE algorithm (REINFORCE is known to have high variance, so you need a large amount of data to get good gradient estimates)
- Gumbel-Softmax re-parameterization (also known as the Concrete distribution).
As an example of REINFORCE for textual GANs you can read the SeqGAN paper. For an example of Gumbel-Softmax you can read this article.
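As a minimal sketch of the Gumbel-Softmax route in PyTorch (the straight-through variant; the sizes, the shared embedding and the temperature are illustrative assumptions, not taken from this repository):

```python
import torch
import torch.nn.functional as F

vocab_size, emb_size, hidden_size = 5000, 32, 64
to_logits = torch.nn.Linear(hidden_size, vocab_size)
embedding = torch.nn.Embedding(vocab_size, emb_size)  # assumed shared with the discriminator

hidden = torch.randn(1, hidden_size)
logits = to_logits(hidden)

# Straight-through Gumbel-Softmax: the forward pass is a (nearly) one-hot sample,
# the backward pass uses the continuous relaxation, so gradients reach `to_logits`.
one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)

# The "token" fed to the discriminator is a mixture of embeddings weighted by
# `one_hot`, so the whole generator-discriminator pipeline stays differentiable.
token_emb = one_hot @ embedding.weight                 # shape (1, emb_size)
```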
Another option is to not have a discrete stochastic unit at the output of the generator, e.g. by generating tokens deterministically in embedded space, thus eliminating the original problem of back-propagating through them. See Generating sentences from a continuous space for hints on producing human-readable text from a latent sentence space, i.e. a vector of continuous values.
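As a rough illustration of that idea, the sketch below assumes a generator that emits embedding vectors directly and only snaps them to tokens at read-out time (the names and the nearest-neighbour decoding rule are assumptions made for illustration):

```python
import torch

vocab_size, emb_size = 5000, 32
embedding = torch.nn.Embedding(vocab_size, emb_size)

# Generator output: one continuous vector per position, no sampling involved,
# so a discriminator's gradient can flow straight back into the generator.
generated = torch.randn(10, emb_size, requires_grad=True)  # stand-in for a decoder's output

# Only at read-out time is each vector snapped to its nearest token embedding.
dists = torch.cdist(generated, embedding.weight)            # (10, vocab_size)
tokens = dists.argmin(dim=-1)                               # discrete, but outside the training path
```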
The approach taken by this code combines a policy-gradient algorithm with a GAN: a discriminative model guides the training of the generative model, which is treated as a reinforcement-learning policy. This has shown promising results in text generation. Normally, however, the scalar guiding signal is only available after the entire text has been generated and carries no intermediate information about text structure during the generative process, which limits its success when the generated text samples are long (more than 20 words).
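A minimal sketch of that setup, assuming a SeqGAN-style REINFORCE update in which the discriminator's score on the finished text is the only reward (the tensors below are stand-ins, not this repository's models):

```python
import torch

def pg_loss(log_probs, reward):
    """REINFORCE loss for one generated sequence.

    log_probs: (T,) log-probability of each sampled token under the generator.
    reward:    scalar from the discriminator, available only once the full
               sequence is generated, so every step shares the same sparse signal.
    """
    return -(log_probs.sum() * reward)

# Toy usage: a 40-token sequence gets a single scalar reward at the very end.
log_probs = torch.log(torch.rand(40)).requires_grad_()  # stand-in for the generator's choices
reward = torch.tensor(0.73)                             # stand-in for D(x) on the finished text
loss = pg_loss(log_probs, reward)
```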
To address the problem for long text generation, we allow the discriminative network to leak its own high-level extracted features to the generative network to further help the guidance. The generator incorporates these informative signals into all generation steps through an additional Manager module, which takes the extracted features of the currently generated words and outputs a latent vector to guide the Worker module in next-word generation.
The hierarchical generator G consists of a high-level Manager module and a low-level Worker module. The Manager is a long short-term memory (LSTM) network and serves as a mediator. At each step, it receives discriminator D's high-level feature representation, e.g. the feature map of the CNN, and uses it to form the guiding goal for the Worker module in that timestep. Since this information is internally maintained by D and, in an adversarial game, D is not supposed to provide it to G, we call it a leakage of information from D (hence LeakGAN).
Next, given the goal embedding produced by the Manager, the Worker first encodes the currently generated words with another LSTM, then combines the output of the LSTM and the goal embedding to take a final action at the current state. As such, the guiding signals from D are available to G not only at the end, in terms of the scalar reward signal, but also during the generation process, in terms of a goal embedding vector that guides G in how to improve.
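A condensed PyTorch sketch of that hierarchy, assuming placeholder feature, goal and hidden sizes rather than the paper's exact configuration (the Bilinear combination of Worker state and goal is also a simplification of the paper's mechanism):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Manager(nn.Module):
    """Turns the discriminator's leaked feature map into a goal vector."""
    def __init__(self, feat_size, goal_size, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTMCell(feat_size, hidden_size)
        self.to_goal = nn.Linear(hidden_size, goal_size)

    def forward(self, leaked_feat, state):
        h, c = self.lstm(leaked_feat, state)
        goal = F.normalize(self.to_goal(h), dim=-1)   # unit-length guidance signal
        return goal, (h, c)

class Worker(nn.Module):
    """Encodes the words generated so far and combines them with the goal."""
    def __init__(self, vocab_size, emb_size, goal_size, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.lstm = nn.LSTMCell(emb_size, hidden_size)
        self.to_logits = nn.Bilinear(hidden_size, goal_size, vocab_size)

    def forward(self, prev_token, goal, state):
        h, c = self.lstm(self.embed(prev_token), state)
        logits = self.to_logits(h, goal)              # next-token distribution
        return logits, (h, c)

# One generation step: D leaks a feature map, the Manager turns it into a goal,
# and the Worker picks the next word conditioned on that goal.
feat_size, goal_size, vocab_size, emb_size = 128, 16, 5000, 32
manager = Manager(feat_size, goal_size)
worker = Worker(vocab_size, emb_size, goal_size)

leaked_feat = torch.randn(1, feat_size)               # stand-in for D's CNN feature map
m_state = (torch.zeros(1, 64), torch.zeros(1, 64))
w_state = (torch.zeros(1, 64), torch.zeros(1, 64))
prev_token = torch.zeros(1, dtype=torch.long)         # e.g. a start-of-sequence id

goal, m_state = manager(leaked_feat, m_state)
logits, w_state = worker(prev_token, goal, w_state)
next_token = torch.multinomial(F.softmax(logits, dim=-1), 1)
```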