Seq2Seq models

This is a project for learning how to implement different seq2seq models in TensorFlow.

This project is intended for learning only, which means it may contain many bugs. For running experiments and training seq2seq models, I suggest using the nmt project; you can find it in the Reference section.

Experiments

I am experimenting with the CopyNet and pointer-generator models on the LCSTS dataset; you can find the code in the lcsts branch.

Issues and suggestions are welcome.

Models

The models I have implemented are as follows:

  • Basic seq2seq model
    • A model with a bidirectional RNN encoder and an attention mechanism
  • Seq2seq model
    • Same as the basic model, but using the tf.data pipeline to process input data
  • GNMT model
    • Residual connections and attention, following the GNMT model, to speed up training
    • Refer to GNMT for more details
  • Pointer-Generator model
  • CopyNet model
    • A model that also supports the copy mechanism
    • Refer to CopyNet for more details

For implementation details, refer to the README in the model folder.

Structure

A typical sequence-to-sequence (seq2seq) model contains an encoder, a decoder, and an attention mechanism. TensorFlow provides many useful APIs for implementing a seq2seq model; you will usually need the following:

  • tf.contrib.rnn
    • Different RNN cells
  • tf.contrib.seq2seq
    • Provides different attention mechanisms and a good implementation of beam search
  • tf.data
    • Data preprocessing pipeline APIs (see the sketch after this list)
  • Other APIs you need to build and train a model
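As an illustration, here is a minimal sketch of the kind of tf.data input pipeline such a model uses. The file names, vocabulary table, and batching details are assumptions for the example, not this project's actual code.

```python
import tensorflow as tf

# Minimal input-pipeline sketch (hypothetical file names and vocab table).
def make_dataset(src_file, tgt_file, vocab_table, batch_size=32):
    src = tf.data.TextLineDataset(src_file)
    tgt = tf.data.TextLineDataset(tgt_file)
    dataset = tf.data.Dataset.zip((src, tgt))
    # Split each line into tokens.
    dataset = dataset.map(
        lambda s, t: (tf.string_split([s]).values, tf.string_split([t]).values))
    # Map tokens to integer ids with a lookup table.
    dataset = dataset.map(
        lambda s, t: (tf.cast(vocab_table.lookup(s), tf.int32),
                      tf.cast(vocab_table.lookup(t), tf.int32)))
    # Keep sequence lengths so the RNN and the loss can mask padding.
    dataset = dataset.map(lambda s, t: (s, t, tf.size(s), tf.size(t)))
    # Pad each batch to the length of its longest sequence.
    dataset = dataset.padded_batch(
        batch_size,
        padded_shapes=(tf.TensorShape([None]), tf.TensorShape([None]),
                       tf.TensorShape([]), tf.TensorShape([])))
    return dataset
```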

Encoder

Use either:

  • Multi-layer RNN
    • Use the last state of the last RNN layer as the initial decoder state
  • Bidirectional RNN
    • Use a Dense layer to convert the forward and backward states into the initial decoder state (see the sketch after this list)
  • GNMT encoder
    • A bidirectional RNN followed by several RNN layers with residual connections
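Below is a minimal sketch of the bidirectional-encoder case, assuming LSTM cells and hypothetical tensor names; the actual implementation is in the model folder.

```python
import tensorflow as tf

# embedded_inputs: [batch, time, embed_dim]; sequence_length: [batch].
def bidirectional_encoder(embedded_inputs, sequence_length, num_units):
    fw_cell = tf.contrib.rnn.LSTMCell(num_units)
    bw_cell = tf.contrib.rnn.LSTMCell(num_units)
    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        fw_cell, bw_cell, embedded_inputs,
        sequence_length=sequence_length, dtype=tf.float32)
    # Concatenate forward and backward outputs as the attention memory.
    encoder_outputs = tf.concat(outputs, axis=-1)
    # Project the concatenated final states down to the decoder's state size.
    fw_state, bw_state = states
    init_c = tf.layers.dense(tf.concat([fw_state.c, bw_state.c], -1), num_units)
    init_h = tf.layers.dense(tf.concat([fw_state.h, bw_state.h], -1), num_units)
    decoder_init_state = tf.contrib.rnn.LSTMStateTuple(init_c, init_h)
    return encoder_outputs, decoder_init_state
```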

Decoder

  • Use a multi-layer RNN, and set the initial state of each layer to the initial decoder state (see the sketch after this list)
  • GNMT decoder
    • Applies attention only to the bottom layer of the decoder, so that multiple GPUs can be utilized during training
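Below is a minimal training-time decoder sketch using the tf.contrib.seq2seq APIs listed above, with Luong attention and hypothetical tensor names; it illustrates the wiring rather than this project's exact code.

```python
import tensorflow as tf

# decoder_inputs: embedded target tokens [batch, time, embed_dim].
def build_decoder(encoder_outputs, decoder_init_state, source_lengths,
                  decoder_inputs, target_lengths, num_units, vocab_size):
    attention = tf.contrib.seq2seq.LuongAttention(
        num_units, encoder_outputs, memory_sequence_length=source_lengths)
    cell = tf.contrib.rnn.LSTMCell(num_units)
    cell = tf.contrib.seq2seq.AttentionWrapper(
        cell, attention, attention_layer_size=num_units)
    batch_size = tf.shape(encoder_outputs)[0]
    # Start from the encoder's state instead of an all-zero state.
    initial_state = cell.zero_state(batch_size, tf.float32).clone(
        cell_state=decoder_init_state)
    helper = tf.contrib.seq2seq.TrainingHelper(decoder_inputs, target_lengths)
    decoder = tf.contrib.seq2seq.BasicDecoder(
        cell, helper, initial_state, output_layer=tf.layers.Dense(vocab_size))
    outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)
    return outputs.rnn_output  # logits: [batch, time, vocab_size]
```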

Attention

  • Bahdanau
  • Luong

Metrics

Right now there is only cross-entropy loss (see the sketch after the list below). The following metrics will be added:

  • BLEU
    • For translation tasks
  • ROUGE
    • For summarization tasks
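For reference, the cross-entropy loss is typically computed with tf.contrib.seq2seq.sequence_loss plus a padding mask; the tensor names below are assumptions.

```python
import tensorflow as tf

# logits: [batch, time, vocab_size], targets: [batch, time],
# target_lengths: [batch].
def cross_entropy_loss(logits, targets, target_lengths):
    max_time = tf.shape(targets)[1]
    # Zero out the loss on padded positions.
    weights = tf.sequence_mask(target_lengths, max_time, dtype=tf.float32)
    return tf.contrib.seq2seq.sequence_loss(logits, targets, weights)
```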

Dependency

  • TensorFlow 1.4
  • Python 3

Run

Run the model on a toy dataset, i.e., reversing the sequence.

train:

python -m bin.toy_train

inference:

python -m bin.toy_inference

You can also run on the en-vi dataset; refer to en_vietnam_train.py in the bin directory for more details.

You can find more training scripts in bin directory.

Reference

Thanks to the following resources: