OpenSeq2Seq's main goal is to let researchers explore sequence-to-sequence models as effectively as possible. Efficiency is achieved by fully supporting distributed and mixed-precision training. OpenSeq2Seq is built on TensorFlow and provides all the necessary building blocks for training encoder-decoder models for neural machine translation, automatic speech recognition, speech synthesis, and language modeling.
https://nvidia.github.io/OpenSeq2Seq/
First, follow the steps from this link to run the container required for the OpenSeq2Seq toolkit. If you are using a VM instance, make sure it is a GPU instance with a P100 or V100.
Install requirements:
git clone https://github.com/swapnil3597/OpenSeq2Seq/
cd OpenSeq2Seq
pip install -r requirements.txt
Install CTC decoders:
bash scripts/install_decoders.sh
python scripts/ctc_decoders_test.py
All of the above instructions are also available here.
Find the links to the latest acoustic model checkpoint and config file here.
To download from a Google Drive link, run the following commands:
pip3 install gdown
gdown 'https://drive.google.com/uc?id=12CQvNrTvf0cjTsKjbaWWvdaZb7RxWI6X&export=download' # This is an example; use the latest Drive link for the Jasper checkpoint (the URL must be quoted so the shell does not treat & as a command separator)
To download the language model, follow these steps:
bash scripts/install_kenlm.sh
bash scripts/download_lm.sh
After running these commands, a language_model/ directory will be created containing the binary file for the 4-gram ARPA language model.
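Optionally, you can sanity-check the downloaded language model from Python. This is a minimal sketch assuming the KenLM Python bindings are available (they are not installed by the scripts above; one way to get them is pip install https://github.com/kpu/kenlm/archive/master.zip) and that the binary is named 4-gram.binary (check the actual filename in language_model/):

```python
import kenlm  # KenLM Python bindings, installed separately (see above)

# The filename is an assumption -- check what download_lm.sh actually produced.
lm = kenlm.Model("language_model/4-gram.binary")

# Log10 probability of a sentence with begin-/end-of-sentence tokens;
# a well-formed sentence should score higher than shuffled words.
print(lm.score("the quick brown fox jumps over the lazy dog", bos=True, eos=True))
```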
First, in the run_inference.sh script, make sure you provide the correct paths to --config and --logdir for the acoustic model (Jasper).
There are two ways to run inference:
1. With Greedy Decoder:
Make sure that the "decoder_params" section of the config file contains the line 'infer_logits_to_pickle': False and that the "dataset_files" field of the "infer_params" section contains a target CSV file (a sketch of these config fields follows the command below). Then run:
bash run_inference.sh # You will get the desired output in the model_output.pickle file
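For orientation, here is a minimal sketch of the relevant config fields, following the usual OpenSeq2Seq Jasper config layout (the dataset path is a placeholder; flip the flag to True for the language model workflow below):

```python
# Sketch of the relevant fields in a Jasper config file (paths are placeholders).
base_params = {
    # ... model, optimizer, and data layer parameters ...
    "decoder_params": {
        # False -> decoded transcripts are written to model_output.pickle;
        # True -> raw logits are dumped instead, for LM rescoring.
        "infer_logits_to_pickle": False,
    },
}

infer_params = {
    "data_layer_params": {
        # Target CSV file listing the utterances to transcribe.
        "dataset_files": ["data/librispeech/librivox-test-clean.csv"],
        "shuffle": False,
    },
}
```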
2. With Language Model Rescoring:
In the run_decoding.sh script, provide the correct path to the language model binary in --lm, and make sure that the "decoder_params" section of the config file contains the line 'infer_logits_to_pickle': True and that the "dataset_files" field of the "infer_params" section contains a target CSV file. Then run:
bash run_inference.sh # You will get the acoustic model logits in the model_output.pickle file
# To decode the logits run:
bash run_decoding.sh
# For --mode 'infer' you will get the output in the file given by --infer_output_file, 'inference_output_lm.csv'
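If you want to inspect the dumped logits yourself, here is a minimal sketch; the exact layout of model_output.pickle depends on the OpenSeq2Seq version, so the code only loads the file and reports what is inside:

```python
import pickle

# Load the pickled acoustic model output. The exact structure (dict vs. list,
# key names) is version-dependent, so inspect it before building on it.
with open("model_output.pickle", "rb") as f:
    outputs = pickle.load(f)

print(type(outputs))
if isinstance(outputs, dict):
    for key in list(outputs)[:3]:
        value = outputs[key]
        print(key, getattr(value, "shape", type(value)))
```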
Features:
- Models for:
  - Neural Machine Translation
  - Automatic Speech Recognition
  - Speech Synthesis
  - Language Modeling
  - NLP tasks (sentiment analysis)
- Data-parallel distributed training
  - Multi-GPU
  - Multi-node
- Mixed precision training for NVIDIA Volta/Turing GPUs
Requirements:
- Python >= 3.5
- TensorFlow >= 1.10
- CUDA >= 9.0, cuDNN >= 7.0
- Horovod >= 0.13 (using Horovod is not required, but is highly recommended for multi-GPU setup)
The speech-to-text workflow uses some parts of the Mozilla DeepSpeech project.
The beam search decoder with language model re-scoring (in decoders) is based on Baidu's DeepSpeech.
The text-to-text workflow uses some functions from Tensor2Tensor and the Neural Machine Translation (seq2seq) Tutorial.
This is a research project, not an official NVIDIA product.
Related resources:
- Tensor2Tensor
- Neural Machine Translation (seq2seq) Tutorial
- OpenNMT
- Neural Monkey
- Sockeye
- TF-seq2seq
- Moses
If you use OpenSeq2Seq, please cite this paper:
@misc{openseq2seq,
  title={Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq},
  author={Oleksii Kuchaiev and Boris Ginsburg and Igor Gitman and Vitaly Lavrukhin and Jason Li and Huyen Nguyen and Carl Case and Paulius Micikevicius},
  year={2018},
  eprint={1805.10387},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}