Attention encoder-decoder models for multiple speech-to-text tasks (NVIDIA#8242)

* Rebasing canary changes at current main
  Signed-off-by: Piotr Żelasko <[email protected]>
* Move the changes from asr transformer to nlp transformer as originally intended
  Signed-off-by: Piotr Żelasko <[email protected]>
* update eval to strip spaces before punctuations
  Signed-off-by: stevehuang52 <[email protected]>
* update pc strip
  Signed-off-by: stevehuang52 <[email protected]>
* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)
  * Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.
    Signed-off-by: Piotr Żelasko <[email protected]>
  * [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)
    * [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel
      Signed-off-by: Piotr Żelasko <[email protected]>
    * Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model
      Signed-off-by: Piotr Żelasko <[email protected]>
    ---------
    Signed-off-by: Piotr Żelasko <[email protected]>
  * Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean up the `_canary_prompt_format` function a bit
    Signed-off-by: Piotr Żelasko <[email protected]>
  * Move tokenization into `prompt_format_fn`, fix usage, add docs
    Signed-off-by: Piotr Żelasko <[email protected]>
  * Backward-compatible utterance validation
    Signed-off-by: Piotr Żelasko <[email protected]>
  * Improve type annotations
    Signed-off-by: Piotr Żelasko <[email protected]>
  * config and prompt_fn registration changes from review
    Signed-off-by: Piotr Żelasko <[email protected]>
  ---------
  Signed-off-by: Piotr Żelasko <[email protected]>
* fix transcribe config
  Signed-off-by: stevehuang52 <[email protected]>
* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)
  * Initial draft of multi task beam decoding strategy
    Signed-off-by: smajumdar <[email protected]>
  * Stabilize inference
    Signed-off-by: smajumdar <[email protected]>
  * Update AED Multi Task model to mostly conform to Archetype-Type format. Update config
    Signed-off-by: smajumdar <[email protected]>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
  * Add change decoding strategy
    Signed-off-by: smajumdar <[email protected]>
  * Remove redundant imports
    Signed-off-by: smajumdar <[email protected]>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
  * Cleanup
    Signed-off-by: smajumdar <[email protected]>
  * Cleanup
    Signed-off-by: smajumdar <[email protected]>
  * remove asr transformer dependency on nlp
    Signed-off-by: stevehuang52 <[email protected]>
  * clean up
    Signed-off-by: stevehuang52 <[email protected]>
  * copy token_classifier from nlp to asr
    Signed-off-by: stevehuang52 <[email protected]>
  * Address comments
    Signed-off-by: smajumdar <[email protected]>
  * Add typing to beam decoding
    Signed-off-by: smajumdar <[email protected]>
  * Make prompt format configurable
    Signed-off-by: smajumdar <[email protected]>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
  * drop asr dependency on nlp
    Signed-off-by: stevehuang52 <[email protected]>
  ---------
  Signed-off-by: smajumdar <[email protected]>
  Signed-off-by: stevehuang52 <[email protected]>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: stevehuang52 <[email protected]>
* fix transcribe, update asr evaluator
  Signed-off-by: stevehuang52 <[email protected]>
* Extend the docs for the canary prompt_fn
  Signed-off-by: Piotr Żelasko <[email protected]>
* Incorporate changes from Nithin's code review
  Signed-off-by: Piotr Żelasko <[email protected]>
* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)
  * bug fix and adding launch script for speech_multitask
    Signed-off-by: Krishna Puvvada <[email protected]>
  * update launch script example in speech_to_text_aed.py
    Signed-off-by: Krishna Puvvada <[email protected]>
  ---------
  Signed-off-by: Krishna Puvvada <[email protected]>
  Co-authored-by: Krishna Puvvada <[email protected]>
* Fix: drop_last must be true in validation/test, otherwise the training will hang
  Signed-off-by: Piotr Żelasko <[email protected]>
* revert to current transcribe API
  Signed-off-by: stevehuang52 <[email protected]>
* revert changes to NLP, update docs
  Signed-off-by: stevehuang52 <[email protected]>
* update eval utils
  Signed-off-by: stevehuang52 <[email protected]>
* update docs
  Signed-off-by: stevehuang52 <[email protected]>
* Remove DALI; rename compute_audio_loss to compute_loss
  Signed-off-by: Piotr Żelasko <[email protected]>
* set default use_model_transcribe=False
  Signed-off-by: stevehuang52 <[email protected]>
* change os.path.dirname to pathlib
  Signed-off-by: stevehuang52 <[email protected]>
* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)
  * Test for CanaryTokenizer
    Signed-off-by: Piotr Żelasko <[email protected]>
  * Attempt at refactor...
    Signed-off-by: Piotr Żelasko <[email protected]>
  ---------
  Signed-off-by: Piotr Żelasko <[email protected]>
* Update config for AED models (NVIDIA#8294)
  Signed-off-by: smajumdar <[email protected]>
* set default calculate_wer=False in transcribe_speech.py
  Signed-off-by: stevehuang52 <[email protected]>
* Attention encoder-decoder models for multiple speech-to-text tasks
  Signed-off-by: Piotr Żelasko <[email protected]>
* Apply suggestions from code review, part 1
  Co-authored-by: Nithin Rao <[email protected]>
  Signed-off-by: Piotr Żelasko <[email protected]>
* Apply suggestions from code review, part 2
  Signed-off-by: Piotr Żelasko <[email protected]>
* Document compute_loss
  Signed-off-by: Piotr Żelasko <[email protected]>
* update transcribe_speech.py
  Signed-off-by: stevehuang52 <[email protected]>
* add docstring
  Signed-off-by: stevehuang52 <[email protected]>
* Attention encoder-decoder models for multiple speech-to-text tasks
  Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: biscayan <[email protected]>
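The "drop_last must be true in validation/test, otherwise the training will hang" fix above reflects a general property of distributed evaluation: when the dataset size is not divisible by the batch size, ranks can see different numbers of batches, and a collective operation in the eval loop then blocks forever on the rank that ran out of data. A minimal plain-PyTorch sketch (not the NeMo code) of the batch-count difference:

```python
# Sketch of why drop_last matters: with 10 samples and batch_size=4,
# drop_last=False yields a ragged tail batch (4 + 4 + 2), while
# drop_last=True keeps only full batches, making batch counts
# predictable and equal across DDP ranks.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(10).float())  # 10 samples

keep_loader = DataLoader(dataset, batch_size=4, drop_last=False)
drop_loader = DataLoader(dataset, batch_size=4, drop_last=True)

print(len(keep_loader))  # 3 batches: 4 + 4 + 2
print(len(drop_loader))  # 2 batches: the 2-sample remainder is dropped
```

With `drop_last=True`, every rank processes the same number of batches, so collectives such as an all-reduce of the validation loss line up instead of deadlocking.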