From 7755c173fac79b6494b686b9781d7ebac9f22054 Mon Sep 17 00:00:00 2001 From: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Date: Wed, 11 Oct 2023 16:23:41 -0700 Subject: [PATCH] Update docs: readme, getting started, ASR intro (#7679) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan --------- Signed-off-by: Ryan Signed-off-by: Elena Rastorgueva * move install info to INSTALLATION.md Signed-off-by: Elena Rastorgueva * tidy up links Signed-off-by: Elena Rastorgueva * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi * transpose conv1d inputs Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv branch Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi * add collection classes Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi * correct manifest bug when having only audio or only videos Signed-off-by: mburchi * correct manifest bug when having only audio or only videos Signed-off-by: mburchi * clean references Signed-off-by: mburchi * freeze unfreeze transcribe cv models Signed-off-by: mburchi * correct manifest get_full_path bug Signed-off-by: mburchi * update for PR Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi * add self.out = None to asr subsampling Signed-off-by: mburchi * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman Signed-off-by: Elena Rastorgueva * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek * StarCoder conversion test Signed-off-by: Jan Lasek * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek * Fix test Signed-off-by: Jan Lasek * Catch up with save_to changes Signed-off-by: Jan Lasek * Don't abbreviate args for clarity Signed-off-by: Jan Lasek * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu Co-authored-by: Hongbin Liu Signed-off-by: Elena Rastorgueva * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Elena Rastorgueva * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen Co-authored-by: Gerald Shen <119401249+gshennvm@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Elena Rastorgueva * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Elena Rastorgueva * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman Signed-off-by: Elena Rastorgueva * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan * [TTS] Add comment about read failures Signed-off-by: Ryan --------- Signed-off-by: Ryan Signed-off-by: Elena Rastorgueva * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong --------- Signed-off-by: Robin Dong Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman Signed-off-by: Elena Rastorgueva * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang Co-authored-by: Jocelyn Signed-off-by: Elena Rastorgueva * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Elena Rastorgueva * fix (#7511) Signed-off-by: Abhinav Khattar Signed-off-by: Elena Rastorgueva * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan Signed-off-by: Elena Rastorgueva * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria * add test Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu * mark autogenrated and remove it for test Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan Co-authored-by: Kunal Dhawan Co-authored-by: Nithin Rao Signed-off-by: Elena Rastorgueva * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Elena Rastorgueva * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Elena Rastorgueva * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala * Update landing_page.md minor update. Signed-off-by: Adi Renduchintala * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu * list of fields for context Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * Fix bug Signed-off-by: Cheng-Ping Hsieh * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh * Remove unused import Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * Change assert logic Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh * Remove random truncation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh --------- Signed-off-by: arendu Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui * code style warnings Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui * add copyright header Signed-off-by: Chen Cui * fix code check warnings Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui * update deprecation notices Signed-off-by: Chen Cui * update deprecation notices Signed-off-by: Chen Cui * consolidate peft and sft scripts Signed-off-by: Chen Cui * update CI tests Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui * support pre-extracted checkpoints Signed-off-by: Chen Cui --------- Signed-off-by: jasonwan Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Signed-off-by: Adi Renduchintala Signed-off-by: arendu Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Chen Cui Co-authored-by: Chen Cui Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn Co-authored-by: jasonwan Co-authored-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Co-authored-by: Adi Renduchintala Co-authored-by: Yuanzhe Dong Co-authored-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * fix a typo (#7496) Signed-off-by: BestJuly Signed-off-by: Elena Rastorgueva * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Elena Rastorgueva * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Elena Rastorgueva * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong --------- Signed-off-by: Robin Dong Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan Signed-off-by: smajumdar Co-authored-by: Kunal Dhawan Co-authored-by: Somshubra Majumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Elena Rastorgueva * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Elena Rastorgueva * Fix typos (#7581) Signed-off-by: Elena Rastorgueva * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Elena Rastorgueva * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek * Test with pyt:23.09 container Signed-off-by: Jan Lasek --------- Signed-off-by: Jan Lasek Signed-off-by: Elena Rastorgueva * defaults changed (#7600) * defaults changed Signed-off-by: arendu * typo Signed-off-by: arendu * update Signed-off-by: arendu --------- Signed-off-by: arendu Signed-off-by: Elena Rastorgueva * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria --------- Signed-off-by: GiacomoLeoneMaria Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Add files via upload (#7598) specifies the branch Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym Co-authored-by: Sangkug Lym Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason --------- Signed-off-by: Jason Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree Signed-off-by: Elena Rastorgueva * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui * Update peft_config.py brackets Signed-off-by: Adi Renduchintala --------- Signed-off-by: Chen Cui Signed-off-by: Adi Renduchintala Co-authored-by: Adi Renduchintala Signed-off-by: Elena Rastorgueva * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon Signed-off-by: Elena Rastorgueva * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Koluguri Signed-off-by: Elena Rastorgueva * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Co-authored-by: Eric Harper Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar * Guard MeCab and Ipadic Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar * Fix remaining ASR dataclasses Signed-off-by: smajumdar * Fix scripts Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym Co-authored-by: Sangkug Lym Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason --------- Signed-off-by: Jason Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar Signed-off-by: Sangkug Lym Signed-off-by: Jason Co-authored-by: Somshubra Majumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym Co-authored-by: Jason Signed-off-by: Elena Rastorgueva * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Elena Rastorgueva * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan Co-authored-by: Ryan Langman Signed-off-by: Elena Rastorgueva * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan * [TTS] Fix STFT resolution Signed-off-by: Ryan * [TTS] Fix training metric logging Signed-off-by: Ryan * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * add outline of asr quickstart info to asr/intro.rst Signed-off-by: Elena Rastorgueva * add CLI, LM and real-time transcription sections Signed-off-by: Elena Rastorgueva * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover * remove copy from other models Signed-off-by: Maanu Grover * modify attribute not arg Signed-off-by: Maanu Grover * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover * rename function and add docstring Signed-off-by: Maanu Grover * replace precision to dtype conditionals with func call Signed-off-by: Maanu Grover * unnecessary function and cfg reset Signed-off-by: Maanu Grover * set default value Signed-off-by: Maanu Grover * fix precision lookup in a few more places Signed-off-by: Maanu Grover * rename mapping function Signed-off-by: Maanu Grover * ununsed import Signed-off-by: Maanu Grover * save torch datatype to model Signed-off-by: Maanu Grover * set weights precision wrt amp o2 Signed-off-by: Maanu Grover * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover * revert half precision at inference attempt Signed-off-by: Maanu Grover * move autocast dtype to base model Signed-off-by: Maanu Grover * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover * unused imports Signed-off-by: Maanu Grover --------- Signed-off-by: Maanu Grover Signed-off-by: Sasha Meister * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. Signed-off-by: Tim Moon * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon * Fix typo Signed-off-by: Tim Moon * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon * Add distopt logic for int dtypes Signed-off-by: Tim Moon * Update Apex commit Signed-off-by: Tim Moon * Remove unused variables Signed-off-by: Tim Moon * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon --------- Signed-off-by: Tim Moon Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Sasha Meister * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang * Update Jenkinsfile Signed-off-by: Jason Wang * remove fast_swiglu configuration Signed-off-by: Jason Wang --------- Signed-off-by: Jason Wang Co-authored-by: Eric Harper Signed-off-by: Sasha Meister * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong Co-authored-by: Somshubra Majumdar Signed-off-by: Sasha Meister * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić Signed-off-by: Sasha Meister * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar * update commit Signed-off-by: Abhinav Khattar --------- Signed-off-by: Abhinav Khattar Signed-off-by: Sasha Meister * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover * move precision copy before super constructor Signed-off-by: Maanu Grover * use trainer arg Signed-off-by: Maanu Grover --------- Signed-off-by: Maanu Grover Signed-off-by: Sasha Meister * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar * Add support for auto extracting tokenizer model Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar * Fix issue with missing tokenizer Signed-off-by: smajumdar * Refactor Signed-off-by: smajumdar * Refactor Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper * move dist ckpt Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper * when using mcore we can change tp pp on the fly Signed-off-by: eharper * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper * fix load dist ckpt Signed-off-by: jasonwan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper * remove import Signed-off-by: eharper --------- Signed-off-by: eharper Signed-off-by: jasonwan Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan Signed-off-by: Sasha Meister * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang Co-authored-by: Jimmy Zhang Signed-off-by: Sasha Meister * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav Khattar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * make loss mask default to false (#7407) Signed-off-by: eharper Signed-off-by: Sasha Meister * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym Signed-off-by: Sasha Meister * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar Signed-off-by: Sasha Meister * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd Signed-off-by: Sasha Meister * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov Signed-off-by: Sasha Meister * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov Signed-off-by: Sasha Meister * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong Signed-off-by: Sasha Meister * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong Signed-off-by: Sasha Meister * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell Signed-off-by: Sasha Meister * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang Signed-off-by: Sasha Meister * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> --------- Signed-off-by: Robin Dong Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev * tests added Signed-off-by: Aleksandr Laptev --------- Signed-off-by: Aleksandr Laptev Signed-off-by: Sasha Meister * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Sasha Meister * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong * Update tacotron2.py Signed-off-by: Jason --------- Signed-off-by: Robin Dong Signed-off-by: Jason Co-authored-by: Jason Signed-off-by: Sasha Meister * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan * [TTS] Fix audio codec tests Signed-off-by: Ryan --------- Signed-off-by: Ryan Signed-off-by: Sasha Meister * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan --------- Signed-off-by: Ryan Signed-off-by: Sasha Meister * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi * transpose conv1d inputs Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv branch Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi * add collection classes Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi * correct manifest bug when having only audio or only videos Signed-off-by: mburchi * correct manifest bug when having only audio or only videos Signed-off-by: mburchi * clean references Signed-off-by: mburchi * freeze unfreeze transcribe cv models Signed-off-by: mburchi * correct manifest get_full_path bug Signed-off-by: mburchi * update for PR Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi * add self.out = None to asr subsampling Signed-off-by: mburchi * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman Signed-off-by: Sasha Meister * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek * StarCoder conversion test Signed-off-by: Jan Lasek * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek * Fix test Signed-off-by: Jan Lasek * Catch up with save_to changes Signed-off-by: Jan Lasek * Don't abbreviate args for clarity Signed-off-by: Jan Lasek * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu Co-authored-by: Hongbin Liu Signed-off-by: Sasha Meister * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Sasha Meister * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen Co-authored-by: Gerald Shen <119401249+gshennvm@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Sasha Meister * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman Signed-off-by: Sasha Meister * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan * [TTS] Add comment about read failures Signed-off-by: Ryan --------- Signed-off-by: Ryan Signed-off-by: Sasha Meister * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong --------- Signed-off-by: Robin Dong Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman Signed-off-by: Sasha Meister * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang Co-authored-by: Jocelyn Signed-off-by: Sasha Meister * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister * fix (#7511) Signed-off-by: Abhinav Khattar Signed-off-by: Sasha Meister * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan Signed-off-by: Sasha Meister * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria * add test Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Sasha Meister * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and "Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Create punctuation_rates.py Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Format fixing Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister * Added asserions to rate_punctuation.py Signed-off-by: Sasha Meister * fix typo Signed-off-by: Sasha Meister * added function for import and call, docstrings Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu * mark autogenrated and remove it for test Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan Co-authored-by: Kunal Dhawan Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Sasha Meister * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala * Update landing_page.md minor update. Signed-off-by: Adi Renduchintala * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu * list of fields for context Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * Fix bug Signed-off-by: Cheng-Ping Hsieh * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh * Remove unused import Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * Change assert logic Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh * Remove random truncation Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh --------- Signed-off-by: arendu Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui * code style warnings Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui * add copyright header Signed-off-by: Chen Cui * fix code check warnings Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui * update deprecation notices Signed-off-by: Chen Cui * update deprecation notices Signed-off-by: Chen Cui * consolidate peft and sft scripts Signed-off-by: Chen Cui * update CI tests Signed-off-by: Chen Cui * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui * support pre-extracted checkpoints Signed-off-by: Chen Cui --------- Signed-off-by: jasonwan Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Signed-off-by: Adi Renduchintala Signed-off-by: arendu Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Chen Cui Co-authored-by: Chen Cui Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn Co-authored-by: jasonwan Co-authored-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Co-authored-by: Adi Renduchintala Co-authored-by: Yuanzhe Dong Co-authored-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister * fix a typo (#7496) Signed-off-by: BestJuly Signed-off-by: Sasha Meister * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong --------- Signed-off-by: Robin Dong Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan Signed-off-by: smajumdar Co-authored-by: Kunal Dhawan Co-authored-by: Somshubra Majumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Sasha Meister * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister * Fix typos (#7581) Signed-off-by: Sasha Meister * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * added per tests Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Signed-off-by: Sasha Meister * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek * Test with pyt:23.09 container Signed-off-by: Jan Lasek --------- Signed-off-by: Jan Lasek Signed-off-by: Sasha Meister * defaults changed (#7600) * defaults changed Signed-off-by: arendu * typo Signed-off-by: arendu * update Signed-off-by: arendu --------- Signed-off-by: arendu Signed-off-by: Sasha Meister * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria --------- Signed-off-by: GiacomoLeoneMaria Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Sasha Meister * rate_punctuation.py Fixed output manifest saving Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Fix tests Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * Add files via upload (#7598) specifies the branch Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym Co-authored-by: Sangkug Lym Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason --------- Signed-off-by: Jason Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister * Function name fixing Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Moving PER to speech_to_text_eval.py Added: - "use_per": PER metric computing; - "scores_per_sample": metrics computation sample by sample for wer/cer/punctuation rates; - "output_with_scores_filename": saving manifest with metrics Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * Update test_metrics.py Updated "punctuation_error_rate" function name Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Added use_per description Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * guard extra dependencies Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Write metrics to "output_filename" if "scores_per_sample=True" Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * scores_per_sample description Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix import guards Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Stats printing when HAVE_TABLUATE_AND_PANDAS=False Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree Signed-off-by: Sasha Meister * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui * Update peft_config.py brackets Signed-off-by: Adi Renduchintala --------- Signed-off-by: Chen Cui Signed-off-by: Adi Renduchintala Co-authored-by: Adi Renduchintala Signed-off-by: Sasha Meister * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon Signed-off-by: Sasha Meister * Delete examples/asr/rate_punctuation.py Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * Added use_per description Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * metric and variables name fixing Signed-off-by: Sasha Meister * Add else samples = None Signed-off-by: Sasha Meister * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Koluguri Signed-off-by: Sasha Meister * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri Co-authored-by: Nithin Rao Co-authored-by: Eric Harper Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Signed-off-by: Sasha Meister * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signe… * add more info and links to asr intro.rst Signed-off-by: Elena Rastorgueva * conversion issue fix (#7648) (#7668) Signed-off-by: dimapihtar Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * layernorm1p fix (#7523) (#7567) * layernorm1p fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add layernorm1p to if statement * config changes * gpt config changes * remove layernorm_zero_centered_gamma from gpt config * change line --------- Signed-off-by: dimapihtar Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * generalized chat sft prompt (#7655) * fix dataset issues Signed-off-by: Yi Dong * working version Signed-off-by: Yi Dong * all passed Signed-off-by: Yi Dong * refactor tests Signed-off-by: Yi Dong * all pass Signed-off-by: Yi Dong * working version Signed-off-by: Yi Dong * use end name signal for labels Signed-off-by: Yi Dong * all fixed Signed-off-by: Yi Dong * update doc Signed-off-by: Yi Dong * style fix Signed-off-by: Yi Dong * remove unused imports Signed-off-by: Yi Dong * make sure nccl not timing out Signed-off-by: Yi Dong * style fix Signed-off-by: Yi Dong * generate example template Signed-off-by: Yi Dong * generic end of name token Signed-off-by: Yi Dong * style fix Signed-off-by: Yi Dong * add the chat prompt format into the config Signed-off-by: Yi Dong * make sure sft working Signed-off-by: Yi Dong * address reviewer comment Signed-off-by: Yi Dong * fix non Signed-off-by: Yi Dong * try openAI prompt Signed-off-by: Yi Dong * remove unused imports Signed-off-by: Yi Dong * remove human labels from the data Signed-off-by: Yi Dong * use hf dataset to clean Signed-off-by: Yi Dong * reviewer comments Signed-off-by: Yi Dong --------- Signed-off-by: Yi Dong Signed-off-by: Elena Rastorgueva * Fix vad & speech command tutorial - onnx (#7671) (#7672) * fix vad onnx * fix mbn onnx --------- Signed-off-by: fayejf Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * put installation info back in README Signed-off-by: Elena Rastorgueva * remove installation info from docs, point to info in readme Signed-off-by: Elena Rastorgueva * tidy up readme a bit Signed-off-by: Elena Rastorgueva * tidy up starthere intro, update code Signed-off-by: Elena Rastorgueva * correct timestamps code snippet, other updates Signed-off-by: Elena Rastorgueva * remove old link placeholder Signed-off-by: Elena Rastorgueva * Fix in the confidence ensemble test (#7682) * Fix in the confidence ensemble test Signed-off-by: Igor Gitman * Correct parameter names Signed-off-by: Igor Gitman --------- Signed-off-by: Igor Gitman Signed-off-by: Elena Rastorgueva * remove example command to install pytorch Signed-off-by: Elena Rastorgueva * put back links for pretrained models and discussion board Signed-off-by: Elena Rastorgueva * change publications link to point to blog page Signed-off-by: Elena Rastorgueva * fix link formatting Signed-off-by: Elena Rastorgueva * PEFT eval fix (#7626) (#7638) * fix issue where peft weights are not loaded for distributed checkpoints * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui Co-authored-by: Chen Cui Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * fix publications link formatting Signed-off-by: Elena Rastorgueva * propagate mp config (#7637) (#7639) Signed-off-by: eharper Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva * Add find_unused_parameters_true for text_classiftn and punctuation_capitalization (#7649) (#7657) Signed-off-by: Abhishree Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Hotfix (#7501) (#7568) Signed-off-by: Jan Baczek Co-authored-by: jbaczek <45043825+jbaczek@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Avoid duplicated checkpoint save (#7555) (#7566) Signed-off-by: Mikołaj Błaż Co-authored-by: mikolajblaz Signed-off-by: Elena Rastorgueva * Cache FP8 weight and transpose only at the first micro-batch in each validation and test routine (#7470) (#7483) * Cache weight and transpose only in the first batch in all training, val, and test runs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym Co-authored-by: Sangkug Lym Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Add an option to disable manual GC in validation (#7467) (#7476) Signed-off-by: Sangkug Lym Co-authored-by: Sangkug Lym Signed-off-by: Elena Rastorgueva * more natural phrasing Signed-off-by: Elena Rastorgueva * update gh-pages-src link to point to readme Signed-off-by: Elena Rastorgueva * remove PUBLICATIONS.md and update README Signed-off-by: Elena Rastorgueva * Update README.rst Remove old video, move PyData video to tutorials section Signed-off-by: Elena Rastorgueva * specify optional mac dependencies, add space in comment Signed-off-by: Elena Rastorgueva * update copyright year Signed-off-by: Elena Rastorgueva --------- Signed-off-by: Ryan Signed-off-by: Elena Rastorgueva Signed-off-by: Cheng-Ping Hsieh Signed-off-by: Hongbin Liu Signed-off-by: Tamerlan Tabolov Signed-off-by: smajumdar Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Stas Bekman Signed-off-by: Robin Dong Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jocelyn Huang Signed-off-by: Abhinav Khattar Signed-off-by: Abhishree Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Signed-off-by: Nithin Rao Koluguri Signed-off-by: jasonwan Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Signed-off-by: Adi Renduchintala Signed-off-by: arendu Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Chen Cui Signed-off-by: BestJuly Signed-off-by: Jan Lasek Signed-off-by: arendu Signed-off-by: GiacomoLeoneMaria Signed-off-by: dimapihtar Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Signed-off-by: Jason Signed-off-by: Mehadi Hasan Menon Signed-off-by: Ante Jukić Signed-off-by: Maanu Grover Signed-off-by: Sasha Meister Signed-off-by: Jason Wang Signed-off-by: Tim Moon Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: eharper Signed-off-by: Jimmy Zhang Signed-off-by: Sangkug Lym Signed-off-by: George Zelenfroynd Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Anton Peganov Signed-off-by: Nikolay Karpov Signed-off-by: Samuele Cornell Signed-off-by: KunalDhawan Signed-off-by: Aleksandr Laptev Signed-off-by: mburchi Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> Signed-off-by: Gerald Shen Signed-off-by: Yi Dong Signed-off-by: Igor Gitman Signed-off-by: Jan Baczek Signed-off-by: Mikołaj Błaż Signed-off-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: Ryan Langman Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> Co-authored-by: Jan Lasek Co-authored-by: Kelvin Liu Co-authored-by: Hongbin Liu Co-authored-by: Tamerlan Tabolov Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Stas Bekman Co-authored-by: Robin Dong Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Jocelyn Co-authored-by: Abhinav Khattar Co-authored-by: Giacomo Leone Maria Cavallini <72698188+GiacomoLeoneMaria@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: Adi Renduchintala Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Co-authored-by: Nithin Rao Co-authored-by: meatybobby Co-authored-by: Chen Cui Co-authored-by: Marc Romeyn Co-authored-by: jasonwan Co-authored-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Co-authored-by: Yuanzhe Dong Co-authored-by: Cheng-Ping Hsieh Co-authored-by: Li Tao Co-authored-by: Igor Gitman Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Jason Co-authored-by: Mehadi Hasan Menon Co-authored-by: Eric Harper Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Co-authored-by: Aleksandr Laptev Co-authored-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com> Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Co-authored-by: Jimmy Zhang Co-authored-by: Sangkug Lym Co-authored-by: PeganovAnton Co-authored-by: Nikolay Karpov Co-authored-by: Samuele Cornell Co-authored-by: Yang Zhang Co-authored-by: Kunal Dhawan Co-authored-by: Igor Gitman Co-authored-by: Gerald Shen <119401249+gshennvm@users.noreply.github.com> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com> Co-authored-by: jbaczek <45043825+jbaczek@users.noreply.github.com> Co-authored-by: mikolajblaz --- README.rst | 71 +++++--- docs/source/asr/asr_language_modeling.rst | 1 + docs/source/asr/datasets.rst | 2 + docs/source/asr/intro.rst | 189 ++++++++++++++++++--- docs/source/asr/resources.rst | 17 -- docs/source/asr/results.rst | 3 + docs/source/conf.py | 2 +- docs/source/starthere/intro.rst | 193 ++++++++-------------- 8 files changed, 283 insertions(+), 195 deletions(-) delete mode 100644 docs/source/asr/resources.rst diff --git a/README.rst b/README.rst index acc30a68e610..d05a9b63c642 100644 --- a/README.rst +++ b/README.rst @@ -58,7 +58,7 @@ State of the Art pretrained NeMo models are freely available on `HuggingFace Hub These models can be used to transcribe audio, synthesize speech, or translate text in just a few lines of code. We have extensive `tutorials `_ that -can all be run on `Google Colab `_. +can be run on `Google Colab `_. For advanced users that want to train NeMo models from scratch or finetune existing NeMo models we have a full suite of `example scripts `_ that support multi-GPU/multi-node training. @@ -67,30 +67,14 @@ For scaling NeMo LLM training on Slurm clusters or public clouds, please see the The NM launcher has extensive recipes, scripts, utilities, and documentation for training NeMo LLMs and also has an `Autoconfigurator `_ which can be used to find the optimal model parallel configuration for training on a specific cluster. -Also see the two introductory videos below for a high level overview of NeMo. - -* Developing State-Of-The-Art Conversational AI Models in Three Lines of Code. -* NVIDIA NeMo: Toolkit for Conversational AI at PyData Yerevan 2022. - -|three_lines| |pydata| - -.. |pydata| image:: https://img.youtube.com/vi/J-P6Sczmas8/maxres3.jpg - :target: https://www.youtube.com/embed/J-P6Sczmas8?mute=0&start=14&autoplay=0 - :width: 600 - :alt: Develop Conversational AI Models in 3 Lines - -.. |three_lines| image:: https://img.youtube.com/vi/wBgpMf_KQVw/maxresdefault.jpg - :target: https://www.youtube.com/embed/wBgpMf_KQVw?mute=0&start=0&autoplay=0 - :width: 600 - :alt: Introduction at PyData@Yerevan 2022 - Key Features ------------ * Speech processing * `HuggingFace Space for Audio Transcription (File, Microphone and YouTube) `_ + * `Pretrained models `_ available in 14+ languages * `Automatic Speech Recognition (ASR) `_ - * Supported ASR models: ``_ + * Supported ASR `models `_: * Jasper, QuartzNet, CitriNet, ContextNet * Conformer-CTC, Conformer-Transducer, FastConformer-CTC, FastConformer-Transducer * Squeezeformer-CTC and Squeezeformer-Transducer @@ -101,7 +85,7 @@ Key Features * Hybrid Transducer/CTC * NeMo Original `Multi-blank Transducers `_ and `Token-and-Duration Transducers (TDT) `_ * Streaming/Buffered ASR (CTC/Transducer) - `Chunked Inference Examples `_ - * Cache-aware Streaming Conformer with multiple lookaheads - ``_ + * `Cache-aware Streaming Conformer `_ with multiple lookaheads. * Beam Search decoding * `Language Modelling for ASR (CTC and RNNT) `_: N-gram LM in fusion with Beam Search decoding, Neural Rescoring with Transformer * `Support of long audios for Conformer with memory efficient local attention `_ @@ -113,8 +97,6 @@ Key Features * Clustering Diarizer: TitaNet, ECAPA_TDNN, SpeakerNet * Neural Diarizer: MSDD (Multi-scale Diarization Decoder) * `Speech Intent Detection and Slot Filling `_: Conformer-Transformer - * `Pretrained models on different languages. `_: English, Spanish, German, Russian, Chinese, French, Italian, Polish, ... - * `NGC collection of pre-trained speech processing models. `_ * Natural Language Processing * `NeMo Megatron pre-training of Large Language Models `_ * `Neural Machine Translation (NMT) `_ @@ -151,7 +133,7 @@ Requirements 1) Python 3.10 or above 2) Pytorch 1.13.1 or above -3) NVIDIA GPU for training +3) NVIDIA GPU, if you intend to do model training Documentation ------------- @@ -178,6 +160,15 @@ Tutorials --------- A great way to start with NeMo is by checking `one of our tutorials `_. +You can also get a high-level overview of NeMo by watching the talk *NVIDIA NeMo: Toolkit for Conversational AI*, presented at PyData Yerevan 2022: + +|pydata| + +.. |pydata| image:: https://img.youtube.com/vi/J-P6Sczmas8/maxres3.jpg + :target: https://www.youtube.com/embed/J-P6Sczmas8?mute=0&start=14&autoplay=0 + :width: 600 + :alt: NeMo presentation at PyData@Yerevan 2022 + Getting help with NeMo ---------------------- FAQ can be found on NeMo's `Discussions board `_. You are welcome to ask questions or start discussions there. @@ -185,7 +176,6 @@ FAQ can be found on NeMo's `Discussions board `_ CONTRIBUTING.md for the process. +We welcome community contributions! Please refer to `CONTRIBUTING.md `_ for the process. Publications ------------ @@ -367,4 +384,4 @@ Please refer to the instructions in the `README of that branch `_. +NeMo is released under an `Apache 2.0 license `_. diff --git a/docs/source/asr/asr_language_modeling.rst b/docs/source/asr/asr_language_modeling.rst index a0d46ca795b1..bb823cb252c0 100644 --- a/docs/source/asr/asr_language_modeling.rst +++ b/docs/source/asr/asr_language_modeling.rst @@ -39,6 +39,7 @@ penalty term to consider the sequence length in the scores. Larger alpha means m importance on the acoustic model. Negative values for beta will give penalty to longer sequences and make the decoder to prefer shorter predictions, while positive values would result in longer candidates. +.. _train-ngram-lm: Train N-gram LM =============== diff --git a/docs/source/asr/datasets.rst b/docs/source/asr/datasets.rst index 05278ecb2437..61c34014c809 100644 --- a/docs/source/asr/datasets.rst +++ b/docs/source/asr/datasets.rst @@ -162,6 +162,8 @@ these files using ``--dest_folder``. In order to generate files in the supported After the script finishes, the ``train.json``, ``dev.json``, ``test.json``, and ``vocab.txt`` files can be found in the ``dest_folder`` directory. +.. _section-with-manifest-format-explanation: + Preparing Custom ASR Data ------------------------- diff --git a/docs/source/asr/intro.rst b/docs/source/asr/intro.rst index 46a192c546a2..7066c2989393 100644 --- a/docs/source/asr/intro.rst +++ b/docs/source/asr/intro.rst @@ -1,34 +1,156 @@ Automatic Speech Recognition (ASR) ================================== -ASR, or Automatic Speech Recognition, refers to the problem of getting a program to automatically transcribe spoken language -(speech-to-text). Our goal is usually to have a model that minimizes the Word Error Rate (WER) metric when transcribing speech input. -In other words, given some audio file (e.g. a WAV file) containing speech, how do we transform this into the corresponding text with -as few errors as possible? +Automatic Speech Recognition (ASR), also known as Speech To Text (STT), refers to the problem of automatically transcribing spoken language. +You can use NeMo to transcribe speech using open-sourced pretrained models in :ref:`14+ languages `, or :doc:`train your own<./examples/kinyarwanda_asr>` ASR models. -Traditional speech recognition takes a generative approach, modeling the full pipeline of how speech sounds are produced in order to -evaluate a speech sample. We would start from a language model that encapsulates the most likely orderings of words that are generated -(e.g. an n-gram model), to a pronunciation model for each word in that ordering (e.g. a pronunciation table), to an acoustic model that -translates those pronunciations to audio waveforms (e.g. a Gaussian Mixture Model). -Then, if we receive some spoken input, our goal would be to find the most likely sequence of text that would result in the given audio -according to our generative pipeline of models. Overall, with traditional speech recognition, we try to model ``Pr(audio|transcript)*Pr(transcript)``, -and take the argmax of this over possible transcripts. -Over time, neural nets advanced to the point where each component of the traditional speech recognition model could be replaced by a -neural model that had better performance and that had a greater potential for generalization. For example, we could replace an n-gram -model with a neural language model, and replace a pronunciation table with a neural pronunciation model, and so on. However, each of -these neural models need to be trained individually on different tasks, and errors in any model in the pipeline could throw off the -whole prediction. +Transcribe speech with 3 lines of code +---------------------------------------- +After :ref:`installing NeMo`, you can transcribe an audio file as follows: -Thus, we can see the appeal of end-to-end ASR architectures: discriminative models that simply take an audio input and give a textual -output, and in which all components of the architecture are trained together towards the same goal. The model's encoder would be -akin to an acoustic model for extracting speech features, which can then be directly piped to a decoder which outputs text. If desired, -we could integrate a language model that would improve our predictions, as well. +.. code-block:: python -And the entire end-to-end ASR model can be trained at once--a much easier pipeline to handle! + import nemo.collections.asr as nemo_asr + asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_fastconformer_transducer_large") + transcript = asr_model.transcribe(["path/to/audio_file.wav"]) -A demo below allows evaluation of NeMo ASR models in multiple langauges from the browser: +Obtain word timestamps +^^^^^^^^^^^^^^^^^^^^^^^^^ + +You can also obtain timestamps for each word in the transcription as follows: + +.. code-block:: python + + # import nemo_asr and instantiate asr_model as above + import nemo.collections.asr as nemo_asr + asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_fastconformer_transducer_large") + + # update decoding config to preserve alignments and compute timestamps + from omegaconf import OmegaConf, open_dict + decoding_cfg = asr_model.cfg.decoding + with open_dict(decoding_cfg): + decoding_cfg.preserve_alignments = True + decoding_cfg.compute_timestamps = True + asr_model.change_decoding_strategy(decoding_cfg) + + # specify flag `return_hypotheses=True`` + hypotheses = asr_model.transcribe(["path/to/audio_file.wav"], return_hypotheses=True) + + # if hypotheses form a tuple (from RNNT), extract just "best" hypotheses + if type(hypotheses) == tuple and len(hypotheses) == 2: + hypotheses = hypotheses[0] + + timestamp_dict = hypotheses[0].timestep # extract timesteps from hypothesis of first (and only) audio file + print("Hypothesis contains following timestep information :", list(timestamp_dict.keys())) + + # For a FastConformer model, you can display the word timestamps as follows: + # 80ms is duration of a timestep at output of the Conformer + time_stride = 8 * asr_model.cfg.preprocessor.window_stride + + word_timestamps = timestamp_dict['word'] + + for stamp in word_timestamps: + start = stamp['start_offset'] * time_stride + end = stamp['end_offset'] * time_stride + word = stamp['char'] if 'char' in stamp else stamp['word'] + + print(f"Time : {start:0.2f} - {end:0.2f} - {word}") + +Transcribe speech via command line +---------------------------------- +You can also transcribe speech via the command line using the following `script `_, for example: + +.. code-block:: bash + + python /blob/main/examples/asr/transcribe_speech.py \ + pretrained_name="stt_en_fastconformer_transducer_large" \ + audio_dir= # path to dir containing audio files to transcribe + +The script will save all transcriptions in a JSONL file where each line corresponds to an audio file in ````. +This file will correspond to a format that NeMo commonly uses for saving model predictions, and also for storing +input data for training and evaluation. You can learn more about the format that NeMo uses for these files +(which we refer to as "manifest files") :ref:`here`. + +You can also specify the files to be transcribed inside a manifest file, and pass that in using the argument +``dataset_manifest=`` instead of ``audio_dir``. + + +Incorporate a language model (LM) to improve ASR transcriptions +--------------------------------------------------------------- + +You can often get a boost to transcription accuracy by using a Language Model to help choose words that are more likely +to be spoken in a sentence. + +You can get a good improvement in transcription accuracy even using a simple N-gram LM. + +After :ref:`training ` an N-gram LM, you can use it for transcribing audio as follows: + +1. Install the OpenSeq2Seq beam search decoding and KenLM libraries using `this script `_. +2. Perform transcription using `this script `_: + +.. code-block:: bash + + python eval_beamsearch_ngram.py nemo_model_file= \ + input_manifest= \ + beam_width=[] \ + beam_alpha=[] \ + beam_beta=[] \ + preds_output_folder= \ + probs_cache_file=null \ + decoding_mode=beamsearch_ngram \ + decoding_strategy="" + +See more information about LM decoding :doc:`here <./asr_language_modeling>`. + +Use real-time transcription +--------------------------- + +It is possible to use NeMo to transcribe speech in real-time. You can find an example of how to do +this in the following `notebook tutorial `_. + + +Try different ASR models +------------------------ + +NeMo offers a variety of open-sourced pretrained ASR models that vary by model architecture: + +* **encoder architecture** (FastConformer, Conformer, Citrinet, etc.), +* **decoder architecture** (Transducer, CTC & hybrid of the two), +* **size** of the model (small, medium, large, etc.). + +The pretrained models also vary by: + +* **language** (English, Spanish, etc., including some **multilingual** and **code-switching** models), +* whether the output text contains **punctuation & capitalization** or not. + +The NeMo ASR checkpoints can be found on `HuggingFace `_, or on `NGC `_. All models released by the NeMo team can be found on NGC, and some of those are also available on HuggingFace. + +All NeMo ASR checkpoints open-sourced by the NeMo team follow the following naming convention: +``stt_{language}_{encoder name}_{decoder name}_{model size}{_optional descriptor}``. + +You can load the checkpoints automatically using the ``ASRModel.from_pretrained()`` class method, for example: + +.. code-block:: python + + import nemo.collections.asr as nemo_asr + # model will be fetched from NGC + asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_fastconformer_transducer_large") + # if model name is prepended with "nvidia/", the model will be fetched from huggingface + asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/stt_en_fastconformer_transducer_large") + # you can also load open-sourced NeMo models released by other HF users using: + # asr_model = nemo_asr.models.ASRModel.from_pretrained("/") + +See further documentation about :doc:`loading checkpoints <./results>`, a full :ref:`list ` of models and their :doc:`benchmark scores <./score>`. + +There is also more information about the ASR model architectures available in NeMo :doc:`here <./models>`. + + +Try out NeMo ASR transcription in your browser +---------------------------------------------- +You can try out transcription with NeMo ASR models without leaving your browser, by using the HuggingFace Space embedded below. .. raw:: html @@ -40,10 +162,27 @@ A demo below allows evaluation of NeMo ASR models in multiple langauges from the -The full documentation tree is as follows: +ASR tutorial notebooks +---------------------- +Hands-on speech recognition tutorial notebooks can be found under `the ASR tutorials folder `_. +If you are a beginner to NeMo, consider trying out the `ASR with NeMo `_ tutorial. +This and most other tutorials can be run on Google Colab by specifying the link to the notebooks' GitHub pages on Colab. + +ASR model configuration +----------------------- +Documentation regarding the configuration files specific to the ``nemo_asr`` models can be found in the :doc:`Configuration Files <./configs>` section. + +Preparing ASR datasets +---------------------- +NeMo includes preprocessing scripts for several common ASR datasets. The :doc:`Datasets <./datasets>` section contains instructions on +running those scripts. It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data. + +Further information +------------------- +For more information, see additional sections in the ASR docs on the left-hand-side menu or in the list below: .. toctree:: - :maxdepth: 8 + :maxdepth: 1 models datasets @@ -54,5 +193,3 @@ The full documentation tree is as follows: api resources examples/kinyarwanda_asr.rst - -.. include:: resources.rst diff --git a/docs/source/asr/resources.rst b/docs/source/asr/resources.rst deleted file mode 100644 index e192f5fbe83d..000000000000 --- a/docs/source/asr/resources.rst +++ /dev/null @@ -1,17 +0,0 @@ -Resources and Documentation ---------------------------- - -Hands-on speech recognition tutorial notebooks can be found under `the ASR tutorials folder `_. -If you are a beginner to NeMo, consider trying out the `ASR with NeMo `_ tutorial. -This and most other tutorials can be run on Google Colab by specifying the link to the notebooks' GitHub pages on Colab. - -If you are looking for information about a particular ASR model, or would like to find out more about the model -architectures available in the `nemo_asr` collection, refer to the :doc:`Models <./models>` section. - -NeMo includes preprocessing scripts for several common ASR datasets. The :doc:`Datasets <./datasets>` section contains instructions on -running those scripts. It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data. - -Information about how to load model checkpoints (either local files or pretrained ones from NGC), as well as a list of the checkpoints -available on NGC are located on the :doc:`Checkpoints <./results>` section. - -Documentation regarding the configuration files specific to the ``nemo_asr`` models can be found on the :doc:`Configuration Files <./configs>` section. diff --git a/docs/source/asr/results.rst b/docs/source/asr/results.rst index 466393e9a55a..3dca110d89e4 100644 --- a/docs/source/asr/results.rst +++ b/docs/source/asr/results.rst @@ -157,6 +157,9 @@ Language Models for ASR | + +.. _asr-checkpoint-list-by-language: + Speech Recognition (Languages) ------------------------------ diff --git a/docs/source/conf.py b/docs/source/conf.py index c54defb59ce8..952e25332ca4 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -149,7 +149,7 @@ # General information about the project. project = "NVIDIA NeMo" -copyright = "© 2021-2022 NVIDIA Corporation & Affiliates. All rights reserved." +copyright = "© 2021-2023 NVIDIA Corporation & Affiliates. All rights reserved." author = "NVIDIA CORPORATION" # The version info for the project you're documenting, acts as replacement for diff --git a/docs/source/starthere/intro.rst b/docs/source/starthere/intro.rst index 9297b7ef53b3..e6a59b0832ab 100644 --- a/docs/source/starthere/intro.rst +++ b/docs/source/starthere/intro.rst @@ -10,7 +10,7 @@ Introduction `NVIDIA NeMo `_, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), -Natural Language Processing (NLP), and Text-to-Speech (TTS) synthesis models. Each collection consists of +Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. @@ -19,181 +19,126 @@ Conversational AI architectures are typically large and require a lot of data an for training. NeMo uses `PyTorch Lightning `_ for easy and performant multi-GPU/multi-node mixed-precision training. -`Pre-trained NeMo models. `_ - -Also see the two introductory videos below for a high level overview of NeMo. - -* Developing State-Of-The-Art Conversational AI Models in Three Lines of Code. -.. raw:: html - -
- -
- -* NVIDIA NeMo: Toolkit for Conversational AI at PyData Yerevan 2022. -.. image:: https://img.youtube.com/vi/J-P6Sczmas8/maxres3.jpg - :target: https://www.youtube.com/embed/J-P6Sczmas8?mute=0&start=14&autoplay=0 - :width: 560 - :alt: Develop Conversational AI Models in 3 Lines - -For more information and questions, visit the `NVIDIA NeMo Discussion Board `_. +`Pre-trained NeMo models `_ are available +in 14+ languages. Prerequisites ------------- Before you begin using NeMo, it's assumed you meet the following prerequisites. -#. You have Python version 3.9, 3.10. +#. You have Python version 3.10 or above. #. You have Pytorch version 1.13.1 or 2.0+. -#. You have access to an NVIDIA GPU for training. +#. You have access to an NVIDIA GPU, if you intend to do model training. .. _quick_start_guide: Quick Start Guide ----------------- -This NeMo Quick Start Guide is a starting point for users who want to try out NeMo; specifically, this guide enables users to quickly get started with the NeMo fundamentals by walking you through an example audio translator and voice swap. +You can try out NeMo's ASR, NLP and TTS functionality with the example below, which is based on the `Audio Translation `_ tutorial. -If you're new to NeMo, the best way to get started is to take a look at the following tutorials: - -* `Text Classification (Sentiment Analysis) `__ - demonstrates the Text Classification model using the NeMo NLP collection. -* `NeMo Primer `__ - introduces NeMo, PyTorch Lightning, and OmegaConf, and shows how to use, modify, save, and restore NeMo models. -* `NeMo Models `__ - explains the fundamental concepts of the NeMo model. -* `NeMo voice swap demo `__ - demonstrates how to swap a voice in the audio fragment with a computer generated one using NeMo. - -Below we is the code snippet of Audio Translator application. +Once you have :ref:`installed NeMo `, then you can run the code below: .. code-block:: python - # Import NeMo and it's ASR, NLP and TTS collections - import nemo - # Import Speech Recognition collection + # Import NeMo's ASR, NLP and TTS collections import nemo.collections.asr as nemo_asr - # Import Natural Language Processing colleciton import nemo.collections.nlp as nemo_nlp - # Import Speech Synthesis collection import nemo.collections.tts as nemo_tts - # Next, we instantiate all the necessary models directly from NVIDIA NGC - # Speech Recognition model - QuartzNet trained on Russian part of MCV 6.0 - quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="stt_ru_quartznet15x5").cuda() - # Neural Machine Translation model - nmt_model = nemo_nlp.models.MTEncDecModel.from_pretrained(model_name='nmt_ru_en_transformer6x6').cuda() - # Spectrogram generator which takes text as an input and produces spectrogram - spectrogram_generator = nemo_tts.models.FastPitchModel.from_pretrained(model_name="tts_en_fastpitch").cuda() - # Vocoder model which takes spectrogram and produces actual audio - vocoder = nemo_tts.models.HifiGanModel.from_pretrained(model_name="tts_en_hifigan").cuda() - # Transcribe an audio file - # IMPORTANT: The audio must be mono with 16Khz sampling rate - # Get example from: https://nemo-public.s3.us-east-2.amazonaws.com/mcv-samples-ru/common_voice_ru_19034087.wav - russian_text = quartznet.transcribe(['Path_to_audio_file']) - print(russian_text) - # You should see russian text here. Let's translate it to English - english_text = nmt_model.translate(russian_text) - print(english_text) - # After this you should see English translation - # Let's convert it into audio - # A helper function which combines FastPitch and HiFiGAN to go directly from - # text to audio - def text_to_audio(text): - parsed = spectrogram_generator.parse(text) - spectrogram = spectrogram_generator.generate_spectrogram(tokens=parsed) - audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram) - return audio.to('cpu').numpy() - audio = text_to_audio(english_text[0]) + # Download an audio file that we will transcribe, translate, and convert the written translation to speech + import wget + wget.download("https://nemo-public.s3.us-east-2.amazonaws.com/zh-samples/common_voice_zh-CN_21347786.mp3") + # Instantiate a Mandarin speech recognition model and transcribe an audio file. + asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_zh_citrinet_1024_gamma_0_25") + mandarin_text = asr_model.transcribe(['common_voice_zh-CN_21347786.mp3']) + print(mandarin_text) -Installation ------------- + # Instantiate Neural Machine Translation model and translate the text + nmt_model = nemo_nlp.models.MTEncDecModel.from_pretrained(model_name="nmt_zh_en_transformer24x6") + english_text = nmt_model.translate(mandarin_text) + print(english_text) -Pip -~~~ -Use this installation mode if you want the latest released version. + # Instantiate a spectrogram generator (which converts text -> spectrogram) + # and vocoder model (which converts spectrogram -> audio waveform) + spectrogram_generator = nemo_tts.models.FastPitchModel.from_pretrained(model_name="tts_en_fastpitch") + vocoder = nemo_tts.models.HifiGanModel.from_pretrained(model_name="tts_en_hifigan") -.. code-block:: bash + # Parse the text input, generate the spectrogram, and convert it to audio + parsed_text = spectrogram_generator.parse(english_text[0]) + spectrogram = spectrogram_generator.generate_spectrogram(tokens=parsed_text) + audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram) - apt-get update && apt-get install -y libsndfile1 ffmpeg - pip install Cython - pip install nemo_toolkit[all] + # Save the audio to a file + import soundfile as sf + sf.write("output_audio.wav", audio.to('cpu').detach().numpy()[0], 22050) -Pip from source -~~~~~~~~~~~~~~~ -Use this installation mode if you want the version from a particular GitHub branch (for example, ``main``). +You can learn more by about specific tasks you are interested in by checking out the NeMo :doc:`tutorials <./tutorials>`, or documentation (e.g. read :doc:`here <../asr/intro>` to learn more about ASR). -.. code-block:: bash +You can also learn more about NeMo in the `NeMo Primer `_ tutorial, which introduces NeMo, PyTorch Lightning, and OmegaConf, and shows how to use, modify, save, and restore NeMo models. Additionally, the `NeMo Models `__ tutorial explains the fundamentals of how NeMo models are created. These concepts are also explained in detail in the :doc:`NeMo Core <../core/core>` documentation. - apt-get update && apt-get install -y libsndfile1 ffmpeg - pip install Cython - python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[all] - # For v1.0.2, replace {BRANCH} with v1.0.2 like so: - # python -m pip install git+https://github.com/NVIDIA/NeMo.git@v1.0.2#egg=nemo_toolkit[all] -From source -~~~~~~~~~~~ -Use this installation mode if you are contributing to NeMo. +Introductory videos +------------------- -.. code-block:: bash - - apt-get update && apt-get install -y libsndfile1 ffmpeg - git clone https://github.com/NVIDIA/NeMo - cd NeMo - ./reinstall.sh - -Docker containers -~~~~~~~~~~~~~~~~~ -To build a nemo container with Dockerfile from a branch, please run - -.. code-block:: bash +See the two introductory videos below for a high level overview of NeMo. - DOCKER_BUILDKIT=1 docker build -f Dockerfile -t nemo:latest. +**Developing State-Of-The-Art Conversational AI Models in Three Lines of Code** +.. raw:: html -If you chose to work with the ``main`` branch, we recommend using `NVIDIA's PyTorch container version 21.05-py3 `_, then install from GitHub. +
+ +
-.. code-block:: bash +**NVIDIA NeMo: Toolkit for Conversational AI at PyData Yerevan 2022** - docker run --gpus all -it --rm -v :/NeMo --shm-size=8g \ - -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit \ - stack=67108864 --device=/dev/snd nvcr.io/nvidia/pytorch:21.05-py3 +.. raw:: html -.. _mac-installation: +
+ +
-Mac computers with Apple silicon -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -To install NeMo on Mac with Apple M-Series GPU: +.. _installation: -- install `Homebrew `_ package manager +Installation +------------ -- create a new Conda environment +The simplest way to install NeMo is via pip, see info below. -- install PyTorch 2.0 or higher +.. note:: Full NeMo installation instructions (with more ways to install NeMo, and how to handle optional dependencies) can be found in the `GitHub README `_. -- run the following code: +Conda +~~~~~ -.. code-block:: shell +We recommend installing NeMo in a fresh Conda environment. - # install mecab using Homebrew, required for sacrebleu for NLP collection - brew install mecab +.. code-block:: bash - # install pynini using Conda, required for text normalization - conda install -c conda-forge pynini + conda create --name nemo python==3.10.12 + conda activate nemo - # install Cython manually - pip install cython +Install PyTorch using their `configurator `_. - # clone the repo and install in development mode - git clone https://github.com/NVIDIA/NeMo - cd NeMo - ./reinstall.sh +Pip +~~~ +Use this installation mode if you want the latest released version. +.. code-block:: bash + apt-get update && apt-get install -y libsndfile1 ffmpeg + pip install Cython + pip install nemo_toolkit['all'] -`FAQ `_ ---------------------------------------------------- -Have a look at our `discussions board `_ and feel free to post a question or start a discussion. +Depending on the shell used, you may need to use ``"nemo_toolkit[all]"`` instead in the above command. +Discussion board +---------------- +For more information and questions, visit the `NVIDIA NeMo Discussion Board `_. Contributing ------------ @@ -203,4 +148,4 @@ We welcome community contributions! Refer to the `CONTRIBUTING.md `_. \ No newline at end of file +NeMo is released under an `Apache 2.0 license `_. \ No newline at end of file