Final multimodal PR with our recent developments on MM side (#8127) · NVIDIA/NeMo@bb575b7

Commit

Final multimodal PR with our recent developments on MM side (#8127)

* Hotfix (#7501) (#7568)

Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: jbaczek <[email protected]>

* Avoid duplicated checkpoint save (#7555) (#7566)

Signed-off-by: Mikołaj Błaż <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>

* Cache FP8 weight and transpose only at the first micro-batch in each validation and test routine (#7470) (#7483)

* Cache weight and transpose only in the first batch in all training, val, and test runs



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add an option to disable manual GC in validation (#7467) (#7476)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>

* Remove PUBLICATIONS.md, point to github.io NeMo page instead (#7694) (#7695)

* update publications section to point to blog website page



* add hyphen



* use double backquotes for code formatting



---------

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>

* Fix multi rank finetune for ASR (#7684) (#7699)

* Fix multi rank finetune for ASR



* Actually add time



* Actually add time



---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Update docs: readme, getting started, ASR intro (#7679)

* [TTS] Add dataset to path of logged artifacts (#7462)

* [TTS] Add dataset to path of logged artifacts

Signed-off-by: Ryan <[email protected]>

* [TTS] Revert axis name back to Audio Frames

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* move install info to INSTALLATION.md

Signed-off-by: Elena Rastorgueva <[email protected]>

* tidy up links

Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix sft dataset truncation (#7464)

* Add fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330)

* striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling

Signed-off-by: mburchi <[email protected]>

* transpose conv1d inputs

Signed-off-by: mburchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: mburchi <[email protected]>

* Update subsampling.py

change striding_conv1d_k5 to striding_conv1d

Signed-off-by: Maxime Burchi <[email protected]>

* cv branch

Signed-off-by: mburchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* video manifest

Signed-off-by: mburchi <[email protected]>

* add collection classes

Signed-off-by: mburchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test_step_outputs

Signed-off-by: mburchi <[email protected]>

* correct manifest bug when having only audio or only videos

Signed-off-by: mburchi <[email protected]>

* correct manifest bug when having only audio or only videos

Signed-off-by: mburchi <[email protected]>

* clean references

Signed-off-by: mburchi <[email protected]>

* freeze unfreeze transcribe cv models

Signed-off-by: mburchi <[email protected]>

* correct manifest get_full_path bug

Signed-off-by: mburchi <[email protected]>

* update for PR

Signed-off-by: mburchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* guard torchvision

Signed-off-by: mburchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/cv/data/video_to_text_dataset.py

Co-authored-by: Igor Gitman <[email protected]>
Signed-off-by: Maxime Burchi <[email protected]>

* _video_speech_collate_fn in cv/data/video_to_text.py

Signed-off-by: mburchi <[email protected]>

* add self.out = None to asr subsampling

Signed-off-by: mburchi <[email protected]>

* Update nemo/collections/cv/data/video_to_text_dataset.py

Co-authored-by: Igor Gitman <[email protected]>
Signed-off-by: Maxime Burchi <[email protected]>

* cv -> multimodal/speech_cv branch

Signed-off-by: mburchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: mburchi <[email protected]>
Signed-off-by: Maxime Burchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Igor Gitman <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* HF StarCoder to NeMo conversion script (#7421)

* Script to convert HF StarCoder checkpoint to NeMo

Signed-off-by: Jan Lasek <[email protected]>

* StarCoder conversion test

Signed-off-by: Jan Lasek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Lasek <[email protected]>

* Fix test

Signed-off-by: Jan Lasek <[email protected]>

* Catch up with save_to changes

Signed-off-by: Jan Lasek <[email protected]>

* Don't abbreviate args for clarity

Signed-off-by: Jan Lasek <[email protected]>

* Configurable precision: BF16 vs FP32

Signed-off-by: Jan Lasek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jan Lasek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix bug when loading dist ckpt in peft (#7452)

Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix adding positional embeddings in-place in transformer module (#7440)

Signed-off-by: Tamerlan Tabolov <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix (#7478)

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* add sleep (#7498) (#7499)

* add sleep



* add sleep onto config instead



* add comment



---------

Signed-off-by: Gerald Shen <[email protected]>
Co-authored-by: Gerald Shen <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix exp manager check for sleep (#7503) (#7504)

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* bugfix: trainer.accelerator=auto from None. (#7492) (#7493)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [doc] fix broken link (#7481)

Signed-off-by: Stas Bekman <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS] Read audio as int32 to avoid flac read errors (#7477)

* [TTS] Read audio as int32 to avoid flac read errors

Signed-off-by: Ryan <[email protected]>

* [TTS] Add comment about read failures

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409)

* Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS
* Train 'AISHELL-3' dataset with multi-speakers

Signed-off-by: Robin Dong <[email protected]>

* Update get_data.py

update copyright header

Signed-off-by: Xuesong Yang <[email protected]>

* Update get_data.py

added a disclaimer

Signed-off-by: Xuesong Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add new configuration file for AISHELL3 with multispeaker of fastpitch

Signed-off-by: Robin Dong <[email protected]>

---------

Signed-off-by: Robin Dong <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* dllogger - log on rank 0 only (#7513)

Signed-off-by: Stas Bekman <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix TTS FastPitch tutorial (#7494) (#7516)

* Fix

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix get_dist() tensor dimension (#7506) (#7515)

Signed-off-by: Jocelyn Huang <[email protected]>
Co-authored-by: Jocelyn <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* fix (#7511)

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS] Fix FastPitch data prep tutorial (#7524)

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* add italian tokenization (#7486)

* add italian tokenization

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more ipa lexicon it

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error deletion

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* add test

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: GiacomoLeoneMaria <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Replace None strategy with auto in tutorial notebooks (#7521) (#7527)

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* unpin setuptools (#7534) (#7535)

Signed-off-by: fayejf <[email protected]>
Co-authored-by: fayejf <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* remove auto generated examples (#7510)

* explicitly remove autogenerated examples for data parallel evaluation

Signed-off-by: arendu <[email protected]>

* mark autogenrated and remove it for test

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264)

It is passed as an explicit argument rather than through
`**strategy_args` so as to ensure someone cannot accidentally pass other
arguments that would end up being ignored.

It is a keyword-only argument to ensure that if in the future we want to
update the signature to `**strategy_args`, we can do it without breaking
code.

Signed-off-by: Olivier Delalleau <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533)

* fix none dataloader issue ptl2



* ptl2.0 logging fixes for rnnt_models



---------

Signed-off-by: KunalDhawan <[email protected]>
Co-authored-by: Kunal Dhawan <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* gpus -> devices (#7542) (#7545)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Update FFMPEG version to fix issue with torchaudio (#7551) (#7553)

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* PEFT GPT & T5 Refactor (#7308)

* initial implementation of add_adapters API

* correct type hint

* Add config in add_adapters for save and load (@author bobchen)

* Remove AdapterConfig to avoid import error

* Add AdaterConfig back and move adaptermixin to sft model

* Add NLPSaveRestoreConnector as default in NLPModel.restore_from

* Add restore_from_nemo_with_adapter and test script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename t5 file and classes to be consistent with GPT

* add t5 sft dataset

* add support for single-file format with T5SFTDataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Various small changes to make T5 SFT work like GPT SFT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add adapter evaluation test script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add MultiAdaterConfig for ia3 and fix builder issue

* Make ptuning for T5SFTModel work using mixin

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add IA3_Adapter for AdapterName

* Add adapter name for ptuning and attention adapter

* Make test script GPT/T5 agnostic

* Add layer selection feature

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Integrate adapter name and config

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt peft tuning script to new API

* add t5 peft tuning script with new API

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix IA3 layer selection issue

* Override state_dict on SFT model instead of mixin

* Add load adapter by adapter config

* move peft config map away from example script

* auto get config from nemo adapter

* Move PEFTConfig to new file

* fix ckpt save/load for t5

* name change: add_adapters -> add_adapter

* variable name change

* update t5 script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix t5 issues

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add weight tying

* update gpt tuning script

* PEFT-API proposal

* Fix according to comments

* update tuning scripts

* move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore

* Add mcore_gpt support for NLPAdapterMixin

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

* variable name change to distinguish "peft" and "adapter"

* override `load_adapters` to support `add_adapter` name change

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update tuning and eval script for adapter save/load

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add Ptuning on first stage only

* add lora tutorial for review

* Fix layer selection for mcore

* add landing page

* fix resume training

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add mcore condition in sharded_state_dict to make sft work

* Update lora_tutorial.md

First edit of this file for PEFT documentation for NeMO

Signed-off-by: hkelly33 <[email protected]>

* rename Adapter to AttentionAdapter to avoid confusion in doc

* Change load_adapters to load .nemo

* add quick start guide

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add load_adapters with .ckpt

* Remove setup_complete changes in load_adapters

* update landing page

* remove typo

* Updated quick_start.md per Chen Cui

Signed-off-by: hkelly33 <[email protected]>

* Add inference config merger and tutorial

* Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel

* add supported_methods.md and update other documentations

* Update supported_methods.md

minor updates.

Signed-off-by: Adi Renduchintala <[email protected]>

* Update landing_page.md

minor update.

Signed-off-by: Adi Renduchintala <[email protected]>

* Modify doc string for NLPAdapterModelMixin

* Add doc string add_adapters in NLPAdapterModelMixin

* rename canonical adapters

* remove mcore hard dependency

* [PATCH] move microbatch calculator to nemo from apex

* remove apex dependency in gpt and t5 sft models

* remove apex dependency in gpt model

* render doc strings

* fix

* Add missing virtual_tokens on ptuning

* fix docstrings

* update gpt-style model coverage in docs

* update docstring

* Remove pdb

* add lightning_fabric to make docstring rendering work

* Add Ptuning missing key

* try docstring rendering

* Fix ptuning issue

* update gpt t5 peft tuning and eval scripts

* typos

* update eval config

* fix bug relating to apex dependency removal

* typo

* make predict step behave the same as test step

* make lora tutorial work in notebook

* cosmetics

* update yaml scripts

* mcore_gpt attribute optional

* typo

* update eval scripts and fix T5 eval bugs

* add NLPDDPStrategyNotebook and trainer builder logic to use it

* update lora notebook to use new trainer builder

* fix microbatch calculator bug for inference after training

* Convert markdown files to RST and incorporate with doc

* typo

* revise language

* remove extra cell

* remove unnecessary inheritance

* remove old tests

* move layer selection default so logging messages make sense

* remove `save_adapters` as adapter weights are saved automatically during training

* initialize weights from a checkpoint instead of randomly

* multiple fields can form a context (#7147)

* list of context fields and flexible prompt template

Signed-off-by: arendu <[email protected]>

* list of fields for context

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add multiple truncation fields and middle truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Compatible to old ckpt

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix tokenize detokenize issue

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove detokenization, add truncation augmentation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve comments

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove unused import

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert eos

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add tokenizer space_sensitive attribute

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix error

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix erorr and use re

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Change assert logic

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Follow adi suggestion

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove merge function

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add example and comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove context_key and add comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove random truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix template none

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

* revert config changes

* remove accidental breakpoint

* support TP>1 loading

* infer adapter type from checkpoint in during eval

* breakup add adapter

* enable interpolation of train_ds and validation_ds

* update metric calc script to conform to single-file eval format

* remove extraneous print

* update lora notebook for updated merge_inference_cfg

* Update nlp_adapter_mixins.py

variable name change

Signed-off-by: Chen Cui <[email protected]>

* turn off grad scaler for PP to match old scripts

* remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class

* remove resume_from_checkpoint check since covered in #7335

* revert changes made in eval config interpolation

* more interpolation

* typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove dup line

Signed-off-by: Chen Cui <[email protected]>

* code style warnings

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix config mistake

Signed-off-by: Chen Cui <[email protected]>

* add copyright header

Signed-off-by: Chen Cui <[email protected]>

* fix code check warnings

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests)

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more deprecation notices

Signed-off-by: Chen Cui <[email protected]>

* update deprecation notices

Signed-off-by: Chen Cui <[email protected]>

* update deprecation notices

Signed-off-by: Chen Cui <[email protected]>

* consolidate peft and sft scripts

Signed-off-by: Chen Cui <[email protected]>

* update CI tests

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* notebook branch points to main to prepare for merge

Signed-off-by: Chen Cui <[email protected]>

* fix gpt and t5 validation with any metric other than loss

Signed-off-by: Chen Cui <[email protected]>

* support pre-extracted checkpoints

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: hkelly33 <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Marc Romeyn <[email protected]>
Co-authored-by: jasonwan <[email protected]>
Co-authored-by: hkelly33 <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Yuanzhe Dong <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* fix a typo (#7496)

Signed-off-by: BestJuly <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560)

* remove curly braces.
* remove installation of pynini.
---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* add youtube embed url (#7570)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536)

* Remap speakers to continuous range of speaker_id for dataset AISHELL3
* Add new key/value pair to record raw speaker for AISHELL3 dataset

Signed-off-by: Robin Dong <[email protected]>

---------

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Elena Rastorgueva <[email protected]>

* fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572)

* added correct validation_step_outputs initialization for mutli-dataloader



* changed kernel for display



* Update logic for validation and test step outputs



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert multidataloader changes in multilang ASR notebook



---------

Signed-off-by: KunalDhawan <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Kunal Dhawan <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Append output of val step to self.validation_step_outputs (#7530) (#7532)

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573)

* Append val/test output to the instance variable in EncDecSpeakerLabelModel



* Handle test case in evaluation_step



* Replace type with isinstance



---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix CustomProgressBar for resume (#7427) (#7522)

* Fix CustomProgress Bar for resume and multiple epochs



* Edit num_training_batches



* Use max_steps as total for progress bar for resume



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix typos in nfa and speech enhancement tutorials (#7580) (#7583)

Signed-off-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461)

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* update strategy (#7577) (#7578)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix typos (#7581)

Signed-off-by: Elena Rastorgueva <[email protected]>

* Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584)

* Change strategy to auto



---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548)

* Add missing quotes for auto strategy



* Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb



---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* add build os key (#7596) (#7599)

* add build os key



* add tools



* update to stable version



---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* StarCoder SFT test + bump PyT NGC image to 23.09 (#7540)

* Add SFT StarCoder test

Signed-off-by: Jan Lasek <[email protected]>

* Remove _modify_config call as it is covered in load_from_nemo just below

Signed-off-by: Jan Lasek <[email protected]>

* Test with pyt:23.09 container

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* defaults changed (#7600)

* defaults changed

Signed-off-by: arendu <[email protected]>

* typo

Signed-off-by: arendu <[email protected]>

* update

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* add ItalianPhonemesTokenizer (#7587)

* add ItalianPhonemesTokenizer

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix Italian phonemes

Signed-off-by: GiacomoLeoneMaria <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test

Signed-off-by: GiacomoLeoneMaria <[email protected]>

---------

Signed-off-by: GiacomoLeoneMaria <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* best ckpt fix (#7564) (#7588)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Add files via upload (#7598)

specifies the branch

Signed-off-by: George <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606)

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586)

* Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain
* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Safeguard nemo_text_processing installation on ARM (#7485)

* safeguard nemo_text_processing installing

Signed-off-by: Jason <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update check

Signed-off-by: Jason <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Bound transformers version in requirements (#7620)

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* fix llama2 70b lora tuning bug (#7622)

* fix llama2 70b lora tuning bug

Signed-off-by: Chen Cui <[email protected]>

* Update peft_config.py

brackets

Signed-off-by: Adi Renduchintala <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix import error no module name model_utils (#7629)

Signed-off-by: Mehadi Hasan Menon <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* add fc large ls models (#7641)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Elena Rastorgueva <[email protected]>

* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642)

* [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0
* trainer.gpus -> trainer.devices
* fixed related tutorial bugs
---------
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix ssl models ptl monitor val through logging (#7608) (#7614)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix metrics for SE tutorial (#7604) (#7612)

Signed-off-by: Ante Jukić <[email protected]>
Co-authored-by: anteju <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644)

* Add ddp_find_unused_parameters=True and change acclerator to auto



* Add ddp_find_unused_parameters True for normalization_as_tagging_train.py



---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix py3.11 dataclasses issue  (#7616)

* Fix py3.11 dataclasses issue  (#7582)

* Update ASR configs to support Python 3.11

Signed-off-by: smajumdar <[email protected]>

* Update TTS configs to support Python 3.11

Signed-off-by: smajumdar <[email protected]>

* Guard MeCab and Ipadic

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix remaining ASR dataclasses

Signed-off-by: smajumdar <[email protected]>

* Fix remaining ASR dataclasses

Signed-off-by: smajumdar <[email protected]>

* Fix scripts

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update name to ConfidenceMethodConfig

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586)

* Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain
* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Safeguard nemo_text_processing installation on ARM (#7485)

* safeguard nemo_text_processing installing

Signed-off-by: Jason <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update check

Signed-off-by: Jason <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix changes to confidence measure

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Jason <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Jason <[email protected]>

Signed-off-by: Elena Rastorgueva <[email protected]>

* Fix issues with Dockerfile (#7650) (#7652)

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635)

* decoding and test fix

Signed-off-by: Aleksandr Laptev <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aleksandr Laptev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Elena Rastorgueva <[email protected]>

* [ASR] Fix type error in jasper (#7636) (#7653)

Signed-off-by: Ryan <[email protected]>
Co-authored-by: Ryan Langman <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>

* [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468)

* [TTS] Add STFT and SI-SDR loss to audio codec recipe

Signed-off-by: Ryan <[email protected]>

* [TTS] Fix STFT resolution

Signed-off-by: Ryan <[email protected]>

* [TTS] Fix training metric logging

Signed-off-by: Ryan <[email protected]>

* [TTS] Add docstring to mel and stft losses

Signed-off-by: Ryan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Ryan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Elena Rastorgueva <[email protected]>

* add outline of asr quickstart info to asr/intro.rst

Signed-off-by: Elena Rastorgueva <[email protected]>

* add CLI, LM and real-time transcription sections

Signed-off-by: Elena Rastorgueva <[email protected]>

* Create per.py (#7538)

* Move model precision copy (#7336)

* move cfg precision set to megatron base model

Signed-off-by: Maanu Grover <[email protected]>

* remove copy from other models

Signed-off-by: Maanu Grover <[email protected]>

* modify attribute not arg

Signed-off-by: Maanu Grover <[email protected]>

* fix gpt model test for ptl 2.0

Signed-off-by: Maanu Grover <[email protected]>

* rename function and add docstring

Signed-off-by: Maanu Grover <[email protected]>

* replace precision to dtype conditionals with func call

Signed-off-by: Maanu Grover <[email protected]>

* unnecessary function and cfg reset

Signed-off-by: Maanu Grover <[email protected]>

* set default value

Signed-off-by: Maanu Grover <[email protected]>

* fix precision lookup in a few more places

Signed-off-by: Maanu Grover <[email protected]>

* rename mapping function

Signed-off-by: Maanu Grover <[email protected]>

* ununsed import

Signed-off-by: Maanu Grover <[email protected]>

* save torch datatype to model

Signed-off-by: Maanu Grover <[email protected]>

* set weights precision wrt amp o2

Signed-off-by: Maanu Grover <[email protected]>

* Revert "set weights precision wrt amp o2"

This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c.

Signed-off-by: Maanu Grover <[email protected]>

* revert half precision at inference attempt

Signed-off-by: Maanu Grover <[email protected]>

* move autocast dtype to base model

Signed-off-by: Maanu Grover <[email protected]>

* move params dtype to base model, enable fp16 O2 inf

Signed-off-by: Maanu Grover <[email protected]>

* unused imports

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Fix PEFT checkpoint loading (#7388)

* Fix PEFT checkpoint loading

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <[email protected]>

* Use distributed optimizer support for multiple dtypes (#7359)

* Update distopt wrapper with multiple dtype support

Remove manual handling of separate FP32 optimizer.

Signed-off-by: Tim Moon <[email protected]>

* Use distopt support for contiguous buffers with multiple dtypes

Signed-off-by: Tim Moon <[email protected]>

* Fix typo

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate distopt buckets for first GPT layer and non-overlapped params

Signed-off-by: Tim Moon <[email protected]>

* Add distopt logic for int dtypes

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit

Signed-off-by: Tim Moon <[email protected]>

* Remove unused variables

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit in README and Jenkensfile

Signed-off-by: Tim Moon <[email protected]>

* Debug Dockerfile and Jenkinsfile

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* minor fix for llama ckpt conversion script (#7387)

* minor fix for llama ckpt conversion script

Signed-off-by: Jason Wang <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jason Wang <[email protected]>

* remove fast_swiglu configuration

Signed-off-by: Jason Wang <[email protected]>

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Fix wrong calling of librosa.get_duration() in notebook (#7376)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* [PATCH] PEFT import mcore (#7393)

* [PATCH] PEFT import mcore

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <[email protected]>

* Create per.py

Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.)

Signed-off-by: Sasha Meister <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Sasha Meister <[email protected]>

* [TTS] Added a callback for logging initial data (#7384)

Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Update Core Commit (#7402)

* Update Core Commit

Signed-off-by: Abhinav Khattar <[email protected]>

* update commit

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Use cfg attribute in bert (#7394)

* use cfg attribute instead of arg

Signed-off-by: Maanu Grover <[email protected]>

* use torch_dtype in place of cfg.precision

Signed-off-by: Maanu Grover <[email protected]>

* move precision copy before super constructor

Signed-off-by: Maanu Grover <[email protected]>

* use trainer arg

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Add support for bias conversion in Swiglu models (#7386)

* Add support for bias conversion in Swiglu models

Signed-off-by: smajumdar <[email protected]>

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* Fix issue with missing tokenizer

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <[email protected]>

* Update save_to and restore_from for dist checkpointing (#7343)

* add dist ckpt to save to, in progress

Signed-off-by: eharper <[email protected]>

* move dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update restore from, need to figure out how to initialize distributed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* launch distrib if needed when restoring dist ckpt

Signed-off-by: eharper <[email protected]>

* when using mcore we can change tp pp on the fly

Signed-off-by: eharper <[email protected]>

* add load_from_checkpoint support for dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update llama convert script to save dist .nemo

Signed-off-by: eharper <[email protected]>

* fix load dist ckpt

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup TE TP groups if needed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup te tp groups if needed

Signed-off-by: eharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jasonwan <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* fix forward for with mcore=false (#7403)

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374)

* Add CustomProgressBar class to exp_manager and trainer callbacks

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix the progress bar to reflect total microbatch cnt

Signed-off-by: Abhishree <[email protected]>

* Modify CustomProgressBar class

1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch
2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder

Signed-off-by: Abhishree <[email protected]>

* Add CustomProgressBar callback to tuning files

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <[email protected]>

* Set Activation Checkpointing Defaults (#7404)

* Set Activation Checkpointing Defaults

Signed-off-by: Abhinav Khattar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* check for None

Signed-off-by: Abhinav Khattar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <[email protected]>

* make loss mask default to false (#7407)

Signed-off-by: eharper <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Add dummy userbuffer config files (#7408)

Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* add missing ubconf files (#7412)

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* New tutorial on Speech Data Explorer (#7405)

* Added Google Colab based tutorial on Speech Data Explorer

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Update ptl training ckpt conversion script to work with dist ckpt (#7416)

* update ptl convert script

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* don't break legacy

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <[email protected]>

* Allow disabling sanity checking when num_sanity_val_steps=0 (#7413)

* Allow disabling sanity checking when num_sanity_val_steps=0

Signed-off-by: Abhishree <[email protected]>

* Update num_sanity_val_steps to be a multiple of num_microbatches

Signed-off-by: Abhishree Thittenamane <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <[email protected]>

* Add comprehensive error messages (#7261)

Signed-off-by: Anton Peganov <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* check NEMO_PATH (#7418)

Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* layer selection for ia3 (#7417)

* layer selection for ia3

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <[email protected]>

* Fix missing pip package 'einops' (#7397)

Signed-off-by: Robin Dong <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Fix failure of pyaudio in Google Colab (#7396)

Signed-off-by: Robin Dong <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Update README.md: output_path --> output_manifest_filepath (#7442)

Signed-off-by: Samuele Cornell <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Add rope dynamic linear scaling (#7437)

* Add dynamic linear scaling

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yang Zhang <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Fix None dataloader issue in PTL2.0 (#7455)

* Fix None dataloader issue in PTL2.0

Signed-off-by: KunalDhawan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updating values of self._validation_dl and self._test_dl as well

Signed-off-by: KunalDhawan <[email protected]>

* updating values of self._validation_dl and self._test_dl as well

Signed-off-by: KunalDhawan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: KunalDhawan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <[email protected]>

* [ASR] Confidence measure -> method renames (#7434)

* measure -> method

Signed-off-by: Aleksandr Laptev <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aleksandr Laptev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <[email protected]>

* Add steps for document of getting dataset 'SF Bilingual Speech' (#7378)

* Add steps for document of getting dataset 'SF Bilingual Speech'

Signed-off-by: Robin Dong <[email protected]>

* Update datasets.rst

added a link from a tutorial demonstrating detailed data prep steps.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Robin Dong <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* RNN-T confidence and alignment bugfix (#7381)

* new frame_confidence and alignments lists are now always created after the while loop

Signed-off-by: Aleksandr Laptev <[email protected]>

* tests added

Signed-off-by: Aleksandr Laptev <[email protected]>

---------

Signed-off-by: Aleksandr Laptev <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Fix resume from checkpoint in exp_manager (#7424) (#7426)

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Fix checking of cuda/cpu device for inputs of Decoder (#7444)

* Fix checking of cuda/cpu device for inputs of Decoder

Signed-off-by: Robin Dong <[email protected]>

* Update tacotron2.py

Signed-off-by: Jason <[email protected]>

---------

Signed-off-by: Robin Dong <[email protected]>
Signed-off-by: Jason <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* Fix failure of ljspeech's get_data.py (#7430)

* Fix failure of ljspeech's get_data.py

Signed-off-by: Robin Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <[email protected]>

* [TTS] Fix audio codec type checks (#7373)

* [TTS] Fix audio codec type checks

Signed-off-by: Ryan <[email protected]>

* [TTS] Fix audio codec tests

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>

* […

Loading branch information

70 people authored Jan 20, 2024

1 parent d656f22 commit bb575b7

docs/source/multimodal/api.rst

-Original file line number
+Diff line change
@@ Expand Up / @@ -10,7 +10,7 @@ Model Classes @@
         :members: __init__, configure_optimizers
-    .. autoclass:: nemo.collections.multimodal.models.stable_diffusion.ldm.ddpm.MegatronLatentDiffusion
+    .. autoclass:: nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.ddpm.MegatronLatentDiffusion
         :show-inheritance:
         :no-members:
         :members: __init__, training_step, validation_step, setup, build_train_valid_test_datasets
@@ Expand Down Expand Up / @@ -49,7 +49,7 @@ Modules @@
         :show-inheritance:
         :no-members:
-    .. autoclass:: nemo.collections.multimodal.models.stable_diffusion.ldm.autoencoder.AutoencoderKL
+    .. autoclass:: nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.autoencoder.AutoencoderKL
         :show-inheritance:
         :no-members:
         :members: __init__, encode, decode
@@ Expand Down @@

docs/source/multimodal/mllm/checkpoint.rst

-Original file line number
+Diff line change
@@ Expand Up / @@ -108,7 +108,7 @@ Adjust model parallelism with: @@
         --target_tensor_model_parallel_size=??? \
         --pipeline_model_parallel_size=??? \
         --target_pipeline_model_parallel_size=??? \
-        --model_class="nemo.collections.multimodal.models.neva.neva_model.MegatronNevaModel" \
+        --model_class="nemo.collections.multimodal.models.multimodal_llm.neva.neva_model.MegatronNevaModel" \
         --precision=32 \
         --tokenizer_model_path=/path/to/tokenizer.model \
         --tp_conversion_only

docs/source/multimodal/text2img/insp2p.rst

-Original file line number
+Diff line change
@@ Expand Up / @@ -6,7 +6,7 @@ Model Introduction @@
     InstructPix2Pix [InstructPix2Pix]_ :cite:`mm-models-insp2p` offers a unique approach to image editing using human-written instructions. Given an input image and a textual directive, the model adjusts the image according to the provided instructions. NeMo Multimodal presents a training pipeline for this conditional diffusion model, utilizing a dataset generated by harnessing the strengths of two prominent pretrained models: a language model (GPT-3) and a text-to-image model (Stable Diffusion). The InstructPix2Pix model operates swiftly, editing images within seconds, eliminating the need for per-example fine-tuning or inversion. It has demonstrated remarkable results across a wide variety of input images and written instructions.
-    Built upon the Stable Diffusion framework, NeMo's InstructPix2Pix shares a similar architecture with Stable Diffusion (refer to :doc:`Stable Diffusion <./sd>`). What sets it apart is its unique training dataset and the combined guidance from both image and text prompts. Specifically, InstructPix2pix ::class::``nemo.collections.multimodal.models.instruct_pix2pix.ldm.ddpm_edit.MegatronLatentDiffusionEdit`` is derived directly from Stable Diffusion's ::class::``nemo.collections.multimodal.models.stable_diffusion.ldm.ddpm.MegatronLatentDiffusion``, with alterations to accommodate the dataset and provide support for dual guidance.
+    Built upon the Stable Diffusion framework, NeMo's InstructPix2Pix shares a similar architecture with Stable Diffusion (refer to :doc:`Stable Diffusion <./sd>`). What sets it apart is its unique training dataset and the combined guidance from both image and text prompts. Specifically, InstructPix2pix ::class::``nemo.collections.multimodal.models.instruct_pix2pix.ldm.ddpm_edit.MegatronLatentDiffusionEdit`` is derived directly from Stable Diffusion's ::class::``nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.ddpm.MegatronLatentDiffusion``, with alterations to accommodate the dataset and provide support for dual guidance.
     Training Dataset
     --------------------
@@ Expand Down @@

docs/source/multimodal/text2img/sd.rst

-Original file line number
+Diff line change
@@ Expand Up @@
     .. code-block:: yaml
         first_stage_config:
-            _target_: nemo.collections.multimodal.models.stable_diffusion.ldm.autoencoder.AutoencoderKL
+            _target_: nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.autoencoder.AutoencoderKL
             from_pretrained: /path/to/vae.bin
             embed_dim: 4
             monitor: val/rec_loss
@@ Expand Down @@

examples/multimodal/multimodal_llm/neva/conf/neva_inference.yaml

-Original file line number
+Diff line change
@@ Expand Up / @@ -11,6 +11,7 @@ inference: @@
       compute_logprob: False  # a flag used to compute logprob of all the input text, a very special case of running inference, default False
       end_strings: ["<extra_id_1>","<extra_id_7>",]  # generation will stop when one of these tokens is generated
       images_base_path: /pwd/images
+      insert_image_token: null # `left` or `right` or `null`
     trainer:
       devices: 8
@@ Expand All / @@ -24,7 +25,7 @@ tensor_model_parallel_size: 8 @@
     pipeline_model_parallel_size: 1
     pipeline_model_parallel_split_rank: 0 # used for encoder and decoder model (0 for others)
     neva_model_file: /pwd/nemo_experiments/nemo_llava.nemo #neva_22b_tp8_finetuned_v1.nemo neva_8b_tp4_finetuned_v1.nemo
-    llm_model_file: null
+    base_model_file: null
     checkpoint_dir: null #/pwd/nemo_multimodal/nemo_experiments/nemo_llava_finetune/checkpoints # checkpoint file dir. This is used to load the PTL checkpoint generated during the Kosmos training
     checkpoint_name: null #megatron_clip--val_loss=0.41-step=13499-consumed_samples=431904.0.ckpt # PTL checkpoint file name, only used for PTL checkpoint loading
     hparams_file: null #/pwd/nemo_multimodal/nemo_experiments/nemo_llava_finetune/version_0/hparams.yaml # model configuration file, only used for PTL checkpoint loading
@@ Expand Down @@

examples/multimodal/multimodal_llm/neva/conf/neva_peft.yaml

-Original file line number
+Diff line change
@@ Expand Up / @@ -209,7 +209,7 @@ model: @@
       optim:
         name: fused_adam
-        lr: 2e-5
+        lr: 2e-4
         weight_decay: 0.
         betas:
           - 0.9
@@ Expand Down @@

examples/multimodal/multimodal_llm/neva/convert_hf_llava_to_neva.py

-Original file line number
+Diff line change
@@ Expand Up / @@ -18,7 +18,8 @@ @@
         python convert_hf_llava_to_neva.py \
          --in-file <path_to_hf_checkpoints_folder> \
          --out-file <path_to_output_nemo_file> \
-         --tokenizer-model <path_to_sp_tokenizer_model>
+         --tokenizer-model <path_to_sp_tokenizer_model> \
+         --conv-template llama_2 # nvgpt, llama_2, v1 (vicuna)
     """
     import os
@@ Expand Down Expand Up / @@ -49,6 +50,13 @@ def get_args(): @@
             "--in-file", type=str, default=None, required=True, help="Path to Huggingface LLaMA checkpoints",
         )
         parser.add_argument("--out-file", type=str, default=None, required=True, help="Path to output .nemo file.")
+        parser.add_argument(
+            "--conv-template",
+            type=str,
+            default="llama_2",
+            required=False,
+            help="Conversation template: nvgpt, llama_2, v1 (vicuna)",
+        )
         parser.add_argument(
             "--tokenizer-model", type=str, default=None, required=False, help="Path to sentencepiece tokenizer model."
         )
@@ Expand Down Expand Up / @@ -121,6 +129,8 @@ def load_config(args, llava_config): @@
             nemo_config.num_query_groups = llava_config['num_key_value_heads']
         nemo_config.use_cpu_initialization = True
         nemo_config.activation = 'fast-swiglu'
+        nemo_config.data.conv_template = args.conv_template
+        nemo_config.mm_cfg.model_type = args.conv_template
         if args.tokenizer_model is None:
             nemo_config.tokenizer.model = llava_config['tokenizer_model']
         else:
@@ Expand Down @@

examples/multimodal/multimodal_llm/neva/eval/gradio_cli.py

-Original file line number
+Diff line change
@@ -0,0 +1,41 @@
+    # Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.
+    #
+    # Licensed under the Apache License, Version 2.0 (the "License");
+    # you may not use this file except in compliance with the License.
+    # You may obtain a copy of the License at
+    #
+    #     http://www.apache.org/licenses/LICENSE-2.0
+    #
+    # Unless required by applicable law or agreed to in writing, software
+    # distributed under the License is distributed on an "AS IS" BASIS,
+    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    # See the License for the specific language governing permissions and
+    # limitations under the License.
+    import base64
+    import requests
+    # URL of the Gradio server
+    url = 'http://localhost:8890/api/predict/'
+    # Prepare the text data
+    text_data = '<image>Describe this image please.'
+    # Prepare the image data
+    with open("/path/to/images/001.jpg", "rb") as image_file:
+        encoded_string = base64.b64encode(image_file.read()).decode()
+    # Data to send
+    data = {'data': [text_data, encoded_string]}
+    # Sending a POST request to the Gradio server
+    response = requests.post(url, json=data)
+    # Checking if the request was successful
+    if response.status_code == 200:
+        # Parsing the response
+        response_data = response.json()
+        print("Response from server:", response_data)
+    else:
+        print("Failed to get a response from the server, status code:", response.status_code)

examples/multimodal/multimodal_llm/neva/eval/gradio_server.py

-Original file line number
+Diff line change
@@ -0,0 +1,108 @@
+    # Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.
+    #
+    # Licensed under the Apache License, Version 2.0 (the "License");
+    # you may not use this file except in compliance with the License.
+    # You may obtain a copy of the License at
+    #
+    #     http://www.apache.org/licenses/LICENSE-2.0
+    #
+    # Unless required by applicable law or agreed to in writing, software
+    # distributed under the License is distributed on an "AS IS" BASIS,
+    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    # See the License for the specific language governing permissions and
+    # limitations under the License.
+    import base64
+    import io
+    import gradio as gr
+    import PIL.Image
+    from omegaconf import OmegaConf
+    from nemo.collections.multimodal.parts.utils import create_neva_model_and_processor
+    CFG_STRING = """
+    trainer:
+      devices: 1
+      num_nodes: 1
+      accelerator: gpu
+      logger: False # logger provided by exp_manager
+      precision: bf16 # 16, 32, or bf16
+    inference:
+      greedy: False # Whether or not to use sampling ; use greedy decoding otherwise
+      top_k: 0  # The number of highest probability vocabulary tokens to keep for top-k-filtering.
+      top_p: 0.9 # If set to float < 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation.
+      temperature: 0.2 # sampling temperature
+      add_BOS: False # add the bos token at the begining of the prompt
+      tokens_to_generate: 256 # The minimum length of the sequence to be generated.
+      all_probs: False  # whether return the log prob for all the tokens in vocab
+      repetition_penalty: 1.2  # The parameter for repetition penalty. 1.0 means no penalty.
+      min_tokens_to_generate: 0  # The minimum length of the sequence to be generated.
+      compute_logprob: False  # a flag used to compute logprob of all the input text, a very special case of running inference, default False
+      end_strings: ["<extra_id_1>","<extra_id_7>",]  # generation will stop when one of these tokens is generated
+      images_base_path: /pwd/images
+      insert_image_token: null # `left` or `right` or `null`
+    cluster_type: BCP
+    tensor_model_parallel_size: 1
+    pipeline_model_parallel_size: 1
+    pipeline_model_parallel_split_rank: 0 # used for encoder and decoder model (0 for others)
+    neva_model_file: /pwd/nemo_experiments/nemo_llava.nemo #neva_22b_tp8_finetuned_v1.nemo neva_8b_tp4_finetuned_v1.nemo
+    base_model_file: null
+    checkpoint_dir: null #/pwd/nemo_multimodal/nemo_experiments/nemo_llava_finetune/checkpoints # checkpoint file dir. This is used to load the PTL checkpoint generated during the Kosmos training
+    checkpoint_name: null #megatron_clip--val_loss=0.41-step=13499-consumed_samples=431904.0.ckpt # PTL checkpoint file name, only used for PTL checkpoint loading
+    hparams_file: null #/pwd/nemo_multimodal/nemo_experiments/nemo_llava_finetune/version_0/hparams.yaml # model configuration file, only used for PTL checkpoint loading
+    """
+    cfg = OmegaConf.create(CFG_STRING)
+    cfg.neva_model_file = "/path/to/llava-v1.5-7b.nemo"
+    model, image_processor = create_neva_model_and_processor(cfg)
+    def predict(prompt, image_base64=None):
+        input_data = {"prompt": prompt}
+        if image_base64 is not None:
+            image_data = base64.b64decode(image_base64)
+            # image = PIL.Image.fromarray(image)
+            image = PIL.Image.open(io.BytesIO(image_data))
+            input_data["image"] = image_processor(image)
+        length_params: LengthParam = {
+            "max_length": cfg.inference.tokens_to_generate,
+            "min_length": cfg.inference.min_tokens_to_generate,
+        }
+        sampling_params: SamplingParam = {
+            "use_greedy": cfg.inference.greedy,
+            "temperature": cfg.inference.temperature,
+            "top_k": cfg.inference.top_k,
+            "top_p": cfg.inference.top_p,
+            "repetition_penalty": cfg.inference.repetition_penalty,
+            "add_BOS": cfg.inference.add_BOS,
+            "all_probs": cfg.inference.all_probs,
+            "compute_logprob": cfg.inference.compute_logprob,
+            "end_strings": cfg.inference.end_strings,
+        }
+        # Generate model responses
+        responses = model.generate(
+            input_prompts=[input_data],  # Adjust based on your model's requirements
+            length_params=length_params,  # Define these parameters as in your original code
+            sampling_params=sampling_params,  # Define these parameters as in your original code
+            inference_config=cfg,
+        )
+        return responses[0]["clean_response"]
+    iface = gr.Interface(
+        fn=predict,
+        inputs=[gr.Textbox(), gr.Textbox()],
+        outputs="text",
+        title="Multimodal Model Inference",
+        description="Enter a prompt and optionally upload an image for model inference.",
+    )
+    if __name__ == "__main__":
+        iface.launch(server_port=8890, share=False)

0 comments on commit `bb575b7`

Please sign in to comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit

There are no files selected for viewing

0 comments on commit `bb575b7`

Commit

There are no files selected for viewing

0 comments on commit bb575b7

0 comments on commit `bb575b7`