Changes to enable CUDA graph for LLM #8751

vasunvidia · 2024-03-26T20:57:59Z

ericharper · 2024-04-05T23:41:58Z

jenkins

timmoon10 · 2024-04-05T23:57:33Z

nemo/core/optim/distributed_adam.py

-        # Update transpose caches
-        params = set(self.parameter(fragment) for fragment in fragments)
-        for param in params:
-            if is_float8tensor(param):
-                param._reset_caches()
-                param.transpose(update_cache=True)
-                param._lazy_transpose_cache = True


This change is needed since NVIDIA/TransformerEngine#575 is changing te.Float8Tensor's API for transpose caching. See the discussion in NVIDIA/TransformerEngine#735 and NVIDIA/TransformerEngine#575.

This functionality may be restored in the future to enable a performance optimization with interleaved pipeline parallelism. In particular, the first microbatch is slower since it requires computing transposes, and the resulting load imbalances lead to GPU idling. Aligning the transposes across the pipeline parallel group helps ensure full utilization. See a similar optimization for distributed optimizer communication in NVIDIA/apex#1611 and NVIDIA/Megatron-LM@d2de701.
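To make the load-imbalance argument concrete, here is a minimal sketch in plain Python. It assumes a simplified forward-only pipeline, not the actual interleaved schedule, and all names and timings (`pipeline_makespan`, `step_time`, `transpose_time`) are hypothetical. It compares lazily computing transposes on each stage's first microbatch against computing them on all stages at the same time before the first microbatch.

```python
def pipeline_makespan(num_stages, num_microbatches, step_time, transpose_time, aligned):
    """Finish time of the last microbatch on the last stage (toy model).

    aligned=True:  every stage computes its transposes up front, in parallel.
    aligned=False: each stage computes them lazily during its first microbatch.
    """
    # stage_free[s] is the time at which stage s can start its next microbatch.
    stage_free = [transpose_time if aligned else 0] * num_stages
    for mb in range(num_microbatches):
        ready = 0  # time at which this microbatch's input reaches the next stage
        for s in range(num_stages):
            start = max(ready, stage_free[s])
            cost = step_time
            if not aligned and mb == 0:
                cost += transpose_time  # lazy transpose on the first microbatch
            stage_free[s] = start + cost
            ready = stage_free[s]
    return stage_free[-1]


lazy = pipeline_makespan(4, 8, step_time=2, transpose_time=3, aligned=False)
aligned = pipeline_makespan(4, 8, step_time=2, transpose_time=3, aligned=True)
print(lazy, aligned)
```

In this toy model the lazy variant pays the transpose cost once per stage along the critical path, while the aligned variant pays it only once overall, because downstream stages no longer sit idle waiting for upstream transposes.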

@vasunvidia observed that training did not converge when this logic was removed. Are you using the latest version of NVIDIA/TransformerEngine#575? The transpose logic is now handled internally in TE, so if we also compute transposes here, they will be computed twice.

I think we should figure out what's going on. If we wanted to update the transposes here, the logic would be:

        # Update transpose caches
        params = set(self.parameter(fragment) for fragment in fragments)
        for param in params:
            if is_float8tensor(param):
                param._reset_caches()
                param.transpose_2d(cache=True)
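The concern about computing transposes twice can be illustrated with a small, self-contained sketch. `ToyCachedMatrix` is a hypothetical stand-in written for this example and is not Transformer Engine's `Float8Tensor` API. It only shows the general pattern: a parameter update invalidates the cached transpose, a cached lookup recomputes it once, and an extra transpose outside the cache doubles the work.

```python
class ToyCachedMatrix:
    """Hypothetical matrix with a lazily cached transpose (not TE's API)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self._cache = None
        self.num_transposes = 0  # counts how often a transpose is really computed

    def _compute_transpose(self):
        self.num_transposes += 1
        return [list(col) for col in zip(*self.rows)]

    def set_values(self, rows):
        # Models an optimizer step: new values make the cached transpose stale.
        self.rows = [list(r) for r in rows]
        self._cache = None

    def transpose(self, use_cache=True):
        if not use_cache:
            return self._compute_transpose()
        if self._cache is None:
            self._cache = self._compute_transpose()
        return self._cache


w = ToyCachedMatrix([[1, 2, 3], [4, 5, 6]])

# Optimizer step, then two microbatches that both need the transpose.
w.set_values([[2, 3, 4], [5, 6, 7]])
first = w.transpose()
second = w.transpose()  # served from the cache, no recomputation
cached_count = w.num_transposes

# Optimizer step, then an extra transpose outside the cache (stand-in for
# optimizer-side logic) followed by the cached lookup in the forward pass.
w.set_values([[3, 4, 5], [6, 7, 8]])
w.transpose(use_cache=False)
w.transpose()
redundant_count = w.num_transposes - cached_count
print(cached_count, redundant_count)
```

If the library already refreshes its own cache after each update, any additional transpose issued by the optimizer is pure overhead, which is the situation the comment above warns about.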

ShriyaPalsamudram · 2024-04-12T14:30:35Z

jenkins

timmoon10

LGTM, although you should sign the commits to pass DCO.

See #8918 (comment).

The base branch was changed.

* Use next instead of get_batch
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* CUDA graph changes
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Change to enable CG with weight caching
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Revert "Use next instead of get_batch"
  This reverts commit 0021bb4.
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py
  Signed-off-by: Jan Baczek <[email protected]>
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"
  This reverts commit b4f736e.
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Remove skip_weight_update argument
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Bug fix + cleanup
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Cleanup
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Use new TE API for FP8 Param transpose
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Change config param cuda_graph to enable_cuda_graph
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Enable TE RNGStatesTracker through config
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Change te_rng_tracker to use_te_rng_tracker
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* FP8 weight transpose handled inside TE
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Cleanup
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Revert "Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py""
  This reverts commit e318624.
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Fix merge conflicts
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Fix merge conflicts
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Fix merge conflicts
  Signed-off-by: Vasudevan Rengasamy <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Try adding distributed format of checkpoint Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Allow dist checkpoint to be non-strict Signed-off-by: yaoyu-33 <[email protected]> * Fix Signed-off-by: yaoyu-33 <[email protected]> * Some fixes for PP + dist ckpt in Neva Signed-off-by: yaoyu-33 <[email protected]> * fix peft Signed-off-by: yaoyu-33 <[email protected]> * few fixes for lora Signed-off-by: yaoyu-33 <[email protected]> * checkpoint updates Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * bug fix Signed-off-by: yaoyu-33 <[email protected]> * Add HF siglip vision encoder Signed-off-by: HuiyingLi <[email protected]> * handle steerlm label in nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * Add neva dist checkpoint converter Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix CLEAN RESPONSE logic to not use last EOS Signed-off-by: HuiyingLi <[email protected]> * strip extra_id_1 from clean response Signed-off-by: HuiyingLi <[email protected]> * change inference time image processor Signed-off-by: HuiyingLi <[email protected]> * resolve comments Signed-off-by: yaoyu-33 <[email protected]> * remove open_clip vision encoder for siglip Signed-off-by: HuiyingLi <[email protected]> * update neva dist ckpt apis Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix return Signed-off-by: yaoyu-33 <[email protected]> * resolve CLEAN RESPONSE multiturn issue Signed-off-by: HuiyingLi <[email protected]> * code format Signed-off-by: HuiyingLi <[email protected]> * fixes for isort Signed-off-by: HuiyingLi <[email protected]> * 
refac image processor loading to util Signed-off-by: HuiyingLi <[email protected]> * black and isort Signed-off-by: HuiyingLi <[email protected]> * move crop size assertion Signed-off-by: HuiyingLi <[email protected]> * few neva fixes Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: HuiyingLi <[email protected]> --------- Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * [Nemo CICD] timeouts fix (#9407) * timeouts fix * timeouts fix Signed-off-by: Marc Romeyn <[email protected]> * Removing un-used ModelConfig class (#9389) Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Extend multimodal/speech_llm with lhotse, t5 and bestow supports (#9169) * Fixes * Docs fix * Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom) * Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support distributed_fused_adam Signed-off-by: zhehuaichen <[email protected]> * support distributed_fused_adam Signed-off-by: zhehuaichen <[email protected]> * Add support for sharded NeMo manifest files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support megatron_amp_O2 Signed-off-by: zhehuaichen <[email protected]> * Support heterogeneous sampling rates in non tarred NeMo manifests * migrate to PTL2.0 Signed-off-by: 
stevehuang52 <[email protected]> * clean up Signed-off-by: stevehuang52 <[email protected]> * update manifest util Signed-off-by: stevehuang52 <[email protected]> * Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * agg and normal tokenizers actually work * Support weights for NeMo tarred manifests * Temporarily hardcoded pnc stripping/lowercasing * fix * make pnc hack configurable from the config and disabled by default * fix the hack * migrate to ptl2.1 to support multiple dataloaders Signed-off-by: stevehuang52 <[email protected]> * support encoder overwrite Signed-off-by: zhehuaichen <[email protected]> * update misc Signed-off-by: stevehuang52 <[email protected]> * fix eval and clean up Signed-off-by: stevehuang52 <[email protected]> * support add_sep for perception model Signed-off-by: zhehuaichen <[email protected]> * fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803 Signed-off-by: zhehuaichen <[email protected]> * add_bos Signed-off-by: zhehuaichen <[email protected]> * Transformer decoder with conditioning for canary (#8091) * initial commit for multi-task conf-enc transf-dec for canary Signed-off-by: Krishna Puvvada <[email protected]> * removing decoder states caching during training Signed-off-by: Krishna Puvvada <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Krishna Puvvada <[email protected]> Co-authored-by: Krishna Puvvada <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Option to limit the number of open streams 
(#8095) * audio signal support in multi Signed-off-by: zhehuaichen <[email protected]> * update asr evaluator Signed-off-by: stevehuang52 <[email protected]> * fix from https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397 and https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa Signed-off-by: zhehuaichen <[email protected]> * transcribe fn for Canary models (#8110) * improve readability Signed-off-by: Krishna Puvvada <[email protected]> * adding context in transcribe function for ConfTransfModels Signed-off-by: Krishna Puvvada <[email protected]> * supporting relative paths in transcribe function for canary Signed-off-by: Krishna Puvvada <[email protected]> * removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference Signed-off-by: Krishna Puvvada <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Krishna Puvvada <[email protected]> Co-authored-by: Krishna Puvvada <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update for evaluation Signed-off-by: stevehuang52 <[email protected]> * update for eval Signed-off-by: stevehuang52 <[email protected]> * update for evaluation Signed-off-by: stevehuang52 <[email protected]> * fix bleu Signed-off-by: stevehuang52 <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * Add missing audio_filepath validation for Canary (#8119) * Add missing audio_filepath validation for Canary * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add default concat_sampling_probabilities Signed-off-by: zhehuaichen <[email protected]> * support lhotse dataset in speechllm Signed-off-by: zhehuaichen <[email protected]> * bypass 
get_iterator_k_split Signed-off-by: zhehuaichen <[email protected]> * tmp fix Signed-off-by: zhehuaichen <[email protected]> * try to use fixed batch with megatron Signed-off-by: zhehuaichen <[email protected]> * add batch logging Signed-off-by: zhehuaichen <[email protected]> * support unfrozen llm Signed-off-by: zhehuaichen <[email protected]> * Create README.md Signed-off-by: He Huang (Steve) <[email protected]> * Update README.md Signed-off-by: He Huang (Steve) <[email protected]> * Update README.md Signed-off-by: He Huang (Steve) <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * rename Signed-off-by: stevehuang52 <[email protected]> * add llama prompt template Signed-off-by: zhehuaichen <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * support sample alpha Signed-off-by: zhehuaichen <[email protected]> * support lhotse validation set and canary pretrained ckpt with pseudo label Signed-off-by: zhehuaichen <[email protected]> * make sure backward compatibility Signed-off-by: zhehuaichen <[email protected]> * remove pad Signed-off-by: zhehuaichen <[email protected]> * make sure asr_model is frozen Signed-off-by: zhehuaichen <[email protected]> * support greedy decoding Signed-off-by: zhehuaichen <[email protected]> * valid on lhotse Signed-off-by: zhehuaichen <[email protected]> * fix multi dataloader in val case for lhotse SALM; add default data names; keep asr model tokenizer by default to enable adding canary dataset Signed-off-by: zhehuaichen <[email protected]> * remove the bruteforce _keep_special_tokens implementation Signed-off-by: zhehuaichen <[email protected]> * decoding_ratio and convert_canary_prompt_to_text support Signed-off-by: zhehuaichen <[email protected]> * canary_tokens_augment_ratio Signed-off-by: zhehuaichen <[email protected]> * debug Signed-off-by: zhehuaichen <[email protected]> * bug fix Signed-off-by: zhehuaichen <[email protected]> * fix lhotse based eval of llama 
canary model Signed-off-by: zhehuaichen <[email protected]> * support some overwrite for eval Signed-off-by: zhehuaichen <[email protected]> * support zero shot prompt in training Signed-off-by: zhehuaichen <[email protected]> * support cross attention based SALM Signed-off-by: zhehuaichen <[email protected]> * support cross attention based SALM Signed-off-by: zhehuaichen <[email protected]> * fix for batch train/valid of cross Signed-off-by: zhehuaichen <[email protected]> * support learnable gate and plotting Signed-off-by: zhehuaichen <[email protected]> * support using pseudo label in prompt rather than cross att Signed-off-by: zhehuaichen <[email protected]> * bug fix for perception cfg and context tokens shift Signed-off-by: zhehuaichen <[email protected]> * DentityConnectorsAdd Signed-off-by: zhehuaichen <[email protected]> * fix ckpt saving Signed-off-by: zhehuaichen <[email protected]> * Support RnnGatedCrossAttention Signed-off-by: zhehuaichen <[email protected]> * add include_ffw and fix _optimizer_param_groups for all unfrozen run Signed-off-by: zhehuaichen <[email protected]> * support grad acc when using bucket Signed-off-by: zhehuaichen <[email protected]> * support TransformerCrossAttention Signed-off-by: zhehuaichen <[email protected]> * support ProjectTransformerCrossAttention Signed-off-by: zhehuaichen <[email protected]> * support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size Signed-off-by: zhehuaichen <[email protected]> * support question set on val without canary Signed-off-by: zhehuaichen <[email protected]> * support load_audio_encoder and wip in optim_param_groups Signed-off-by: zhehuaichen <[email protected]> * minor fix for audio pretrain model init Signed-off-by: zhehuaichen <[email protected]> * simplify canary_tokens_augment Signed-off-by: zhehuaichen <[email protected]> * use question in the manifest if it exists Signed-off-by: zhehuaichen <[email protected]> * support dataset weighting for non tar 
Signed-off-by: zhehuaichen <[email protected]> * Update SpeechLLM code (#8475) * add pleasefixme marker for potential failed nightly tests. (#7678) Signed-off-by: Xuesong Yang <[email protected]> * Add new text segmentation library for better TTS quality (#7645) * Add new text segmentation library for better TTS quality * Update zh_cn_pinyin.py added detailed instruction on how to install pkuseg. Signed-off-by: Xuesong Yang <[email protected]> * Update requirements_tts.txt remove pkuseg as the default dependency of NeMo TTS, and instead, direct users to manually install pkuseg if they really need. Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774) * Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer * Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add '32-true' for precision values --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix(clustering_diarizer.py): fix typo (#7772) Signed-off-by: Jean-Louis Queguiner <[email protected]> * fix(diarization-README): typo (#7771) Signed-off-by: Jean-Louis Queguiner <[email protected]> * Fix bug wrt change decoding strategy for bpe models (#7762) (#7764) * Fix bug wrt change decoding strategy for bpe models * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar 
<[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Remove incorrect extra argument for load_from_checkpoint_dir() (#7500) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add nemo to mcore GPT conversion script (#7730) * add conversion script Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove references to 'ckpt' Signed-off-by: Chen Cui <[email protected]> * add one more sanity check to make sure there is no unexpected keys in state dict Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make cpu loading work Signed-off-by: Chen Cui <[email protected]> * make script work for llama2 models Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address code check Signed-off-by: Chen Cui <[email protected]> * remove trainer precision (was for old sanity check) Signed-off-by: Chen Cui <[email protected]> * fix script for llama2 model Signed-off-by: Chen Cui <[email protected]> * remove commented code Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785) Signed-off-by: anferico <[email protected]> * Add some docs and update scripts for ASR (#7790) * Add some docs and update scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * set context for text memmap to fork (#7784) * set context for text memmap to fork Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add training with multiple audios Signed-off-by: stevehuang52 <[email protected]> * Support flash decoding (#7744) * Add flash-decoding Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761) * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747) * Change accelerator to auto Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in nlp_checkpoint_port.py Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in export.py Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Abhishree <[email protected]> * docs: fix typos (#7758) 
Signed-off-by: shuoer86 <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Abhishree <[email protected]> * Snake act (#7736) Signed-off-by: Abhishree <[email protected]> * Update gpt_dataset.py (#6963) Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Abhishree <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: shuoer86 <[email protected]> Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: shuoer86 <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788) * add selection criteria for reference audios Signed-off-by: anferico <[email protected]> * Update configuration files Signed-off-by: anferico <[email protected]> * add informative comment in config files Signed-off-by: anferico <[email protected]> * sample random index for reference audio selection Signed-off-by: anferico <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: anferico <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update text server to support compute logprobs (#7733) * update text server to support compute logprobs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo --------- Signed-off-by: Zhilin Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * add multi-layer feat extract and fix random question insertion Signed-off-by: stevehuang52 <[email protected]> * Configure MCore logger (#7781) Signed-off-by: Mikołaj Błaż <[email protected]> * Revert "PEFT eval fix (#7626) (#7638)" (#7693) This reverts commit f03dd660bd26d88fd569e76c6f74b83a7c203ff9. * remove TN from ctc_segm tut (#7807) Signed-off-by: Evelina <[email protected]> * [TTS] Support audio offsets in TTS data loaders (#7156) * [TTS] Support audio offsets in TTS data loaders Signed-off-by: Ryan <[email protected]> * [TTS] Change docstring mentions of .pt to .npy Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Update Apex install command in Dockerfile (#7794) (#7804) * move core install to /workspace (#7706) * update apex install in dockerfile * use fetch head --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: eharper <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Abhinav Khattar <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * Nemo to HF converter for LLaMA model (#7770) * Create config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Add files via upload Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * clean up trainer * remove dependency on yaml config. load config from nemo file instead. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable ckpt saving into other precision formats * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support 70b + cleanup qkv slice logic * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix bug * move hf model folder code from comment to function and add instruction to run * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Utkarsh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Save best NeMo model only when necessary (#7836) Signed-off-by: Ante Jukić <[email protected]> * add guard if its a distributed checkpoint (#7845) Signed-off-by: Gerald Shen <[email protected]> * Fix tn duplex (#7808) * fix duplex tn infer Signed-off-by: Evelina <[email protected]> * fix typo Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix TN docs Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update transformers cache on Jenkins (#7854) * update transformers cache Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * add cd Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> * Update README.rst for container update (#7844) Signed-off-by: fayejf <[email protected]> * Add support for finetuning with huggingface datasets (#7834) * add finetune with huggingface dataset Signed-off-by: 
stevehuang52 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update yaml Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * add extrac hf text and update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * move dataset dependency to common Signed-off-by: stevehuang52 <[email protected]> * add docstring Signed-off-by: stevehuang52 <[email protected]> * Add to Dics Signed-off-by: Nithin Rao Koluguri <nithinraok> * add ci test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add max steps in jenkins Signed-off-by: Nithin Rao Koluguri <nithinraok> * reduce max steps Signed-off-by: Nithin Rao Koluguri <nithinraok> * jenkins test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add bs=2 Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Multimodal merge (#7728) * ControlNet TRT export * Final MR before release * SD2 update * Fixed export issue * Fix for instruct p2p and reformat * Fix SD export issue * Add nemo clip export for DB * Fix ins pix2pix * fix sd2 config * [Mingyuan Ma] BF16 and SD conversion script * [Imagen] NHWC Feature * Fix .nemo loading issue for NeMo CLIP in SD * NeMo r1.20.0 Multimodal Merge * fix the inductor issue in inference * Fix inductor loading .nemo issue * Add Neva Model Support * Imagen Optimizations * Neva inference code * NeMo TOT 1.21 to Internal/main * Update neva_inference.yaml * REBASING for latest code changes * Update internal/main to main tot * Parallel DDIM implementation * 1. 
Fixing indentation bug. (#7352) Signed-off-by: Micha Livne <[email protected]> * NeMo MCore llama2 support + MCore PEFT adapters (#7299) * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove imports Signed-off-by: ericharper <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert 
Signed-off-by: ericharper <[email protected]> * mcore llama2 ckpt conversion & small fix Signed-off-by: jasonwan <[email protected]> * Add inference & sft config by Hongbin Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: jasonwan <[email protected]> * fix config Signed-off-by: jasonwan <[email protected]> * add inference param. update TP/PP script to support mcore gpt Signed-off-by: jasonwan <[email protected]> * p-tuning Signed-off-by: jasonwan <[email protected]> * modify ckpt conversion script (adding model cast) Signed-off-by: jasonwan <[email protected]> * ckpt conversion use relative path for config Signed-off-by: jasonwan <[email protected]> * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email 
protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * update module args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * remove optimizer_idx Signed-off-by: eharper <[email protected]> * prefetch num microbatches Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel 
config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * fix for p-tuning sequence parallel Signed-off-by: jasonwan <[email protected]> * support SFT/distOpt mcore (#7207) * add inference param. update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * change layer names for SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug in SFT Signed-off-by: Hongbin Liu <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is 
enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rollback model cast for p-tuning Signed-off-by: jasonwan <[email protected]> * update for dist adam Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use get_gpt_module_list Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update ckpt conversion script Signed-off-by: jasonwan <[email protected]> * ptl2.0 patch for llama config Signed-off-by: jasonwan <[email protected]> * add plugins to trainer in scripts Signed-off-by: jasonwan <[email protected]> * fix activation checkpointing mcore Signed-off-by: jasonwan <[email protected]> * fix variable names Signed-off-by: jasonwan <[email protected]> * overwrite normalization type for mcore/te Signed-off-by: jasonwan <[email protected]> * Update megatron_llama_sft.yaml Signed-off-by: Jason Wang <[email protected]> * add PEFT adapter support for mcore gpt path (#7276) * implementation for mcore adapter/mxins Signed-off-by: jasonwan <[email protected]> * small fix for lora and ptuning Signed-off-by: jasonwan <[email protected]> * support layerwise peft Signed-off-by: jasonwan <[email protected]> * support multiple target layers Signed-off-by: jasonwan <[email protected]> * support lora GQA Signed-off-by: jasonwan <[email protected]> * support amp O2 Signed-off-by: jasonwan <[email protected]> * revert & more O2 fix Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lora inject to attention 
Signed-off-by: jasonwan <[email protected]> * support …

* Changes to enable CUDA graph for LLM (#8751) * Use next instead of get_batch Signed-off-by: Vasudevan Rengasamy <[email protected]> * CUDA graph changes Signed-off-by: Vasudevan Rengasamy <[email protected]> * Change to enable CG with weight caching Signed-off-by: Vasudevan Rengasamy <[email protected]> * Revert "Use next instead of get_batch" This reverts commit 0021bb4. Signed-off-by: Vasudevan Rengasamy <[email protected]> * Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: Vasudevan Rengasamy <[email protected]> * Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py" This reverts commit b4f736e. Signed-off-by: Vasudevan Rengasamy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Vasudevan Rengasamy <[email protected]> * Remove skip_weight_update argument Signed-off-by: Vasudevan Rengasamy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Vasudevan Rengasamy <[email protected]> * Bug fix + cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Use new TE API for FP8 Param transpose Signed-off-by: Vasudevan Rengasamy <[email protected]> * Change config param cuda_graph to enable_cuda_graph Signed-off-by: Vasudevan Rengasamy <[email protected]> * Enable TE RNGStatesTracker through config Signed-off-by: Vasudevan Rengasamy <[email protected]> * Change te_rng_tracker to use_te_rng_tracker Signed-off-by: Vasudevan Rengasamy <[email protected]> * FP8 weight transpose handled inside TE Signed-off-by: Vasudevan Rengasamy <[email protected]> * Cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Revert "Revert "Copy 
jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"" This reverts commit e318624. Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> --------- Signed-off-by: Vasudevan Rengasamy <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: Vasudevan Rengasamy <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: vasunvidia <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Jan Lasek <[email protected]>
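
The commit list above renames two config parameters: `cuda_graph` becomes `enable_cuda_graph`, and `te_rng_tracker` becomes `use_te_rng_tracker`. A minimal sketch of how a config written against the earlier names could be carried over is shown below. The helper `migrate_config` and the plain-dict config are hypothetical and used only for illustration; they are not part of NeMo, and only the four key names come from the commit messages.

```python
# Hypothetical mapping from the pre-rename keys to the names used after this PR.
# Only the key names themselves are taken from the commit messages above.
RENAMED_KEYS = {
    "cuda_graph": "enable_cuda_graph",
    "te_rng_tracker": "use_te_rng_tracker",
}


def migrate_config(cfg: dict) -> dict:
    """Return a copy of cfg with legacy key names replaced by the new ones."""
    migrated = dict(cfg)  # shallow copy, so the caller's dict is left untouched
    for old, new in RENAMED_KEYS.items():
        if old in migrated:
            value = migrated.pop(old)
            # If the new-style key is already set explicitly, it wins.
            migrated.setdefault(new, value)
    return migrated


if __name__ == "__main__":
    legacy = {"cuda_graph": True, "te_rng_tracker": True, "hidden_size": 1024}
    print(migrate_config(legacy))
```

Letting an explicitly set new-style key take precedence is a design choice of this sketch, not something the PR specifies.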

…rategy (#9387) * Integrating mcore's DistributedDataParallel into MegatronStrategy Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Apply ddp-hooks from pytorch only when needed Signed-off-by: Marc Romeyn <[email protected]> * bugfix if using mcore distOpt with sft (#9356) * bugfix if using mcore distOpt Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * fix typo infer_seq_lenght -> infer_seq_length (#9370) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Marc Romeyn <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Rachitg/ag (#9083) * Rachitg/ag (#9081) * disable overlap for qkv Signed-off-by: Rachit Garg <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * bug fix * bugfix --------- Signed-off-by: Rachit Garg <[email protected]> Signed-off-by: Rachit Garg <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: michal2409 <[email protected]> --------- Signed-off-by: Rachit Garg <[email protected]> Signed-off-by: Rachit Garg <[email protected]> Signed-off-by: michal2409 <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: michal2409 <[email protected]> Signed-off-by: 
Marc Romeyn <[email protected]> * Adding the original change made for label_models (#9377) (#9378) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Dgalvez/fix greedy batch strategy name r2.0.0rc0 (#9243) (#9253) * Lazily warn about using greedy strategy instead of greedy_batch strategy. Previously, the warning would often run spuriously, since several existing code paths simply call "change_decoding_strategy()" after having first initialized a Module, rather than changing the config before initializing the Module. This can be confusing. The only problem I can see with this is that using logging inside a forward() method might interfere with some compiler toolkits like Torchscript or thunder.compile. Presumably it would be easy to add a conditional statement to avoid this statement in a compiler context if necessary. Signed-off-by: Daniel Galvez <[email protected]> Co-authored-by: Daniel Galvez <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Update README.rst (#9393) Revised content per https://gitlab-master.nvidia.com/nemo-framework-tme/documentation/-/issues/25. Also removed reference to NIMs in LLMs and MMs Deployment and Optimization. It should be NVIDIA NeMo Microservices and not NIM. Removed nemo:24.03.framework and nemo:24.01.speech in Docker Containers section and replaced with 24.05 . Please verify all changes. 
Signed-off-by: jgerh <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * a2a fix removed tp world size and group from init (#8944) (#8952) Signed-off-by: Anmol Gupta <[email protected]> Co-authored-by: anmolgupt <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Add config option for FP32 embedding grads (#8953) * Add config option for FP32 embedding grads (#8946) Signed-off-by: Tim Moon <[email protected]> * Apply isort and black reformatting Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: Tim Moon <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Changes to enable CUDA graph for LLM (#8955) * Changes to enable CUDA graph for LLM (#8751) * Use next instead of get_batch Signed-off-by: Vasudevan Rengasamy <[email protected]> * CUDA graph changes Signed-off-by: Vasudevan Rengasamy <[email protected]> * Change to enable CG with weight caching Signed-off-by: Vasudevan Rengasamy <[email protected]> * Revert "Use next instead of get_batch" This reverts commit 0021bb444cdd1b27674fc0cfea909c1a42475336. Signed-off-by: Vasudevan Rengasamy <[email protected]> * Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: Vasudevan Rengasamy <[email protected]> * Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py" This reverts commit b4f736ed2b39f6c48d2868ac3febb82c763ab3fb. 
Signed-off-by: Vasudevan Rengasamy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Vasudevan Rengasamy <[email protected]> * Remove skip_weight_update argument Signed-off-by: Vasudevan Rengasamy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Vasudevan Rengasamy <[email protected]> * Bug fix + cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Use new TE API for FP8 Param transpose Signed-off-by: Vasudevan Rengasamy <[email protected]> * Change config param cuda_graph to enable_cuda_graph Signed-off-by: Vasudevan Rengasamy <[email protected]> * Enable TE RNGStatesTracker through config Signed-off-by: Vasudevan Rengasamy <[email protected]> * Change te_rng_tracker to use_te_rng_tracker Signed-off-by: Vasudevan Rengasamy <[email protected]> * FP8 weight transpose handled inside TE Signed-off-by: Vasudevan Rengasamy <[email protected]> * Cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Revert "Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"" This reverts commit e31862481216f9adf7fa584a0c0262916c935639. 
Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> --------- Signed-off-by: Vasudevan Rengasamy <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: Vasudevan Rengasamy <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: vasunvidia <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Enhance Distributed Adam (#9051) * Enhance Distributed Adam (#9037) * Fix deprecated env. Signed-off-by: Wil Kong <[email protected]> * Use user desired value for distributed adam. Signed-off-by: Wil Kong <[email protected]> * Preserve memory format in parameter buffer of distributed adam. Signed-off-by: Wil Kong <[email protected]> * Fix the contiguous_param_buffer bug about bprop overlap and redundant copy after all-gather. Signed-off-by: Wil Kong <[email protected]> * Provide API to lock SHArP tree for distributed adam within nodes. 
Signed-off-by: Wil Kong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Wil Kong <[email protected]> --------- Signed-off-by: Wil Kong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: Wil Kong <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: Wil Kong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Force diarizer to use CUDA if cuda is available and if device=None. (#9380) (#9390) * Fixed clustering diarizer to load MSDD to GPU by default if cuda on * Fixed clustering diarizer to load MSDD to GPU by default if cuda on * Apply isort and black reformatting --------- Signed-off-by: Taejin Park <[email protected]> Signed-off-by: tango4j <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: tango4j <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * ci: Properly catch failed tests by introduction of workflow templates (#9324) * ci: Refactor tests into reusable template Signed-off-by: Oliver Koenig <[email protected]> * ci: Fix sending alerts on failure Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * disable slack Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix alerting Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * ci: Increase timeout for `L0_Unit_Tests_CPU` 
Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * increase timeout Signed-off-by: Oliver Koenig <[email protected]> * increase timeout for `Speech_Checkpoints_tests` Signed-off-by: Oliver Koenig <[email protected]> * improve readability Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * test Signed-off-by: Oliver Koenig <[email protected]> * test Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * finalize Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * add missing rm statement for `L2_PTQ_Llama2_Export_Only` Signed-off-by: Oliver Koenig <[email protected]> * all your comments are belong to us Signed-off-by: Oliver Koenig <[email protected]> * remove github output Signed-off-by: Oliver Koenig <[email protected]> * revive more comments Signed-off-by: Oliver Koenig <[email protected]> * add L2: ASR dev run - part two Signed-off-by: Oliver Koenig <[email protected]> --------- Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Pablo Garay <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Fix T5 G2P Input and Output Types (#9224) (#9269) * fix t5 g2p model * Apply isort and black reformatting --------- Signed-off-by: Jason <[email protected]> Signed-off-by: blisc <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: blisc <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Use model-cast-to-bfloat16 rather than AMP-to-bfloat16 for inference. (#9198) * Fix the "cast ping pong" problem when we run AMP inference. 
This has been tested only for Parakeet-CTC-1.1B right now. This problem certainly exists elsewhere. Automatic mixed precision and inference do not play well together. First, automatic mixed precision was created back when neural networks were much simpler. In particular, they did not have softmax and layer norm as frequent operations. In the era of transformers, softmax and layer norm are very common. AMP will unconditionally output fp32 outputs from these operations, even if their inputs are fp16. See here: https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float32 This is no longer necessary, now that layer norm does accumulation in fp32 in pytorch, even if the input is fp16: https://github.com/pytorch/pytorch/issues/66707 Do inference by casting model to bfloat16, not by using AMP. Do feature preprocessing in float32 for accuracy. Warn if someone tries to input a non-float32 tensor. Always create the output in the type the rest of the model expects. Sort manifests by duration. Signed-off-by: Daniel Galvez <[email protected]> * Always cast softmax inputs to float32 when in training mode. While we don't need this for accurate results in b/float16, this is a safety precaution to make sure that training accuracy does not regress. 
Signed-off-by: Daniel Galvez <[email protected]> --------- Signed-off-by: Daniel Galvez <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Huvu/rag pipeline citest (#9384) * huvu/NeMo_rag_citest first commit * adding llama-index to dependency * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adjusting data/models path in ci-test to dependency * putting llama-index to optional * update cicd-main.yml --------- Co-authored-by: Huy Vu2 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Marc Romeyn <[email protected]> * Re-org export code (#9353) * reorg the export code Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * replaced log with raise Signed-off-by: Onur Yilmaz <[email protected]> * add converter and loader folders Signed-off-by: Onur Yilmaz <[email protected]> * move nemo_ckpt_convert into the converter folder Signed-off-by: Onur Yilmaz <[email protected]> * move nemo_file into loader folder Signed-off-by: Onur Yilmaz <[email protected]> * reorg converter Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * continue to reorg converter Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * continue to reorg Signed-off-by: Onur Yilmaz <[email protected]> * move nemo file back into nemo folder Signed-off-by: Onur Yilmaz <[email protected]> * renamed nemo folder to nemo_ckpt_loader Signed-off-by: Onur Yilmaz <[email protected]> * remove unused function Signed-off-by: Onur Yilmaz <[email protected]> * removed nemo file Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * moved a function to 
tensorrt_llm_run file Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * Remove unused imports Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * import csv added Signed-off-by: Onur Yilmaz <[email protected]> --------- Signed-off-by: Onur Yilmaz <[email protected]> Signed-off-by: oyilmaz-nvidia <[email protected]> Co-authored-by: oyilmaz-nvidia <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * ci: Fix `L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Eng_CitriNet_with_wav` (#9399) Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * disable overlap for qkv (#9079) * disable overlap for qkv (#9072) * disable overlap for qkv Signed-off-by: Rachit Garg <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Rachit Garg <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: michal2409 <[email protected]> --------- Signed-off-by: Rachit Garg <[email protected]> Signed-off-by: michal2409 <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: michal2409 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Fix circular import for MM dataprep notebook (#9287) (#9292) * update launcher name and fix mm circular import * Apply isort and black reformatting --------- 
Signed-off-by: Chen Cui <[email protected]> Signed-off-by: cuichenx <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: cuichenx <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * add check if num layers is divisible by pp size (#9208) (#9298) * add check if num_layers % pp == 0 * Apply isort and black reformatting * move num_layers / pp check to build_transformer_config --------- Signed-off-by: dimapihtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Add HF siglip vision encoder (#9185) * temp save Signed-off-by: yaoyu-33 <[email protected]> * temp save 2 Signed-off-by: yaoyu-33 <[email protected]> * update code Signed-off-by: yaoyu-33 <[email protected]> * enable seq packing Signed-off-by: yaoyu-33 <[email protected]> * fix neva and clip Signed-off-by: yaoyu-33 <[email protected]> * Enable parallel seq packing algo and few other fixes Signed-off-by: yaoyu-33 <[email protected]> * Pipeline parallel support Signed-off-by: yaoyu-33 <[email protected]> * Update data preprocess Signed-off-by: yaoyu-33 <[email protected]> * fix few pp issues Signed-off-by: yaoyu-33 <[email protected]> * enable sequence packing w/ PP Signed-off-by: yaoyu-33 <[email protected]> * Fix cu_seqlens in inputs Signed-off-by: yaoyu-33 <[email protected]> * add assert Signed-off-by: yaoyu-33 <[email protected]> * Depend on PP to decide whether do padding Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add docstring Signed-off-by: yaoyu-33 <[email protected]> * Fix few evaluation issues Signed-off-by: yaoyu-33 <[email protected]> * Fix few PP evaluation issues Signed-off-by: yaoyu-33 <[email 
protected]> * Address comments Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add llama3 template Signed-off-by: yaoyu-33 <[email protected]> * address comments Signed-off-by: yaoyu-33 <[email protected]> * Fix license Signed-off-by: yaoyu-33 <[email protected]> * Fix llama3 Signed-off-by: yaoyu-33 <[email protected]> * Few fixes Signed-off-by: yaoyu-33 <[email protected]> * Few neva bugs Signed-off-by: yaoyu-33 <[email protected]> * Few neva bugs Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Few neva bugs Signed-off-by: yaoyu-33 <[email protected]> * llama3 inference fix Signed-off-by: yaoyu-33 <[email protected]> * Force vision encoder to run in fp32 Signed-off-by: yaoyu-33 <[email protected]> * Revert "Force vision encoder to run in fp32" This reverts commit 9d2160d96cb3e2a27a18538950ef43b4482c04da. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Try adding distributed format of checkpoint Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Allow dist checkpoint to be non-strict Signed-off-by: yaoyu-33 <[email protected]> * Fix Signed-off-by: yaoyu-33 <[email protected]> * Some fixes for PP + dist ckpt in Neva Signed-off-by: yaoyu-33 <[email protected]> * fix peft Signed-off-by: yaoyu-33 <[email protected]> * few fixes for lora Signed-off-by: yaoyu-33 <[email protected]> * checkpoint updates Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * bug fix Signed-off-by: yaoyu-33 <[email protected]> * Add HF siglip vision encoder Signed-off-by: HuiyingLi <[email protected]> * handle steerlm label in nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * Add neva dist checkpoint converter Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix CLEAN RESPONSE logic to not use last EOS Signed-off-by: HuiyingLi <[email protected]> * strip extra_id_1 from clean response Signed-off-by: HuiyingLi <[email protected]> * change inference time image processor Signed-off-by: HuiyingLi <[email protected]> * resolve comments Signed-off-by: yaoyu-33 <[email protected]> * remove open_clip vision encoder for siglip Signed-off-by: HuiyingLi <[email protected]> * update neva dist ckpt apis Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix return Signed-off-by: yaoyu-33 <[email protected]> * resolve CLEAN RESPONSE multiturn issue Signed-off-by: HuiyingLi <[email protected]> * code format Signed-off-by: HuiyingLi <[email protected]> * fixes for isort Signed-off-by: HuiyingLi <[email protected]> * 
refac image processor loading to util Signed-off-by: HuiyingLi <[email protected]> * black and isort Signed-off-by: HuiyingLi <[email protected]> * move crop size assertion Signed-off-by: HuiyingLi <[email protected]> * few neva fixes Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: HuiyingLi <[email protected]> --------- Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * [Nemo CICD] timeouts fix (#9407) * timeouts fix * timeouts fix Signed-off-by: Marc Romeyn <[email protected]> * Removing un-used ModelConfig class (#9389) Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Extend multimodal/speech_llm with lhotse, t5 and bestow supports (#9169) * Fixes * Docs fix * Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom) * Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support distributed_fused_adam Signed-off-by: zhehuaichen <[email protected]> * support distributed_fused_adam Signed-off-by: zhehuaichen <[email protected]> * Add support for sharded NeMo manifest files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support megatron_amp_O2 Signed-off-by: zhehuaichen <[email protected]> * Support heterogeneous sampling rates in non tarred NeMo manifests * migrate to PTL2.0 Signed-off-by: 
stevehuang52 <[email protected]> * clean up Signed-off-by: stevehuang52 <[email protected]> * update manifest util Signed-off-by: stevehuang52 <[email protected]> * Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * agg and normal tokenizers actually work * Support weights for NeMo tarred manifests * Temporarily hardcoded pnc stripping/lowercasing * fix * make pnc hack configurable from the config and disabled by default * fix the hack * migrate to ptl2.1 to support multiple dataloaders Signed-off-by: stevehuang52 <[email protected]> * support encoder overwrite Signed-off-by: zhehuaichen <[email protected]> * update misc Signed-off-by: stevehuang52 <[email protected]> * fix eval and clean up Signed-off-by: stevehuang52 <[email protected]> * support add_sep for perception model Signed-off-by: zhehuaichen <[email protected]> * fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803 Signed-off-by: zhehuaichen <[email protected]> * add_bos Signed-off-by: zhehuaichen <[email protected]> * Transformer decoder with conditioning for canary (#8091) * initial commit for multi-task conf-enc transf-dec for canary Signed-off-by: Krishna Puvvada <[email protected]> * removing decoder states caching during training Signed-off-by: Krishna Puvvada <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Krishna Puvvada <[email protected]> Co-authored-by: Krishna Puvvada <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Option to limit the number of open streams 
(#8095) * audio signal support in multi Signed-off-by: zhehuaichen <[email protected]> * update asr evaluator Signed-off-by: stevehuang52 <[email protected]> * fix from https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397 and https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa Signed-off-by: zhehuaichen <[email protected]> * transcribe fn for Canary models (#8110) * improve readability Signed-off-by: Krishna Puvvada <[email protected]> * adding context in transcribe function for ConfTransfModels Signed-off-by: Krishna Puvvada <[email protected]> * supporting relative paths in transcribe function for canary Signed-off-by: Krishna Puvvada <[email protected]> * removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference Signed-off-by: Krishna Puvvada <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Krishna Puvvada <[email protected]> Co-authored-by: Krishna Puvvada <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update for evaluation Signed-off-by: stevehuang52 <[email protected]> * update for eval Signed-off-by: stevehuang52 <[email protected]> * update for evaluation Signed-off-by: stevehuang52 <[email protected]> * fix bleu Signed-off-by: stevehuang52 <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * Add missing audio_filepath validation for Canary (#8119) * Add missing audio_filepath validation for Canary * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add default concat_sampling_probabilities Signed-off-by: zhehuaichen <[email protected]> * support lhotse dataset in speechllm Signed-off-by: zhehuaichen <[email protected]> * bypass 
get_iterator_k_split Signed-off-by: zhehuaichen <[email protected]> * tmp fix Signed-off-by: zhehuaichen <[email protected]> * try to use fixed batch with megatron Signed-off-by: zhehuaichen <[email protected]> * add batch logging Signed-off-by: zhehuaichen <[email protected]> * support unfrozen llm Signed-off-by: zhehuaichen <[email protected]> * Create README.md Signed-off-by: He Huang (Steve) <[email protected]> * Update README.md Signed-off-by: He Huang (Steve) <[email protected]> * Update README.md Signed-off-by: He Huang (Steve) <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * rename Signed-off-by: stevehuang52 <[email protected]> * add llama prompt template Signed-off-by: zhehuaichen <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * support sample alpha Signed-off-by: zhehuaichen <[email protected]> * support lhotse validation set and canary pretrained ckpt with pseudo label Signed-off-by: zhehuaichen <[email protected]> * make sure backward compatibility Signed-off-by: zhehuaichen <[email protected]> * remove pad Signed-off-by: zhehuaichen <[email protected]> * make sure asr_model is frozen Signed-off-by: zhehuaichen <[email protected]> * support greedy decoding Signed-off-by: zhehuaichen <[email protected]> * valid on lhotse Signed-off-by: zhehuaichen <[email protected]> * fix multi dataloader in val case for lhotse SALM; add default data names; keep asr model tokenizer by default to enable adding canary dataset Signed-off-by: zhehuaichen <[email protected]> * remove the bruteforce _keep_special_tokens implementation Signed-off-by: zhehuaichen <[email protected]> * decoding_ratio and convert_canary_prompt_to_text support Signed-off-by: zhehuaichen <[email protected]> * canary_tokens_augment_ratio Signed-off-by: zhehuaichen <[email protected]> * debug Signed-off-by: zhehuaichen <[email protected]> * bug fix Signed-off-by: zhehuaichen <[email protected]> * fix lhotse based eval of llama 
canary model Signed-off-by: zhehuaichen <[email protected]> * support some overwrite for eval Signed-off-by: zhehuaichen <[email protected]> * support zero shot prompt in training Signed-off-by: zhehuaichen <[email protected]> * support cross attention based SALM Signed-off-by: zhehuaichen <[email protected]> * support cross attention based SALM Signed-off-by: zhehuaichen <[email protected]> * fix for batch train/valid of cross Signed-off-by: zhehuaichen <[email protected]> * support learnable gate and plotting Signed-off-by: zhehuaichen <[email protected]> * support using pseudo label in prompt rather than cross att Signed-off-by: zhehuaichen <[email protected]> * bug fix for perception cfg and context tokens shift Signed-off-by: zhehuaichen <[email protected]> * DentityConnectorsAdd Signed-off-by: zhehuaichen <[email protected]> * fix ckpt saving Signed-off-by: zhehuaichen <[email protected]> * Support RnnGatedCrossAttention Signed-off-by: zhehuaichen <[email protected]> * add include_ffw and fix _optimizer_param_groups for all unfrozen run Signed-off-by: zhehuaichen <[email protected]> * support grad acc when using bucket Signed-off-by: zhehuaichen <[email protected]> * support TransformerCrossAttention Signed-off-by: zhehuaichen <[email protected]> * support ProjectTransformerCrossAttention Signed-off-by: zhehuaichen <[email protected]> * support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size Signed-off-by: zhehuaichen <[email protected]> * support question set on val without canary Signed-off-by: zhehuaichen <[email protected]> * support load_audio_encoder and wip in optim_param_groups Signed-off-by: zhehuaichen <[email protected]> * minor fix for audio pretrain model init Signed-off-by: zhehuaichen <[email protected]> * simplify canary_tokens_augment Signed-off-by: zhehuaichen <[email protected]> * use question in the manifest if it exists Signed-off-by: zhehuaichen <[email protected]> * support dataset weighting for non tar 
Signed-off-by: zhehuaichen <[email protected]> * Update SpeechLLM code (#8475) * add pleasefixme marker for potential failed nightly tests. (#7678) Signed-off-by: Xuesong Yang <[email protected]> * Add new text segmentation library for better TTS quality (#7645) * Add new text segmentation library for better TTS quality * Update zh_cn_pinyin.py added detailed instruction on how to install pkuseg. Signed-off-by: Xuesong Yang <[email protected]> * Update requirements_tts.txt remove pkuseg as the default dependency of NeMo TTS, and instead, direct users to manually install pkuseg if they really need. Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774) * Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer * Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add '32-true' for precision values --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix(clustering_diarizer.py): fix typo (#7772) Signed-off-by: Jean-Louis Queguiner <[email protected]> * fix(diarization-README): typo (#7771) Signed-off-by: Jean-Louis Queguiner <[email protected]> * Fix bug wrt change decoding strategy for bpe models (#7762) (#7764) * Fix bug wrt change decoding strategy for bpe models * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar 
<[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Remove incorrect extra argument for load_from_checkpoint_dir() (#7500) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add nemo to mcore GPT conversion script (#7730) * add conversion script Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove references to 'ckpt' Signed-off-by: Chen Cui <[email protected]> * add one more sanity check to make sure there is no unexpected keys in state dict Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make cpu loading work Signed-off-by: Chen Cui <[email protected]> * make script work for llama2 models Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address code check Signed-off-by: Chen Cui <[email protected]> * remove trainer precision (was for old sanity check) Signed-off-by: Chen Cui <[email protected]> * fix script for llama2 model Signed-off-by: Chen Cui <[email protected]> * remove commented code Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785) Signed-off-by: anferico <[email protected]> * Add some docs and update scripts for ASR (#7790) * Add some docs and update scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * set context for text memmap to fork (#7784) * set context for text memmap to fork Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add training with multiple audios Signed-off-by: stevehuang52 <[email protected]> * Support flash decoding (#7744) * Add flash-decoding Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761) * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747) * Change accelerator to auto Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in nlp_checkpoint_port.py Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in export.py Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Abhishree <[email protected]> * docs: fix typos (#7758) 
Signed-off-by: shuoer86 <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Abhishree <[email protected]> * Snake act (#7736) Signed-off-by: Abhishree <[email protected]> * Update gpt_dataset.py (#6963) Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Abhishree <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: shuoer86 <[email protected]> Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: shuoer86 <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788) * add selection criteria for reference audios Signed-off-by: anferico <[email protected]> * Update configuration files Signed-off-by: anferico <[email protected]> * add informative comment in config files Signed-off-by: anferico <[email protected]> * sample random index for reference audio selection Signed-off-by: anferico <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: anferico <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update text server to support compute logprobs (#7733) * update text server to support compute logprobs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo --------- Signed-off-by: Zhilin Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * add multi-layer feat extract and fix random question insertion Signed-off-by: stevehuang52 <[email protected]> * Configure MCore logger (#7781) Signed-off-by: Mikołaj Błaż <[email protected]> * Revert "PEFT eval fix (#7626) (#7638)" (#7693) This reverts commit f03dd660bd26d88fd569e76c6f74b83a7c203ff9. * remove TN from ctc_segm tut (#7807) Signed-off-by: Evelina <[email protected]> * [TTS] Support audio offsets in TTS data loaders (#7156) * [TTS] Support audio offsets in TTS data loaders Signed-off-by: Ryan <[email protected]> * [TTS] Change docstring mentions of .pt to .npy Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Update Apex install command in Dockerfile (#7794) (#7804) * move core install to /workspace (#7706) * update apex install in dockerfile * use fetch head --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: eharper <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Abhinav Khattar <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * Nemo to HF converter for LLaMA model (#7770) * Create config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Add files via upload Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * clean up trainer * remove dependency on yaml config. load config from nemo file instead. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable ckpt saving into other precision formats * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support 70b + cleanup qkv slice logic * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix bug * move hf model folder code from comment to function and add instruction to run * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Utkarsh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Save best NeMo model only when necessary (#7836) Signed-off-by: Ante Jukić <[email protected]> * add guard if its a distributed checkpoint (#7845) Signed-off-by: Gerald Shen <[email protected]> * Fix tn duplex (#7808) * fix duplex tn infer Signed-off-by: Evelina <[email protected]> * fix typo Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix TN docs Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update transformers cache on Jenkins (#7854) * update transformers cache Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * add cd Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> * Update README.rst for container update (#7844) Signed-off-by: fayejf <[email protected]> * Add support for finetuning with huggingface datasets (#7834) * add finetune with huggingface dataset Signed-off-by: 
stevehuang52 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update yaml Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * add extrac hf text and update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * move dataset dependency to common Signed-off-by: stevehuang52 <[email protected]> * add docstring Signed-off-by: stevehuang52 <[email protected]> * Add to Dics Signed-off-by: Nithin Rao Koluguri <nithinraok> * add ci test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add max steps in jenkins Signed-off-by: Nithin Rao Koluguri <nithinraok> * reduce max steps Signed-off-by: Nithin Rao Koluguri <nithinraok> * jenkins test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add bs=2 Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Multimodal merge (#7728) * ControlNet TRT export * Final MR before release * SD2 update * Fixed export issue * Fix for instruct p2p and reformat * Fix SD export issue * Add nemo clip export for DB * Fix ins pix2pix * fix sd2 config * [Mingyuan Ma] BF16 and SD conversion script * [Imagen] NHWC Feature * Fix .nemo loading issue for NeMo CLIP in SD * NeMo r1.20.0 Multimodal Merge * fix the inductor issue in inference * Fix inductor loading .nemo issue * Add Neva Model Support * Imagen Optimizations * Neva inference code * NeMo TOT 1.21 to Internal/main * Update neva_inference.yaml * REBASING for latest code changes * Update internal/main to main tot * Parallel DDIM implementation * 1. 
Fixing indentation bug. (#7352) Signed-off-by: Micha Livne <[email protected]> * NeMo MCore llama2 support + MCore PEFT adapters (#7299) * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove imports Signed-off-by: ericharper <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert 
Signed-off-by: ericharper <[email protected]> * mcore llama2 ckpt conversion & small fix Signed-off-by: jasonwan <[email protected]> * Add inference & sft config by Hongbin Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: jasonwan <[email protected]> * fix config Signed-off-by: jasonwan <[email protected]> * add inference param. update TP/PP script to support mcore gpt Signed-off-by: jasonwan <[email protected]> * p-tuning Signed-off-by: jasonwan <[email protected]> * modify ckpt conversion script (adding model cast) Signed-off-by: jasonwan <[email protected]> * ckpt conversion use relative path for config Signed-off-by: jasonwan <[email protected]> * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email 
protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * update module args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * remove optimizer_idx Signed-off-by: eharper <[email protected]> * prefetch num microbatches Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel 
config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * fix for p-tuning sequence parallel Signed-off-by: jasonwan <[email protected]> * support SFT/distOpt mcore (#7207) * add inference param. update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * change layer names for SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug in SFT Signed-off-by: Hongbin Liu <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is 
enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rollback model cast for p-tuning Signed-off-by: jasonwan <[email protected]> * update for dist adam Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use get_gpt_module_list Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update ckpt conversion script Signed-off-by: jasonwan <[email protected]> * ptl2.0 patch for llama config Signed-off-by: jasonwan <[email protected]> * add plugins to trainer in scripts Signed-off-by: jasonwan <[email protected]> * fix activation checkpointing mcore Signed-off-by: jasonwan <[email protected]> * fix variable names Signed-off-by: jasonwan <[email protected]> * overwrite normalization type for mcore/te Signed-off-by: jasonwan <[email protected]> * Update megatron_llama_sft.yaml Signed-off-by: Jason Wang <[email protected]> * add PEFT adapter support for mcore gpt path (#7276) * implementation for mcore adapter/mxins Signed-off-by: jasonwan <[email protected]> * small fix for lora and ptuning Signed-off-by: jasonwan <[email protected]> * support layerwise peft Signed-off-by: jasonwan <[email protected]> * support multiple target layers Signed-off-by: jasonwan <[email protected]> * support lora GQA Signed-off-by: jasonwan <[email protected]> * support amp O2 Signed-off-by: jasonwan <[email protected]> * revert & more O2 fix Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lora inject to attention 
Signed-off-by: jasonwan <[email protected]> * support …

…rategy (NVIDIA#9387) * Integrating mcore's DistributedDataParallel into MegatronStrategy Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Apply ddp-hooks from pytorch only when needed Signed-off-by: Marc Romeyn <[email protected]> * bugfix if using mcore distOpt with sft (#9356) * bugfix if using mcore distOpt Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * fix typo infer_seq_lenght -> infer_seq_length (#9370) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Marc Romeyn <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Rachitg/ag (#9083) * Rachitg/ag (#9081) * disable overlap for qkv Signed-off-by: Rachit Garg <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * bug fix * bugfix --------- Signed-off-by: Rachit Garg <[email protected]> Signed-off-by: Rachit Garg <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: michal2409 <[email protected]> --------- Signed-off-by: Rachit Garg <[email protected]> Signed-off-by: Rachit Garg <[email protected]> Signed-off-by: michal2409 <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: michal2409 <[email protected]> 
Signed-off-by: Marc Romeyn <[email protected]> * Adding the original change made for label_models (#9377) (#9378) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Dgalvez/fix greedy batch strategy name r2.0.0rc0 (#9243) (#9253) * Lazily warn about using greedy strategy instead of greedy_batch strategy. Previously, the warning would often run spuriously, since several existing code paths simply call "change_decoding_strategy()" after having first initialized a Module, rather than changing the config before initializing the Module. This can be confusing. The only problem I can see with this is that using logging inside a forward() method might interfere with some compiler toolkits like Torchscript or thunder.compile. Presumably it would be easy to add a conditional statement to avoid this statement in a compiler context if necessary. Signed-off-by: Daniel Galvez <[email protected]> Co-authored-by: Daniel Galvez <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Update README.rst (#9393) Revised content per https://gitlab-master.nvidia.com/nemo-framework-tme/documentation/-/issues/25. Also removed reference to NIMs in LLMs and MMs Deployment and Optimization. It should be NVIDIA NeMo Microservices and not NIM. Removed nemo:24.03.framework and nemo:24.01.speech in Docker Containers section and replaced with 24.05 . Please verify all changes. 
Signed-off-by: jgerh <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * a2a fix removed tp world size and group from init (#8944) (#8952) Signed-off-by: Anmol Gupta <[email protected]> Co-authored-by: anmolgupt <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Add config option for FP32 embedding grads (#8953) * Add config option for FP32 embedding grads (#8946) Signed-off-by: Tim Moon <[email protected]> * Apply isort and black reformatting Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: Tim Moon <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Changes to enable CUDA graph for LLM (#8955) * Changes to enable CUDA graph for LLM (#8751) * Use next instead of get_batch Signed-off-by: Vasudevan Rengasamy <[email protected]> * CUDA graph changes Signed-off-by: Vasudevan Rengasamy <[email protected]> * Change to enable CG with weight caching Signed-off-by: Vasudevan Rengasamy <[email protected]> * Revert "Use next instead of get_batch" This reverts commit 0021bb444cdd1b27674fc0cfea909c1a42475336. Signed-off-by: Vasudevan Rengasamy <[email protected]> * Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: Vasudevan Rengasamy <[email protected]> * Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py" This reverts commit b4f736ed2b39f6c48d2868ac3febb82c763ab3fb. 
Signed-off-by: Vasudevan Rengasamy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Vasudevan Rengasamy <[email protected]> * Remove skip_weight_update argument Signed-off-by: Vasudevan Rengasamy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Vasudevan Rengasamy <[email protected]> * Bug fix + cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Use new TE API for FP8 Param transpose Signed-off-by: Vasudevan Rengasamy <[email protected]> * Change config param cuda_graph to enable_cuda_graph Signed-off-by: Vasudevan Rengasamy <[email protected]> * Enable TE RNGStatesTracker through config Signed-off-by: Vasudevan Rengasamy <[email protected]> * Change te_rng_tracker to use_te_rng_tracker Signed-off-by: Vasudevan Rengasamy <[email protected]> * FP8 weight transpose handled inside TE Signed-off-by: Vasudevan Rengasamy <[email protected]> * Cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Revert "Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"" This reverts commit e31862481216f9adf7fa584a0c0262916c935639. 
Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> --------- Signed-off-by: Vasudevan Rengasamy <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: Vasudevan Rengasamy <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: vasunvidia <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Enhance Distributed Adam (#9051) * Enhance Distributed Adam (#9037) * Fix deprecated env. Signed-off-by: Wil Kong <[email protected]> * Use user desired value for distributed adam. Signed-off-by: Wil Kong <[email protected]> * Preserve memory format in parameter buffer of distributed adam. Signed-off-by: Wil Kong <[email protected]> * Fix the contiguous_param_buffer bug about bprop overlap and redundant copy after all-gather. Signed-off-by: Wil Kong <[email protected]> * Provide API to lock SHArP tree for distributed adam within nodes. 
Signed-off-by: Wil Kong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Wil Kong <[email protected]> --------- Signed-off-by: Wil Kong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: Wil Kong <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: Wil Kong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Force diarizer to use CUDA if cuda is available and if device=None. (#9380) (#9390) * Fixed clustering diarizer to load MSDD to GPU by default if cuda on * Fixed clustering diarizer to load MSDD to GPU by default if cuda on * Apply isort and black reformatting --------- Signed-off-by: Taejin Park <[email protected]> Signed-off-by: tango4j <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: tango4j <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * ci: Properly catch failed tests by introduction of workflow templates (#9324) * ci: Refactor tests into reusable template Signed-off-by: Oliver Koenig <[email protected]> * ci: Fix sending alerts on failure Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * disable slack Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix alerting Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * ci: Increase timeout for `L0_Unit_Tests_CPU` 
Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * increase timeout Signed-off-by: Oliver Koenig <[email protected]> * increase timeout for `Speech_Checkpoints_tests` Signed-off-by: Oliver Koenig <[email protected]> * improve readability Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * test Signed-off-by: Oliver Koenig <[email protected]> * test Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * finalize Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * add missing rm statement for `L2_PTQ_Llama2_Export_Only` Signed-off-by: Oliver Koenig <[email protected]> * all your comments are belong to us Signed-off-by: Oliver Koenig <[email protected]> * remove github output Signed-off-by: Oliver Koenig <[email protected]> * revive more comments Signed-off-by: Oliver Koenig <[email protected]> * add L2: ASR dev run - part two Signed-off-by: Oliver Koenig <[email protected]> --------- Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Pablo Garay <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Fix T5 G2P Input and Output Types (#9224) (#9269) * fix t5 g2p model * Apply isort and black reformatting --------- Signed-off-by: Jason <[email protected]> Signed-off-by: blisc <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: blisc <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Use model-cast-to-bfloat16 rather than AMP-to-bfloat16 for inference. (#9198) * Fix the "cast ping pong" problem when we run AMP inference. 
This has been tested only for Parakeet-CTC-1.1B right now. This problem certainly exists elsewhere. Automatic mixed precision and inference do not play well together. First, automatic mixed precision was created back when neural networks were much simpler. In particular, they did not have softmax and layer norm as frequent operations. In the era of transformers, softmax and layer norm are very common. AMP will unconditionally produce fp32 outputs from these operations, even if their inputs are fp16. See here: https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float32 This is no longer necessary, now that layer norm does accumulation in fp32 in PyTorch, even if the input is fp16: https://github.com/pytorch/pytorch/issues/66707 Do inference by casting the model to bfloat16, not by using AMP. Do feature preprocessing in float32 for accuracy. Warn if someone tries to input a non-float32 tensor. Always create the output in the type the rest of the model expects. Sort manifests by duration. Signed-off-by: Daniel Galvez <[email protected]> * Always cast softmax inputs to float32 when in training mode. While we don't need this for accurate results in b/float16, this is a safety precaution to make sure that training accuracy does not regress. 
Signed-off-by: Daniel Galvez <[email protected]> --------- Signed-off-by: Daniel Galvez <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Huvu/rag pipeline citest (#9384) * huvu/NeMo_rag_citest first commit * adding llama-index to dependency * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adjusting data/models path in ci-test to dependency * putting llama-index to optional * update cicd-main.yml --------- Co-authored-by: Huy Vu2 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Marc Romeyn <[email protected]> * Re-org export code (#9353) * reorg the export code Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * replaced log with raise Signed-off-by: Onur Yilmaz <[email protected]> * add converter and loader folders Signed-off-by: Onur Yilmaz <[email protected]> * move nemo_ckpt_convert into the converter folder Signed-off-by: Onur Yilmaz <[email protected]> * move nemo_file into loader folder Signed-off-by: Onur Yilmaz <[email protected]> * reorg converter Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * continue to reorg converter Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * continue to reorg Signed-off-by: Onur Yilmaz <[email protected]> * move nemo file back into nemo folder Signed-off-by: Onur Yilmaz <[email protected]> * renamed nemo folder to nemo_ckpt_loader Signed-off-by: Onur Yilmaz <[email protected]> * remove unused function Signed-off-by: Onur Yilmaz <[email protected]> * removed nemo file Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * moved a function to 
tensorrt_llm_run file Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * Remove unused imports Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * import csv added Signed-off-by: Onur Yilmaz <[email protected]> --------- Signed-off-by: Onur Yilmaz <[email protected]> Signed-off-by: oyilmaz-nvidia <[email protected]> Co-authored-by: oyilmaz-nvidia <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * ci: Fix `L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Eng_CitriNet_with_wav` (#9399) Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * disable overlap for qkv (#9079) * disable overlap for qkv (#9072) * disable overlap for qkv Signed-off-by: Rachit Garg <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Rachit Garg <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: michal2409 <[email protected]> --------- Signed-off-by: Rachit Garg <[email protected]> Signed-off-by: michal2409 <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: michal2409 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Fix circular import for MM dataprep notebook (#9287) (#9292) * update launcher name and fix mm circular import * Apply isort and black reformatting --------- 
Signed-off-by: Chen Cui <[email protected]> Signed-off-by: cuichenx <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: cuichenx <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * add check if num layers is divisible by pp size (#9208) (#9298) * add check if num_layers % pp == 0 * Apply isort and black reformatting * move num_layers / pp check to build_transformer_config --------- Signed-off-by: dimapihtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Add HF siglip vision encoder (#9185) * temp save Signed-off-by: yaoyu-33 <[email protected]> * temp save 2 Signed-off-by: yaoyu-33 <[email protected]> * update code Signed-off-by: yaoyu-33 <[email protected]> * enable seq packing Signed-off-by: yaoyu-33 <[email protected]> * fix neva and clip Signed-off-by: yaoyu-33 <[email protected]> * Enable parallel seq packing algo and few other fixes Signed-off-by: yaoyu-33 <[email protected]> * Pipeline parallel support Signed-off-by: yaoyu-33 <[email protected]> * Update data preprocess Signed-off-by: yaoyu-33 <[email protected]> * fix few pp issues Signed-off-by: yaoyu-33 <[email protected]> * enable sequence packing w/ PP Signed-off-by: yaoyu-33 <[email protected]> * Fix cu_seqlens in inputs Signed-off-by: yaoyu-33 <[email protected]> * add assert Signed-off-by: yaoyu-33 <[email protected]> * Depend on PP to decide whether do padding Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add docstring Signed-off-by: yaoyu-33 <[email protected]> * Fix few evaluation issues Signed-off-by: yaoyu-33 <[email protected]> * Fix few PP evaluation issues Signed-off-by: yaoyu-33 <[email 
protected]> * Address comments Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add llama3 template Signed-off-by: yaoyu-33 <[email protected]> * address comments Signed-off-by: yaoyu-33 <[email protected]> * Fix license Signed-off-by: yaoyu-33 <[email protected]> * Fix llama3 Signed-off-by: yaoyu-33 <[email protected]> * Few fixes Signed-off-by: yaoyu-33 <[email protected]> * Few neva bugs Signed-off-by: yaoyu-33 <[email protected]> * Few neva bugs Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Few neva bugs Signed-off-by: yaoyu-33 <[email protected]> * llama3 inference fix Signed-off-by: yaoyu-33 <[email protected]> * Force vision encoder to run in fp32 Signed-off-by: yaoyu-33 <[email protected]> * Revert "Force vision encoder to run in fp32" This reverts commit 9d2160d96cb3e2a27a18538950ef43b4482c04da. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Try adding distributed format of checkpoint Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Allow dist checkpoint to be non-strict Signed-off-by: yaoyu-33 <[email protected]> * Fix Signed-off-by: yaoyu-33 <[email protected]> * Some fixes for PP + dist ckpt in Neva Signed-off-by: yaoyu-33 <[email protected]> * fix peft Signed-off-by: yaoyu-33 <[email protected]> * few fixes for lora Signed-off-by: yaoyu-33 <[email protected]> * checkpoint updates Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * bug fix Signed-off-by: yaoyu-33 <[email protected]> * Add HF siglip vision encoder Signed-off-by: HuiyingLi <[email protected]> * handle steerlm label in nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * Add neva dist checkpoint converter Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix CLEAN RESPONSE logic to not use last EOS Signed-off-by: HuiyingLi <[email protected]> * strip extra_id_1 from clean response Signed-off-by: HuiyingLi <[email protected]> * change inference time image processor Signed-off-by: HuiyingLi <[email protected]> * resolve comments Signed-off-by: yaoyu-33 <[email protected]> * remove open_clip vision encoder for siglip Signed-off-by: HuiyingLi <[email protected]> * update neva dist ckpt apis Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix return Signed-off-by: yaoyu-33 <[email protected]> * resolve CLEAN RESPONSE multiturn issue Signed-off-by: HuiyingLi <[email protected]> * code format Signed-off-by: HuiyingLi <[email protected]> * fixes for isort Signed-off-by: HuiyingLi <[email protected]> * 
refac image processor loading to util Signed-off-by: HuiyingLi <[email protected]> * black and isort Signed-off-by: HuiyingLi <[email protected]> * move crop size assertion Signed-off-by: HuiyingLi <[email protected]> * few neva fixes Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: HuiyingLi <[email protected]> --------- Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * [Nemo CICD] timeouts fix (#9407) * timeouts fix * timeouts fix Signed-off-by: Marc Romeyn <[email protected]> * Removing un-used ModelConfig class (#9389) Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Extend multimodal/speech_llm with lhotse, t5 and bestow supports (#9169) * Fixes * Docs fix * Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom) * Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support distributed_fused_adam Signed-off-by: zhehuaichen <[email protected]> * support distributed_fused_adam Signed-off-by: zhehuaichen <[email protected]> * Add support for sharded NeMo manifest files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support megatron_amp_O2 Signed-off-by: zhehuaichen <[email protected]> * Support heterogeneous sampling rates in non tarred NeMo manifests * migrate to PTL2.0 Signed-off-by: 
stevehuang52 <[email protected]> * clean up Signed-off-by: stevehuang52 <[email protected]> * update manifest util Signed-off-by: stevehuang52 <[email protected]> * Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * agg and normal tokenizers actually work * Support weights for NeMo tarred manifests * Temporarily hardcoded pnc stripping/lowercasing * fix * make pnc hack configurable from the config and disabled by default * fix the hack * migrate to ptl2.1 to support multiple dataloaders Signed-off-by: stevehuang52 <[email protected]> * support encoder overwrite Signed-off-by: zhehuaichen <[email protected]> * update misc Signed-off-by: stevehuang52 <[email protected]> * fix eval and clean up Signed-off-by: stevehuang52 <[email protected]> * support add_sep for perception model Signed-off-by: zhehuaichen <[email protected]> * fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803 Signed-off-by: zhehuaichen <[email protected]> * add_bos Signed-off-by: zhehuaichen <[email protected]> * Transformer decoder with conditioning for canary (#8091) * initial commit for multi-task conf-enc transf-dec for canary Signed-off-by: Krishna Puvvada <[email protected]> * removing decoder states caching during training Signed-off-by: Krishna Puvvada <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Krishna Puvvada <[email protected]> Co-authored-by: Krishna Puvvada <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Option to limit the number of open streams 
(#8095) * audio signal support in multi Signed-off-by: zhehuaichen <[email protected]> * update asr evaluator Signed-off-by: stevehuang52 <[email protected]> * fix from https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397 and https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa Signed-off-by: zhehuaichen <[email protected]> * transcribe fn for Canary models (#8110) * improve readability Signed-off-by: Krishna Puvvada <[email protected]> * adding context in transcribe function for ConfTransfModels Signed-off-by: Krishna Puvvada <[email protected]> * supporting relative paths in transcribe function for canary Signed-off-by: Krishna Puvvada <[email protected]> * removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference Signed-off-by: Krishna Puvvada <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Krishna Puvvada <[email protected]> Co-authored-by: Krishna Puvvada <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update for evaluation Signed-off-by: stevehuang52 <[email protected]> * update for eval Signed-off-by: stevehuang52 <[email protected]> * update for evaluation Signed-off-by: stevehuang52 <[email protected]> * fix bleu Signed-off-by: stevehuang52 <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * Add missing audio_filepath validation for Canary (#8119) * Add missing audio_filepath validation for Canary * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add default concat_sampling_probabilities Signed-off-by: zhehuaichen <[email protected]> * support lhotse dataset in speechllm Signed-off-by: zhehuaichen <[email protected]> * bypass 
get_iterator_k_split Signed-off-by: zhehuaichen <[email protected]> * tmp fix Signed-off-by: zhehuaichen <[email protected]> * try to use fixed batch with megatron Signed-off-by: zhehuaichen <[email protected]> * add batch logging Signed-off-by: zhehuaichen <[email protected]> * support unfrozen llm Signed-off-by: zhehuaichen <[email protected]> * Create README.md Signed-off-by: He Huang (Steve) <[email protected]> * Update README.md Signed-off-by: He Huang (Steve) <[email protected]> * Update README.md Signed-off-by: He Huang (Steve) <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * rename Signed-off-by: stevehuang52 <[email protected]> * add llama prompt template Signed-off-by: zhehuaichen <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * support sample alpha Signed-off-by: zhehuaichen <[email protected]> * support lhotse validation set and canary pretrained ckpt with pseudo label Signed-off-by: zhehuaichen <[email protected]> * make sure backward compatibility Signed-off-by: zhehuaichen <[email protected]> * remove pad Signed-off-by: zhehuaichen <[email protected]> * make sure asr_model is frozen Signed-off-by: zhehuaichen <[email protected]> * support greedy decoding Signed-off-by: zhehuaichen <[email protected]> * valid on lhotse Signed-off-by: zhehuaichen <[email protected]> * fix multi dataloader in val case for lhotse SALM; add default data names; keep asr model tokenizer by default to enable adding canary dataset Signed-off-by: zhehuaichen <[email protected]> * remove the bruteforce _keep_special_tokens implementation Signed-off-by: zhehuaichen <[email protected]> * decoding_ratio and convert_canary_prompt_to_text support Signed-off-by: zhehuaichen <[email protected]> * canary_tokens_augment_ratio Signed-off-by: zhehuaichen <[email protected]> * debug Signed-off-by: zhehuaichen <[email protected]> * bug fix Signed-off-by: zhehuaichen <[email protected]> * fix lhotse based eval of llama 
canary model Signed-off-by: zhehuaichen <[email protected]> * support some overwrite for eval Signed-off-by: zhehuaichen <[email protected]> * support zero shot prompt in training Signed-off-by: zhehuaichen <[email protected]> * support cross attention based SALM Signed-off-by: zhehuaichen <[email protected]> * support cross attention based SALM Signed-off-by: zhehuaichen <[email protected]> * fix for batch train/valid of cross Signed-off-by: zhehuaichen <[email protected]> * support learnable gate and plotting Signed-off-by: zhehuaichen <[email protected]> * support using pseudo label in prompt rather than cross att Signed-off-by: zhehuaichen <[email protected]> * bug fix for perception cfg and context tokens shift Signed-off-by: zhehuaichen <[email protected]> * DentityConnectorsAdd Signed-off-by: zhehuaichen <[email protected]> * fix ckpt saving Signed-off-by: zhehuaichen <[email protected]> * Support RnnGatedCrossAttention Signed-off-by: zhehuaichen <[email protected]> * add include_ffw and fix _optimizer_param_groups for all unfrozen run Signed-off-by: zhehuaichen <[email protected]> * support grad acc when using bucket Signed-off-by: zhehuaichen <[email protected]> * support TransformerCrossAttention Signed-off-by: zhehuaichen <[email protected]> * support ProjectTransformerCrossAttention Signed-off-by: zhehuaichen <[email protected]> * support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size Signed-off-by: zhehuaichen <[email protected]> * support question set on val without canary Signed-off-by: zhehuaichen <[email protected]> * support load_audio_encoder and wip in optim_param_groups Signed-off-by: zhehuaichen <[email protected]> * minor fix for audio pretrain model init Signed-off-by: zhehuaichen <[email protected]> * simplify canary_tokens_augment Signed-off-by: zhehuaichen <[email protected]> * use question in the manifest if it exists Signed-off-by: zhehuaichen <[email protected]> * support dataset weighting for non tar 
Signed-off-by: zhehuaichen <[email protected]> * Update SpeechLLM code (#8475) * add pleasefixme marker for potential failed nightly tests. (#7678) Signed-off-by: Xuesong Yang <[email protected]> * Add new text segmentation library for better TTS quality (#7645) * Add new text segmentation library for better TTS quality * Update zh_cn_pinyin.py added detailed instruction on how to install pkuseg. Signed-off-by: Xuesong Yang <[email protected]> * Update requirements_tts.txt remove pkuseg as the default dependency of NeMo TTS, and instead, direct users to manually install pkuseg if they really need. Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774) * Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer * Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add '32-true' for precision values --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix(clustering_diarizer.py): fix typo (#7772) Signed-off-by: Jean-Louis Queguiner <[email protected]> * fix(diarization-README): typo (#7771) Signed-off-by: Jean-Louis Queguiner <[email protected]> * Fix bug wrt change decoding strategy for bpe models (#7762) (#7764) * Fix bug wrt change decoding strategy for bpe models * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar 
<[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Remove incorrect extra argument for load_from_checkpoint_dir() (#7500) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add nemo to mcore GPT conversion script (#7730) * add conversion script Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove references to 'ckpt' Signed-off-by: Chen Cui <[email protected]> * add one more sanity check to make sure there is no unexpected keys in state dict Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make cpu loading work Signed-off-by: Chen Cui <[email protected]> * make script work for llama2 models Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address code check Signed-off-by: Chen Cui <[email protected]> * remove trainer precision (was for old sanity check) Signed-off-by: Chen Cui <[email protected]> * fix script for llama2 model Signed-off-by: Chen Cui <[email protected]> * remove commented code Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785) Signed-off-by: anferico <[email protected]> * Add some docs and update scripts for ASR (#7790) * Add some docs and update scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * set context for text memmap to fork (#7784) * set context for text memmap to fork Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add training with multiple audios Signed-off-by: stevehuang52 <[email protected]> * Support flash decoding (#7744) * Add flash-decoding Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761) * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747) * Change accelerator to auto Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in nlp_checkpoint_port.py Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in export.py Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Abhishree <[email protected]> * docs: fix typos (#7758) 
Signed-off-by: shuoer86 <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Abhishree <[email protected]> * Snake act (#7736) Signed-off-by: Abhishree <[email protected]> * Update gpt_dataset.py (#6963) Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Abhishree <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: shuoer86 <[email protected]> Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: shuoer86 <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788) * add selection criteria for reference audios Signed-off-by: anferico <[email protected]> * Update configuration files Signed-off-by: anferico <[email protected]> * add informative comment in config files Signed-off-by: anferico <[email protected]> * sample random index for reference audio selection Signed-off-by: anferico <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: anferico <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update text server to support compute logprobs (#7733) * update text server to support compute logprobs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo --------- Signed-off-by: Zhilin Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * add multi-layer feat extract and fix random question insertion Signed-off-by: stevehuang52 <[email protected]> * Configure MCore logger (#7781) Signed-off-by: Mikołaj Błaż <[email protected]> * Revert "PEFT eval fix (#7626) (#7638)" (#7693) This reverts commit f03dd660bd26d88fd569e76c6f74b83a7c203ff9. * remove TN from ctc_segm tut (#7807) Signed-off-by: Evelina <[email protected]> * [TTS] Support audio offsets in TTS data loaders (#7156) * [TTS] Support audio offsets in TTS data loaders Signed-off-by: Ryan <[email protected]> * [TTS] Change docstring mentions of .pt to .npy Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Update Apex install command in Dockerfile (#7794) (#7804) * move core install to /workspace (#7706) * update apex install in dockerfile * use fetch head --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: eharper <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Abhinav Khattar <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * Nemo to HF converter for LLaMA model (#7770) * Create config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Add files via upload Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * clean up trainer * remove dependency on yaml config. load config from nemo file instead. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable ckpt saving into other precision formats * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support 70b + cleanup qkv slice logic * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix bug * move hf model folder code from comment to function and add instruction to run * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Utkarsh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Save best NeMo model only when necessary (#7836) Signed-off-by: Ante Jukić <[email protected]> * add guard if its a distributed checkpoint (#7845) Signed-off-by: Gerald Shen <[email protected]> * Fix tn duplex (#7808) * fix duplex tn infer Signed-off-by: Evelina <[email protected]> * fix typo Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix TN docs Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update transformers cache on Jenkins (#7854) * update transformers cache Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * add cd Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> * Update README.rst for container update (#7844) Signed-off-by: fayejf <[email protected]> * Add support for finetuning with huggingface datasets (#7834) * add finetune with huggingface dataset Signed-off-by: 
stevehuang52 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update yaml Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * add extrac hf text and update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * move dataset dependency to common Signed-off-by: stevehuang52 <[email protected]> * add docstring Signed-off-by: stevehuang52 <[email protected]> * Add to Dics Signed-off-by: Nithin Rao Koluguri <nithinraok> * add ci test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add max steps in jenkins Signed-off-by: Nithin Rao Koluguri <nithinraok> * reduce max steps Signed-off-by: Nithin Rao Koluguri <nithinraok> * jenkins test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add bs=2 Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Multimodal merge (#7728) * ControlNet TRT export * Final MR before release * SD2 update * Fixed export issue * Fix for instruct p2p and reformat * Fix SD export issue * Add nemo clip export for DB * Fix ins pix2pix * fix sd2 config * [Mingyuan Ma] BF16 and SD conversion script * [Imagen] NHWC Feature * Fix .nemo loading issue for NeMo CLIP in SD * NeMo r1.20.0 Multimodal Merge * fix the inductor issue in inference * Fix inductor loading .nemo issue * Add Neva Model Support * Imagen Optimizations * Neva inference code * NeMo TOT 1.21 to Internal/main * Update neva_inference.yaml * REBASING for latest code changes * Update internal/main to main tot * Parallel DDIM implementation * 1. 
Fixing indentation bug. (#7352) Signed-off-by: Micha Livne <[email protected]> * NeMo MCore llama2 support + MCore PEFT adapters (#7299) * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove imports Signed-off-by: ericharper <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert 
Signed-off-by: ericharper <[email protected]> * mcore llama2 ckpt conversion & small fix Signed-off-by: jasonwan <[email protected]> * Add inference & sft config by Hongbin Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: jasonwan <[email protected]> * fix config Signed-off-by: jasonwan <[email protected]> * add inference param. update TP/PP script to support mcore gpt Signed-off-by: jasonwan <[email protected]> * p-tuning Signed-off-by: jasonwan <[email protected]> * modify ckpt conversion script (adding model cast) Signed-off-by: jasonwan <[email protected]> * ckpt conversion use relative path for config Signed-off-by: jasonwan <[email protected]> * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email 
protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * update module args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * remove optimizer_idx Signed-off-by: eharper <[email protected]> * prefetch num microbatches Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel 
config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * fix for p-tuning sequence parallel Signed-off-by: jasonwan <[email protected]> * support SFT/distOpt mcore (#7207) * add inference param. update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * change layer names for SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug in SFT Signed-off-by: Hongbin Liu <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is 
enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rollback model cast for p-tuning Signed-off-by: jasonwan <[email protected]> * update for dist adam Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use get_gpt_module_list Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update ckpt conversion script Signed-off-by: jasonwan <[email protected]> * ptl2.0 patch for llama config Signed-off-by: jasonwan <[email protected]> * add plugins to trainer in scripts Signed-off-by: jasonwan <[email protected]> * fix activation checkpointing mcore Signed-off-by: jasonwan <[email protected]> * fix variable names Signed-off-by: jasonwan <[email protected]> * overwrite normalization type for mcore/te Signed-off-by: jasonwan <[email protected]> * Update megatron_llama_sft.yaml Signed-off-by: Jason Wang <[email protected]> * add PEFT adapter support for mcore gpt path (#7276) * implementation for mcore adapter/mxins Signed-off-by: jasonwan <[email protected]> * small fix for lora and ptuning Signed-off-by: jasonwan <[email protected]> * support layerwise peft Signed-off-by: jasonwan <[email protected]> * support multiple target layers Signed-off-by: jasonwan <[email protected]> * support lora GQA Signed-off-by: jasonwan <[email protected]> * support amp O2 Signed-off-by: jasonwan <[email protected]> * revert & more O2 fix Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lora inject to attention 
Signed-off-by: jasonwan <[email protected]> * support …

* Changes to enable CUDA graph for LLM (NVIDIA#8751)
* Use next instead of get_batch
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* CUDA graph changes
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Change to enable CG with weight caching
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Revert "Use next instead of get_batch"
This reverts commit 0021bb4.
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"
This reverts commit b4f736e.
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Remove skip_weight_update argument
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Bug fix + cleanup
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Cleanup
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Use new TE API for FP8 Param transpose
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Change config param cuda_graph to enable_cuda_graph
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Enable TE RNGStatesTracker through config
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Change te_rng_tracker to use_te_rng_tracker
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* FP8 weight transpose handled inside TE
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Cleanup
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Revert "Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py""
This reverts commit e318624.
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Fix merge conflicts
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Fix merge conflicts
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Fix merge conflicts
Signed-off-by: Vasudevan Rengasamy <[email protected]>
---------
Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: ericharper <[email protected]>
---------
Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: vasunvidia <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
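The commit log for #8751 records two config renames: `cuda_graph` became `enable_cuda_graph`, and `te_rng_tracker` became `use_te_rng_tracker`. For anyone carrying a config written against an intermediate revision of the branch, a minimal sketch of remapping the old names could look like the following. It assumes the config is available as a plain dict; the helper `migrate_cuda_graph_keys` and the sample `fp8` entry are hypothetical and not part of NeMo. Only the four key names come from the log, and where they sit in the real config tree is not stated there.

```python
# Hypothetical helper for illustration; it is not part of NeMo.
# Only the four key names below are taken from the commit log.
RENAMED_KEYS = {
    "cuda_graph": "enable_cuda_graph",
    "te_rng_tracker": "use_te_rng_tracker",
}


def migrate_cuda_graph_keys(cfg: dict) -> dict:
    """Return a copy of cfg with the old key names replaced by the new ones."""
    migrated = dict(cfg)  # shallow copy so the caller's dict is left untouched
    for old, new in RENAMED_KEYS.items():
        if old in migrated:
            value = migrated.pop(old)
            # If the new key is already set explicitly, it takes precedence.
            migrated.setdefault(new, value)
    return migrated


# Example: a config still using the pre-rename names ("fp8" is a made-up entry).
old_cfg = {"cuda_graph": True, "te_rng_tracker": True, "fp8": False}
new_cfg = migrate_cuda_graph_keys(old_cfg)
print(new_cfg)
```

Letting an explicitly set new key win over the legacy one is a design choice of this sketch, not something the PR specifies.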

…rategy (NVIDIA#9387)
* Integrating mcore's DistributedDataParallel into MegatronStrategy
Signed-off-by: Marc Romeyn <[email protected]>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
* Apply ddp-hooks from pytorch only when needed
Signed-off-by: Marc Romeyn <[email protected]>
* bugfix if using mcore distOpt with sft (#9356)
* bugfix if using mcore distOpt
Signed-off-by: Alexandros Koumparoulis <[email protected]>
* Apply isort and black reformatting
Signed-off-by: akoumpa <[email protected]>
---------
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
* fix typo infer_seq_lenght -> infer_seq_length (#9370)
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
* Rachitg/ag (#9083)
* Rachitg/ag (#9081)
* disable overlap for qkv
Signed-off-by: Rachit Garg <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* bug fix
* bugfix
---------
Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: michal2409 <[email protected]>
---------
Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: michal2409 <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: michal2409 <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
* Adding the original change made for label_models (#9377) (#9378)
Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
* Dgalvez/fix greedy batch strategy name r2.0.0rc0 (#9243) (#9253)
* Lazily warn about using greedy strategy instead of greedy_batch strategy.
Previously, the warning would often run spuriously, since several existing code paths simply call "change_decoding_strategy()" after having first initialized a Module, rather than changing the config before initializing the Module. This can be confusing.
The only problem I can see with this is that using logging inside a forward() method might interfere with some compiler toolkits like Torchscript or thunder.compile. Presumably it would be easy to add a conditional statement to avoid this statement in a compiler context if necessary.
Signed-off-by: Daniel Galvez <[email protected]>
Co-authored-by: Daniel Galvez <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
* Update README.rst (#9393)
Revised content per https://gitlab-master.nvidia.com/nemo-framework-tme/documentation/-/issues/25. Also removed reference to NIMs in LLMs and MMs Deployment and Optimization. It should be NVIDIA NeMo Microservices and not NIM. Removed nemo:24.03.framework and nemo:24.01.speech in Docker Containers section and replaced with 24.05. Please verify all changes.
Signed-off-by: jgerh <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
* a2a fix removed tp world size and group from init (#8944) (#8952)
Signed-off-by: Anmol Gupta <[email protected]>
Co-authored-by: anmolgupt <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
* Add config option for FP32 embedding grads (#8953)
* Add config option for FP32 embedding grads (#8946)
Signed-off-by: Tim Moon <[email protected]>
* Apply isort and black reformatting
Signed-off-by: ericharper <[email protected]>
---------
Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
* Changes to enable CUDA graph for LLM (#8955)
* Changes to enable CUDA graph for LLM (#8751)
* Use next instead of get_batch
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* CUDA graph changes
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Change to enable CG with weight caching
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Revert "Use next instead of get_batch"
This reverts commit 0021bb444cdd1b27674fc0cfea909c1a42475336.
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"
This reverts commit b4f736ed2b39f6c48d2868ac3febb82c763ab3fb.
Signed-off-by: Vasudevan Rengasamy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Vasudevan Rengasamy <[email protected]> * Remove skip_weight_update argument Signed-off-by: Vasudevan Rengasamy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Vasudevan Rengasamy <[email protected]> * Bug fix + cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Use new TE API for FP8 Param transpose Signed-off-by: Vasudevan Rengasamy <[email protected]> * Change config param cuda_graph to enable_cuda_graph Signed-off-by: Vasudevan Rengasamy <[email protected]> * Enable TE RNGStatesTracker through config Signed-off-by: Vasudevan Rengasamy <[email protected]> * Change te_rng_tracker to use_te_rng_tracker Signed-off-by: Vasudevan Rengasamy <[email protected]> * FP8 weight transpose handled inside TE Signed-off-by: Vasudevan Rengasamy <[email protected]> * Cleanup Signed-off-by: Vasudevan Rengasamy <[email protected]> * Revert "Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"" This reverts commit e31862481216f9adf7fa584a0c0262916c935639. 
Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> * Fix merge conflicts Signed-off-by: Vasudevan Rengasamy <[email protected]> --------- Signed-off-by: Vasudevan Rengasamy <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: Vasudevan Rengasamy <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: vasunvidia <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Enhance Distributed Adam (#9051) * Enhance Distributed Adam (#9037) * Fix deprecated env. Signed-off-by: Wil Kong <[email protected]> * Use user desired value for distributed adam. Signed-off-by: Wil Kong <[email protected]> * Preserve memory format in parameter buffer of distributed adam. Signed-off-by: Wil Kong <[email protected]> * Fix the contiguous_param_buffer bug about bprop overlap and redundant copy after all-gather. Signed-off-by: Wil Kong <[email protected]> * Provide API to lock SHArP tree for distributed adam within nodes. 
Signed-off-by: Wil Kong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Wil Kong <[email protected]> --------- Signed-off-by: Wil Kong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: Wil Kong <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: Wil Kong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Force diarizer to use CUDA if cuda is available and if device=None. (#9380) (#9390) * Fixed clustering diarizer to load MSDD to GPU by default if cuda on * Fixed clustering diarizer to load MSDD to GPU by default if cuda on * Apply isort and black reformatting --------- Signed-off-by: Taejin Park <[email protected]> Signed-off-by: tango4j <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: tango4j <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * ci: Properly catch failed tests by introduction of workflow templates (#9324) * ci: Refactor tests into reusable template Signed-off-by: Oliver Koenig <[email protected]> * ci: Fix sending alerts on failure Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * disable slack Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix alerting Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * ci: Increase timeout for `L0_Unit_Tests_CPU` 
Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * increase timeout Signed-off-by: Oliver Koenig <[email protected]> * increase timeout for `Speech_Checkpoints_tests` Signed-off-by: Oliver Koenig <[email protected]> * improve readability Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * test Signed-off-by: Oliver Koenig <[email protected]> * test Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * finalize Signed-off-by: Oliver Koenig <[email protected]> * fix Signed-off-by: Oliver Koenig <[email protected]> * add missing rm statement for `L2_PTQ_Llama2_Export_Only` Signed-off-by: Oliver Koenig <[email protected]> * all your comments are belong to us Signed-off-by: Oliver Koenig <[email protected]> * remove github output Signed-off-by: Oliver Koenig <[email protected]> * revive more comments Signed-off-by: Oliver Koenig <[email protected]> * add L2: ASR dev run - part two Signed-off-by: Oliver Koenig <[email protected]> --------- Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Pablo Garay <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Fix T5 G2P Input and Output Types (#9224) (#9269) * fix t5 g2p model * Apply isort and black reformatting --------- Signed-off-by: Jason <[email protected]> Signed-off-by: blisc <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: blisc <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Use model-cast-to-bfloat16 rather than AMP-to-bfloat16 for inference. (#9198) * Fix the "cast ping pong" problem when we run AMP inference. 
This has been tested only for Parakeet-CTC-1.1B right now. This problem certainly exists elsewhere. Automatic mixed precision and inference do not play well together. First, automatic mixed precision was created back when neural networks were much simpler. In particular, they did not have softmax and layer norm as frequent operations. In the era of transformers, softmax and layer norm are very common. AMP will unconditionally produce fp32 outputs from these operations, even if their inputs are fp16. See here: https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float32 This is no longer necessary, now that layer norm does accumulation in fp32 in PyTorch, even if the input is fp16: https://github.com/pytorch/pytorch/issues/66707 Do inference by casting model to bfloat16, not by using AMP. Do feature preprocessing in float32 for accuracy. Warn if someone tries to input a non-float32 tensor. Always create the output in the type the rest of the model expects. Sort manifests by duration. Signed-off-by: Daniel Galvez <[email protected]> * Always cast softmax inputs to float32 when in training mode. While we don't need this for accurate results in b/float16, this is a safety precaution to make sure that training accuracy does not regress. 
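The inference recipe described in the commit message above (cast the model to bfloat16 once, keep feature preprocessing in float32, warn on non-float32 input, and hand the model tensors in the dtype it expects) can be sketched roughly as follows. This is a minimal illustration, not the NeMo implementation: `TinyEncoder` and `preprocess` are hypothetical names invented for this sketch, and the normalization shown is only a placeholder for real feature extraction.

```python
import warnings

import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for an acoustic model; not a NeMo class."""

    def __init__(self, features: int = 16, classes: int = 8) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(features)
        self.proj = nn.Linear(features, classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Layer norm and softmax run in the model's own dtype; there is no
        # autocast context that could promote their outputs to float32.
        return torch.softmax(self.proj(self.norm(x)), dim=-1)


def preprocess(raw: torch.Tensor, out_dtype: torch.dtype) -> torch.Tensor:
    """Hypothetical preprocessing: compute in float32, emit the model dtype."""
    if raw.dtype != torch.float32:
        warnings.warn(f"expected float32 input, got {raw.dtype}; casting")
        raw = raw.float()
    mean = raw.mean(dim=-1, keepdim=True)
    std = raw.std(dim=-1, keepdim=True)
    normalized = (raw - mean) / (std + 1e-5)
    # Single cast at the model boundary instead of per-op casts under AMP.
    return normalized.to(out_dtype)


# Cast the weights once instead of wrapping the forward pass in torch.autocast.
model = TinyEncoder().to(torch.bfloat16).eval()
model_dtype = next(model.parameters()).dtype

raw = torch.randn(4, 16)  # float32 by default
features = preprocess(raw, model_dtype)

with torch.inference_mode():
    probs = model(features)

print(features.dtype, probs.dtype)
```

With this layout the only dtype conversion happens once, at the boundary between preprocessing and the model, which is the "cast ping pong" the commit message is trying to avoid.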
Signed-off-by: Daniel Galvez <[email protected]> --------- Signed-off-by: Daniel Galvez <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Huvu/rag pipeline citest (#9384) * huvu/NeMo_rag_citest first commit * adding llama-index to dependency * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adjusting data/models path in ci-test to dependency * putting llama-index to optional * update cicd-main.yml --------- Co-authored-by: Huy Vu2 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Marc Romeyn <[email protected]> * Re-org export code (#9353) * reorg the export code Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * replaced log with raise Signed-off-by: Onur Yilmaz <[email protected]> * add converter and loader folders Signed-off-by: Onur Yilmaz <[email protected]> * move nemo_ckpt_convert into the converter folder Signed-off-by: Onur Yilmaz <[email protected]> * move nemo_file into loader folder Signed-off-by: Onur Yilmaz <[email protected]> * reorg converter Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * continue to reorg converter Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * continue to reorg Signed-off-by: Onur Yilmaz <[email protected]> * move nemo file back into nemo folder Signed-off-by: Onur Yilmaz <[email protected]> * renamed nemo folder to nemo_ckpt_loader Signed-off-by: Onur Yilmaz <[email protected]> * remove unused function Signed-off-by: Onur Yilmaz <[email protected]> * removed nemo file Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * moved a function to 
tensorrt_llm_run file Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * Remove unused imports Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * import csv added Signed-off-by: Onur Yilmaz <[email protected]> --------- Signed-off-by: Onur Yilmaz <[email protected]> Signed-off-by: oyilmaz-nvidia <[email protected]> Co-authored-by: oyilmaz-nvidia <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * ci: Fix `L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Eng_CitriNet_with_wav` (#9399) Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * disable overlap for qkv (#9079) * disable overlap for qkv (#9072) * disable overlap for qkv Signed-off-by: Rachit Garg <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Rachit Garg <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: michal2409 <[email protected]> --------- Signed-off-by: Rachit Garg <[email protected]> Signed-off-by: michal2409 <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: Rachit Garg <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: michal2409 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Fix circular import for MM dataprep notebook (#9287) (#9292) * update launcher name and fix mm circular import * Apply isort and black reformatting --------- 
Signed-off-by: Chen Cui <[email protected]> Signed-off-by: cuichenx <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: cuichenx <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * add check if num layers is divisible by pp size (#9208) (#9298) * add check if num_layers % pp == 0 * Apply isort and black reformatting * move num_layers / pp check to build_transformer_config --------- Signed-off-by: dimapihtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Add HF siglip vision encoder (#9185) * temp save Signed-off-by: yaoyu-33 <[email protected]> * temp save 2 Signed-off-by: yaoyu-33 <[email protected]> * update code Signed-off-by: yaoyu-33 <[email protected]> * enable seq packing Signed-off-by: yaoyu-33 <[email protected]> * fix neva and clip Signed-off-by: yaoyu-33 <[email protected]> * Enable parallel seq packing algo and few other fixes Signed-off-by: yaoyu-33 <[email protected]> * Pipeline parallel support Signed-off-by: yaoyu-33 <[email protected]> * Update data preprocess Signed-off-by: yaoyu-33 <[email protected]> * fix few pp issues Signed-off-by: yaoyu-33 <[email protected]> * enable sequence packing w/ PP Signed-off-by: yaoyu-33 <[email protected]> * Fix cu_seqlens in inputs Signed-off-by: yaoyu-33 <[email protected]> * add assert Signed-off-by: yaoyu-33 <[email protected]> * Depend on PP to decide whether do padding Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add docstring Signed-off-by: yaoyu-33 <[email protected]> * Fix few evaluation issues Signed-off-by: yaoyu-33 <[email protected]> * Fix few PP evaluation issues Signed-off-by: yaoyu-33 <[email 
protected]> * Address comments Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add llama3 template Signed-off-by: yaoyu-33 <[email protected]> * address comments Signed-off-by: yaoyu-33 <[email protected]> * Fix license Signed-off-by: yaoyu-33 <[email protected]> * Fix llama3 Signed-off-by: yaoyu-33 <[email protected]> * Few fixes Signed-off-by: yaoyu-33 <[email protected]> * Few neva bugs Signed-off-by: yaoyu-33 <[email protected]> * Few neva bugs Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Few neva bugs Signed-off-by: yaoyu-33 <[email protected]> * llama3 inference fix Signed-off-by: yaoyu-33 <[email protected]> * Force vision encoder to run in fp32 Signed-off-by: yaoyu-33 <[email protected]> * Revert "Force vision encoder to run in fp32" This reverts commit 9d2160d96cb3e2a27a18538950ef43b4482c04da. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Try adding distributed format of checkpoint Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Allow dist checkpoint to be non-strict Signed-off-by: yaoyu-33 <[email protected]> * Fix Signed-off-by: yaoyu-33 <[email protected]> * Some fixes for PP + dist ckpt in Neva Signed-off-by: yaoyu-33 <[email protected]> * fix peft Signed-off-by: yaoyu-33 <[email protected]> * few fixes for lora Signed-off-by: yaoyu-33 <[email protected]> * checkpoint updates Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * bug fix Signed-off-by: yaoyu-33 <[email protected]> * Add HF siglip vision encoder Signed-off-by: HuiyingLi <[email protected]> * handle steerlm label in nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * Add neva dist checkpoint converter Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix CLEAN RESPONSE logic to not use last EOS Signed-off-by: HuiyingLi <[email protected]> * strip extra_id_1 from clean response Signed-off-by: HuiyingLi <[email protected]> * change inference time image processor Signed-off-by: HuiyingLi <[email protected]> * resolve comments Signed-off-by: yaoyu-33 <[email protected]> * remove open_clip vision encoder for siglip Signed-off-by: HuiyingLi <[email protected]> * update neva dist ckpt apis Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * fix return Signed-off-by: yaoyu-33 <[email protected]> * resolve CLEAN RESPONSE multiturn issue Signed-off-by: HuiyingLi <[email protected]> * code format Signed-off-by: HuiyingLi <[email protected]> * fixes for isort Signed-off-by: HuiyingLi <[email protected]> * 
refac image processor loading to util Signed-off-by: HuiyingLi <[email protected]> * black and isort Signed-off-by: HuiyingLi <[email protected]> * move crop size assertion Signed-off-by: HuiyingLi <[email protected]> * few neva fixes Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: HuiyingLi <[email protected]> --------- Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * [Nemo CICD] timeouts fix (#9407) * timeouts fix * timeouts fix Signed-off-by: Marc Romeyn <[email protected]> * Removing un-used ModelConfig class (#9389) Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> * Extend multimodal/speech_llm with lhotse, t5 and bestow supports (#9169) * Fixes * Docs fix * Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom) * Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support distributed_fused_adam Signed-off-by: zhehuaichen <[email protected]> * support distributed_fused_adam Signed-off-by: zhehuaichen <[email protected]> * Add support for sharded NeMo manifest files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support megatron_amp_O2 Signed-off-by: zhehuaichen <[email protected]> * Support heterogeneous sampling rates in non tarred NeMo manifests * migrate to PTL2.0 Signed-off-by: 
stevehuang52 <[email protected]> * clean up Signed-off-by: stevehuang52 <[email protected]> * update manifest util Signed-off-by: stevehuang52 <[email protected]> * Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * agg and normal tokenizers actually work * Support weights for NeMo tarred manifests * Temporarily hardcoded pnc stripping/lowercasing * fix * make pnc hack configurable from the config and disabled by default * fix the hack * migrate to ptl2.1 to support multiple dataloaders Signed-off-by: stevehuang52 <[email protected]> * support encoder overwrite Signed-off-by: zhehuaichen <[email protected]> * update misc Signed-off-by: stevehuang52 <[email protected]> * fix eval and clean up Signed-off-by: stevehuang52 <[email protected]> * support add_sep for perception model Signed-off-by: zhehuaichen <[email protected]> * fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803 Signed-off-by: zhehuaichen <[email protected]> * add_bos Signed-off-by: zhehuaichen <[email protected]> * Transformer decoder with conditioning for canary (#8091) * initial commit for multi-task conf-enc transf-dec for canary Signed-off-by: Krishna Puvvada <[email protected]> * removing decoder states caching during training Signed-off-by: Krishna Puvvada <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Krishna Puvvada <[email protected]> Co-authored-by: Krishna Puvvada <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Option to limit the number of open streams 
(#8095) * audio signal support in multi Signed-off-by: zhehuaichen <[email protected]> * update asr evaluator Signed-off-by: stevehuang52 <[email protected]> * fix from https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397 and https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa Signed-off-by: zhehuaichen <[email protected]> * transcribe fn for Canary models (#8110) * improve readability Signed-off-by: Krishna Puvvada <[email protected]> * adding context in transcribe function for ConfTransfModels Signed-off-by: Krishna Puvvada <[email protected]> * supporting relative paths in transcribe function for canary Signed-off-by: Krishna Puvvada <[email protected]> * removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference Signed-off-by: Krishna Puvvada <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Krishna Puvvada <[email protected]> Co-authored-by: Krishna Puvvada <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update for evaluation Signed-off-by: stevehuang52 <[email protected]> * update for eval Signed-off-by: stevehuang52 <[email protected]> * update for evaluation Signed-off-by: stevehuang52 <[email protected]> * fix bleu Signed-off-by: stevehuang52 <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * Add missing audio_filepath validation for Canary (#8119) * Add missing audio_filepath validation for Canary * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add default concat_sampling_probabilities Signed-off-by: zhehuaichen <[email protected]> * support lhotse dataset in speechllm Signed-off-by: zhehuaichen <[email protected]> * bypass 
get_iterator_k_split Signed-off-by: zhehuaichen <[email protected]> * tmp fix Signed-off-by: zhehuaichen <[email protected]> * try to use fixed batch with megatron Signed-off-by: zhehuaichen <[email protected]> * add batch logging Signed-off-by: zhehuaichen <[email protected]> * support unfrozen llm Signed-off-by: zhehuaichen <[email protected]> * Create README.md Signed-off-by: He Huang (Steve) <[email protected]> * Update README.md Signed-off-by: He Huang (Steve) <[email protected]> * Update README.md Signed-off-by: He Huang (Steve) <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * rename Signed-off-by: stevehuang52 <[email protected]> * add llama prompt template Signed-off-by: zhehuaichen <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * support sample alpha Signed-off-by: zhehuaichen <[email protected]> * support lhotse validation set and canary pretrained ckpt with pseudo label Signed-off-by: zhehuaichen <[email protected]> * make sure backward compatibility Signed-off-by: zhehuaichen <[email protected]> * remove pad Signed-off-by: zhehuaichen <[email protected]> * make sure asr_model is frozen Signed-off-by: zhehuaichen <[email protected]> * support greedy decoding Signed-off-by: zhehuaichen <[email protected]> * valid on lhotse Signed-off-by: zhehuaichen <[email protected]> * fix multi dataloader in val case for lhotse SALM; add default data names; keep asr model tokenizer by default to enable adding canary dataset Signed-off-by: zhehuaichen <[email protected]> * remove the bruteforce _keep_special_tokens implementation Signed-off-by: zhehuaichen <[email protected]> * decoding_ratio and convert_canary_prompt_to_text support Signed-off-by: zhehuaichen <[email protected]> * canary_tokens_augment_ratio Signed-off-by: zhehuaichen <[email protected]> * debug Signed-off-by: zhehuaichen <[email protected]> * bug fix Signed-off-by: zhehuaichen <[email protected]> * fix lhotse based eval of llama 
canary model Signed-off-by: zhehuaichen <[email protected]> * support some overwrite for eval Signed-off-by: zhehuaichen <[email protected]> * support zero shot prompt in training Signed-off-by: zhehuaichen <[email protected]> * support cross attention based SALM Signed-off-by: zhehuaichen <[email protected]> * support cross attention based SALM Signed-off-by: zhehuaichen <[email protected]> * fix for batch train/valid of cross Signed-off-by: zhehuaichen <[email protected]> * support learnable gate and plotting Signed-off-by: zhehuaichen <[email protected]> * support using pseudo label in prompt rather than cross att Signed-off-by: zhehuaichen <[email protected]> * bug fix for perception cfg and context tokens shift Signed-off-by: zhehuaichen <[email protected]> * DentityConnectorsAdd Signed-off-by: zhehuaichen <[email protected]> * fix ckpt saving Signed-off-by: zhehuaichen <[email protected]> * Support RnnGatedCrossAttention Signed-off-by: zhehuaichen <[email protected]> * add include_ffw and fix _optimizer_param_groups for all unfrozen run Signed-off-by: zhehuaichen <[email protected]> * support grad acc when using bucket Signed-off-by: zhehuaichen <[email protected]> * support TransformerCrossAttention Signed-off-by: zhehuaichen <[email protected]> * support ProjectTransformerCrossAttention Signed-off-by: zhehuaichen <[email protected]> * support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size Signed-off-by: zhehuaichen <[email protected]> * support question set on val without canary Signed-off-by: zhehuaichen <[email protected]> * support load_audio_encoder and wip in optim_param_groups Signed-off-by: zhehuaichen <[email protected]> * minor fix for audio pretrain model init Signed-off-by: zhehuaichen <[email protected]> * simplify canary_tokens_augment Signed-off-by: zhehuaichen <[email protected]> * use question in the manifest if it exists Signed-off-by: zhehuaichen <[email protected]> * support dataset weighting for non tar 
Signed-off-by: zhehuaichen <[email protected]> * Update SpeechLLM code (#8475) * add pleasefixme marker for potential failed nightly tests. (#7678) Signed-off-by: Xuesong Yang <[email protected]> * Add new text segmentation library for better TTS quality (#7645) * Add new text segmentation library for better TTS quality * Update zh_cn_pinyin.py added detailed instruction on how to install pkuseg. Signed-off-by: Xuesong Yang <[email protected]> * Update requirements_tts.txt remove pkuseg as the default dependency of NeMo TTS, and instead, direct users to manually install pkuseg if they really need. Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774) * Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer * Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add '32-true' for precision values --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix(clustering_diarizer.py): fix typo (#7772) Signed-off-by: Jean-Louis Queguiner <[email protected]> * fix(diarization-README): typo (#7771) Signed-off-by: Jean-Louis Queguiner <[email protected]> * Fix bug wrt change decoding strategy for bpe models (#7762) (#7764) * Fix bug wrt change decoding strategy for bpe models * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar 
<[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Remove incorrect extra argument for load_from_checkpoint_dir() (#7500) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add nemo to mcore GPT conversion script (#7730) * add conversion script Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove references to 'ckpt' Signed-off-by: Chen Cui <[email protected]> * add one more sanity check to make sure there is no unexpected keys in state dict Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make cpu loading work Signed-off-by: Chen Cui <[email protected]> * make script work for llama2 models Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address code check Signed-off-by: Chen Cui <[email protected]> * remove trainer precision (was for old sanity check) Signed-off-by: Chen Cui <[email protected]> * fix script for llama2 model Signed-off-by: Chen Cui <[email protected]> * remove commented code Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785) Signed-off-by: anferico <[email protected]> * Add some docs and update scripts for ASR (#7790) * Add some docs and update scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * set context for text memmap to fork (#7784) * set context for text memmap to fork Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add training with multiple audios Signed-off-by: stevehuang52 <[email protected]> * Support flash decoding (#7744) * Add flash-decoding Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761) * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747) * Change accelerator to auto Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in nlp_checkpoint_port.py Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in export.py Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Abhishree <[email protected]> * docs: fix typos (#7758) 
Signed-off-by: shuoer86 <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Abhishree <[email protected]> * Snake act (#7736) Signed-off-by: Abhishree <[email protected]> * Update gpt_dataset.py (#6963) Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Abhishree <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: shuoer86 <[email protected]> Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: shuoer86 <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788) * add selection criteria for reference audios Signed-off-by: anferico <[email protected]> * Update configuration files Signed-off-by: anferico <[email protected]> * add informative comment in config files Signed-off-by: anferico <[email protected]> * sample random index for reference audio selection Signed-off-by: anferico <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: anferico <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update text server to support compute logprobs (#7733) * update text server to support compute logprobs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo --------- Signed-off-by: Zhilin Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * add multi-layer feat extract and fix random question insertion Signed-off-by: stevehuang52 <[email protected]> * Configure MCore logger (#7781) Signed-off-by: Mikołaj Błaż <[email protected]> * Revert "PEFT eval fix (#7626) (#7638)" (#7693) This reverts commit c24bb454bf1fa6f5820f1805c6387254a73220b9. * remove TN from ctc_segm tut (#7807) Signed-off-by: Evelina <[email protected]> * [TTS] Support audio offsets in TTS data loaders (#7156) * [TTS] Support audio offsets in TTS data loaders Signed-off-by: Ryan <[email protected]> * [TTS] Change docstring mentions of .pt to .npy Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Update Apex install command in Dockerfile (#7794) (#7804) * move core install to /workspace (#7706) * update apex install in dockerfile * use fetch head --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: eharper <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Abhinav Khattar <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * Nemo to HF converter for LLaMA model (#7770) * Create config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Add files via upload Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * clean up trainer * remove dependency on yaml config. load config from nemo file instead. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable ckpt saving into other precision formats * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support 70b + cleanup qkv slice logic * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix bug * move hf model folder code from comment to function and add instruction to run * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Utkarsh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Save best NeMo model only when necessary (#7836) Signed-off-by: Ante Jukić <[email protected]> * add guard if its a distributed checkpoint (#7845) Signed-off-by: Gerald Shen <[email protected]> * Fix tn duplex (#7808) * fix duplex tn infer Signed-off-by: Evelina <[email protected]> * fix typo Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix TN docs Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update transformers cache on Jenkins (#7854) * update transformers cache Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * add cd Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> * Update README.rst for container update (#7844) Signed-off-by: fayejf <[email protected]> * Add support for finetuning with huggingface datasets (#7834) * add finetune with huggingface dataset Signed-off-by: 
stevehuang52 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update yaml Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * add extrac hf text and update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * move dataset dependency to common Signed-off-by: stevehuang52 <[email protected]> * add docstring Signed-off-by: stevehuang52 <[email protected]> * Add to Dics Signed-off-by: Nithin Rao Koluguri <nithinraok> * add ci test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add max steps in jenkins Signed-off-by: Nithin Rao Koluguri <nithinraok> * reduce max steps Signed-off-by: Nithin Rao Koluguri <nithinraok> * jenkins test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add bs=2 Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Multimodal merge (#7728) * ControlNet TRT export * Final MR before release * SD2 update * Fixed export issue * Fix for instruct p2p and reformat * Fix SD export issue * Add nemo clip export for DB * Fix ins pix2pix * fix sd2 config * [Mingyuan Ma] BF16 and SD conversion script * [Imagen] NHWC Feature * Fix .nemo loading issue for NeMo CLIP in SD * NeMo r1.20.0 Multimodal Merge * fix the inductor issue in inference * Fix inductor loading .nemo issue * Add Neva Model Support * Imagen Optimizations * Neva inference code * NeMo TOT 1.21 to Internal/main * Update neva_inference.yaml * REBASING for latest code changes * Update internal/main to main tot * Parallel DDIM implementation * 1. 
Fixing indentation bug. (#7352) Signed-off-by: Micha Livne <[email protected]> * NeMo MCore llama2 support + MCore PEFT adapters (#7299) * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove imports Signed-off-by: ericharper <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert 
Signed-off-by: ericharper <[email protected]> * mcore llama2 ckpt conversion & small fix Signed-off-by: jasonwan <[email protected]> * Add inference & sft config by Hongbin Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: jasonwan <[email protected]> * fix config Signed-off-by: jasonwan <[email protected]> * add inference param. update TP/PP script to support mcore gpt Signed-off-by: jasonwan <[email protected]> * p-tuning Signed-off-by: jasonwan <[email protected]> * modify ckpt conversion script (adding model cast) Signed-off-by: jasonwan <[email protected]> * ckpt conversion use relative path for config Signed-off-by: jasonwan <[email protected]> * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email 
protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * update module args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * remove optimizer_idx Signed-off-by: eharper <[email protected]> * prefetch num microbatches Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel 
config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * fix for p-tuning sequence parallel Signed-off-by: jasonwan <[email protected]> * support SFT/distOpt mcore (#7207) * add inference param. update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * change layer names for SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug in SFT Signed-off-by: Hongbin Liu <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is 
enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rollback model cast for p-tuning Signed-off-by: jasonwan <[email protected]> * update for dist adam Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use get_gpt_module_list Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update ckpt conversion script Signed-off-by: jasonwan <[email protected]> * ptl2.0 patch for llama config Signed-off-by: jasonwan <[email protected]> * add plugins to trainer in scripts Signed-off-by: jasonwan <[email protected]> * fix activation checkpointing mcore Signed-off-by: jasonwan <[email protected]> * fix variable names Signed-off-by: jasonwan <[email protected]> * overwrite normalization type for mcore/te Signed-off-by: jasonwan <[email protected]> * Update megatron_llama_sft.yaml Signed-off-by: Jason Wang <[email protected]> * add PEFT adapter support for mcore gpt path (#7276) * implementation for mcore adapter/mxins Signed-off-by: jasonwan <[email protected]> * small fix for lora and ptuning Signed-off-by: jasonwan <[email protected]> * support layerwise peft Signed-off-by: jasonwan <[email protected]> * support multiple target layers Signed-off-by: jasonwan <[email protected]> * support lora GQA Signed-off-by: jasonwan <[email protected]> * support amp O2 Signed-off-by: jasonwan <[email protected]> * revert & more O2 fix Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lora inject to attention 
Signed-off-by: jasonwan <[email protected]> * support …

github-actions bot added the core (Changes to NeMo Core), NLP, and Multi Modal labels Mar 26, 2024

vasunvidia force-pushed the llm_cudagraph_02_28 branch from 4f66b27 to 210e4ff Compare March 26, 2024 21:12

github-actions bot removed the Multi Modal label Mar 26, 2024

vasunvidia force-pushed the llm_cudagraph_02_28 branch from ebc0d49 to 979d7fd Compare April 3, 2024 19:21

ericharper requested a review from timmoon10 April 5, 2024 23:41

github-actions bot removed the core (Changes to NeMo Core) label Apr 5, 2024

timmoon10 reviewed Apr 6, 2024

github-actions bot added the core (Changes to NeMo Core) label Apr 9, 2024

vasunvidia force-pushed the llm_cudagraph_02_28 branch 4 times, most recently from 6596ebf to 6834b18 Compare April 10, 2024 19:42

vasunvidia force-pushed the llm_cudagraph_02_28 branch from 093f2a0 to 3fb59ec Compare April 15, 2024 18:07

github-actions bot added Multi Modal and removed Multi Modal labels Apr 15, 2024

timmoon10 previously approved these changes Apr 15, 2024

timmoon10 mentioned this pull request Apr 15, 2024

Update transpose call to match TE API change #8918

Merged

vasunvidia force-pushed the llm_cudagraph_02_28 branch from 9882630 to 360bd3e Compare April 15, 2024 20:28

vasunvidia changed the base branch from main to r2.0.0.rc0.beta April 16, 2024 14:13

vasunvidia force-pushed the llm_cudagraph_02_28 branch 2 times, most recently from 1622e14 to 788adcc Compare April 17, 2024 03:33

ShriyaPalsamudram approved these changes Apr 17, 2024

ShriyaPalsamudram force-pushed the llm_cudagraph_02_28 branch from 788adcc to 6bc24ba Compare April 17, 2024 18:19

vasunvidia force-pushed the llm_cudagraph_02_28 branch 2 times, most recently from a46b3e5 to 8a11de1 Compare April 17, 2024 18:45

jbaczek and others added 17 commits April 17, 2024 11:50

Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py

47d1b28

Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: Vasudevan Rengasamy <[email protected]>

Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"

94e3e3d

This reverts commit b4f736e. Signed-off-by: Vasudevan Rengasamy <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

2042270

for more information, see https://pre-commit.ci Signed-off-by: Vasudevan Rengasamy <[email protected]>

Remove skip_weight_update argument

e9214bb

Signed-off-by: Vasudevan Rengasamy <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

2654a5f

for more information, see https://pre-commit.ci Signed-off-by: Vasudevan Rengasamy <[email protected]>

Bug fix + cleanup

e4b2139

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Cleanup

c5c710b

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Use new TE API for FP8 Param transpose

480ed1f

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Change config param cuda_graph to enable_cuda_graph

c56dc47

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Enable TE RNGStatesTracker through config

12fb8f8

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Change te_rng_tracker to use_te_rng_tracker

82e7f79

Signed-off-by: Vasudevan Rengasamy <[email protected]>

FP8 weight transpose handled inside TE

b22949c

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Cleanup

a0d5a25

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Revert "Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py""

c56358f

This reverts commit e318624. Signed-off-by: Vasudevan Rengasamy <[email protected]>

Fix merge conflicts

e3ad163

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Fix merge conflicts

5c32edd

Signed-off-by: Vasudevan Rengasamy <[email protected]>

Fix merge conflicts

920d637

Signed-off-by: Vasudevan Rengasamy <[email protected]>
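
Two of the commits above rename config parameters (cuda_graph to enable_cuda_graph, and te_rng_tracker to use_te_rng_tracker), so configs written against earlier revisions of this branch would likely need their keys updated. A minimal sketch of that rename, assuming the config is available as a plain flat dict; the helper name migrate_cuda_graph_keys and the dict layout are hypothetical and not part of NeMo, and only the four key names come from the commit titles:

```python
# Hypothetical helper, not NeMo API. The old/new key names are taken from the
# commit titles "Change config param cuda_graph to enable_cuda_graph" and
# "Change te_rng_tracker to use_te_rng_tracker".
RENAMED_KEYS = {
    "cuda_graph": "enable_cuda_graph",
    "te_rng_tracker": "use_te_rng_tracker",
}


def migrate_cuda_graph_keys(cfg: dict) -> dict:
    """Return a copy of cfg with legacy key names replaced by the new ones."""
    migrated = dict(cfg)  # shallow copy so the caller's dict is left untouched
    for old, new in RENAMED_KEYS.items():
        if old in migrated:
            value = migrated.pop(old)
            # If the new-style key is already set explicitly, keep that value.
            migrated.setdefault(new, value)
    return migrated


legacy_cfg = {"cuda_graph": True, "te_rng_tracker": True}
print(migrate_cuda_graph_keys(legacy_cfg))
```

The choice to let an explicitly set new-style key win over a legacy one is an assumption made for this sketch; where these keys actually live in the model config is not stated in the commit list.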

vasunvidia force-pushed the llm_cudagraph_02_28 branch from 8a11de1 to 920d637 Compare April 17, 2024 18:52

ksivaman approved these changes Apr 17, 2024

pablo-garay merged commit e9bcaf3 into NVIDIA:r2.0.0.rc0.beta Apr 17, 2024
7 checks passed

ericharper commented Apr 5, 2024

timmoon10 Apr 5, 2024

timmoon10 Apr 9, 2024

ShriyaPalsamudram commented Apr 12, 2024

timmoon10 left a comment
