Add change_vocabulary and save_tokenizers() support to Multitask ASR models #8357

Merged
titu1994 merged 3 commits into r1.23.0 from canary_change_vocab on Feb 7, 2024

@titu1994 titu1994 commented Feb 7, 2024

What does this PR do?

Adds support for changing the vocabulary of multitask models.
Adds support for saving the tokenizer directory of any ASR model that implements ASRBPEMixin.

Collection: [ASR]

Changelog

  • Adds a utility method, save_tokenizers(), that extracts all of a model's internal tokenizers into the specified directory.
  • Adds a method, change_vocabulary(), for changing the tokenizers of a multitask model.

Usage

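from omegaconf import OmegaConf, open_dict

from nemo.collections.asr.models import EncDecMultiTaskModel

# "xyz" is a placeholder for the name of a pretrained multitask model
asr_model = EncDecMultiTaskModel.from_pretrained("xyz")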

# Save tokenizers - this creates a new directory at pwd called `all_tokenizers`
# Then it populates it with all the tokenizer objects stored in the model.
asr_model.save_tokenizers('./all_tokenizers/') 

# Change model vocabulary
tokenizer_cfg = OmegaConf.create(asr_model.cfg.tokenizer)
with open_dict(tokenizer_cfg):
    tokenizer_cfg.pop('langs', None) # remove old `langs` part of config from aggregate tokenizer

    # create new langs config
    tokenizer_cfg.langs = OmegaConf.create(dict(
        spl_tokens=dict(
            dir='all_tokenizers/spl_tokens/',
            type='bpe',
        ),
        en=dict(
            dir='all_tokenizers/en/',
            type='bpe',
        ),
        de=dict(
            dir='all_tokenizers/de/',
            type='bpe',
        ),
    ))

# Change the multitask model's tokenizers and vocabulary
asr_model.change_vocabulary(tokenizer_cfg, new_tokenizer_type='agg')
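
The `langs` block in the usage above lists one entry per tokenizer sub-directory by hand. As a rough sketch, assuming `save_tokenizers()` writes one sub-directory per tokenizer (as the `spl_tokens`, `en` and `de` paths suggest) and that every tokenizer is of type `bpe`, the same mapping could be generated from the directory contents. `build_langs_cfg` is a hypothetical helper, not part of NeMo:

```python
from pathlib import Path


def build_langs_cfg(tokenizer_root, tokenizer_type="bpe"):
    """Map each sub-directory of `tokenizer_root` to a `langs` entry.

    Hypothetical helper: assumes one sub-directory per tokenizer and a
    single tokenizer type shared by all of them.
    """
    root = Path(tokenizer_root)
    langs = {}
    # Sort for a deterministic order; plain files in the root are ignored.
    for sub_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        langs[sub_dir.name] = {"dir": str(sub_dir), "type": tokenizer_type}
    return langs
```

The returned dict could then be wrapped with `OmegaConf.create(...)` and assigned to `tokenizer_cfg.langs` inside the `open_dict` block.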

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

@github-actions github-actions bot added the ASR label Feb 7, 2024

titu1994 commented Feb 7, 2024

jenkins

pzelasko
pzelasko previously approved these changes Feb 7, 2024

@pzelasko pzelasko left a comment

LGTM

nemo/collections/asr/models/aed_multitask_models.py (outdated)
loc = shutil.copy2(v, dir)
logging.info(f"Saved {k} at {loc}")

if isinstance(v, str) and v.startswith('nemo:'):
Collaborator

Why is there a `:` after `nemo`?

Collaborator Author

NeMo files modify the config of registered items like this: the `nemo:` prefix on a value denotes a registered artifact.
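
For illustration only: the check in the reviewed line detects config values that carry the `nemo:` marker. A minimal sketch of how such a marker could be recognised and stripped follows. `strip_nemo_prefix` is a hypothetical helper, and what the PR actually does with the value after the check is not shown in this thread.

```python
NEMO_PREFIX = "nemo:"


def strip_nemo_prefix(value):
    """Return `value` without a leading 'nemo:' marker, if it has one.

    Hypothetical helper; non-string values and unmarked strings are
    returned unchanged.
    """
    if isinstance(value, str) and value.startswith(NEMO_PREFIX):
        return value[len(NEMO_PREFIX):]
    return value
```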

# Setup Decoder
transf_decoder_cfg_dict = self.transf_decoder.to_config_dict()

vocab_size = 8 * ceil(self.tokenizer.vocab_size / 8)
Collaborator

Why the number 8 here and not another int?

Collaborator Author

This is from the original code. @krishnacpuvvada
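
For reference, the expression under discussion rounds the vocabulary size up to the next multiple of 8. The thread does not state why 8 was chosen. The sketch below only shows the arithmetic, with `pad_to_multiple` as a hypothetical stand-in name.

```python
from math import ceil


def pad_to_multiple(size, multiple=8):
    """Round `size` up to the nearest multiple of `multiple`."""
    return multiple * ceil(size / multiple)


print(pad_to_multiple(1024))  # 1024: already a multiple of 8
print(pad_to_multiple(1025))  # 1032: rounded up to the next multiple of 8
```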

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
@titu1994 titu1994 merged commit 2f72846 into r1.23.0 Feb 7, 2024
12 checks passed
@titu1994 titu1994 deleted the canary_change_vocab branch February 7, 2024 20:34
github-actions bot pushed a commit that referenced this pull request Feb 7, 2024
…models (#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
ericharper pushed a commit that referenced this pull request Feb 14, 2024
…models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py




---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
biscayan pushed a commit to biscayan/NeMo that referenced this pull request Feb 15, 2024
…models (NVIDIA#8357) (NVIDIA#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: biscayan <[email protected]>
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request Feb 15, 2024
…models (NVIDIA#8357) (NVIDIA#8367)

* Add change_vocabulary and save_tokenizers() support



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py




---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Sasha Meister <[email protected]>
yaoyu-33 pushed a commit that referenced this pull request Feb 15, 2024
…models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py




---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
JRD971000 pushed a commit that referenced this pull request Feb 16, 2024
…models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py




---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
michal2409 pushed a commit that referenced this pull request Feb 23, 2024
…models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
michal2409 added a commit that referenced this pull request Feb 23, 2024
* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* updat ebranch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix  (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creatin tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add  sbert to IR (#8445)

* add  sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the  auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* udpate

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin chages.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models  (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accomendate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
michal2409 added a commit that referenced this pull request Feb 25, 2024
* removed pdeprecated eft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
akoumpa added a commit that referenced this pull request Feb 26, 2024
* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
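
The `ensure_ascii=False` change above can be illustrated with the standard `json` module. This is only a minimal sketch of the general behaviour, not the NeMo conversion code; the manifest entry and its field values are made up for illustration.

```python
import json

# Hypothetical manifest entry; the keys and values are illustrative only.
entry = {"audio_filepath": "a.wav", "text": "Piotr Żelasko"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(entry)

# With ensure_ascii=False the original UTF-8 characters are kept as-is,
# so any tool that reads the raw line sees the same text the tokenizer was trained on.
preserved = json.dumps(entry, ensure_ascii=False)

print(escaped)
print(preserved)
```

Both strings decode to the same object with `json.loads`; the difference only matters for code that consumes the serialized text directly.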

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
akoumpa added a commit that referenced this pull request Feb 26, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* support/fix cpu initialization

Signed-off-by: Chen Cui <[email protected]>

* add example usage

Signed-off-by: Chen Cui <[email protected]>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <[email protected]>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <[email protected]>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* fix import

Signed-off-by: Jiaqi Zeng <[email protected]>

---------

Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* remove try-catches; these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <[email protected]>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <[email protected]>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <[email protected]>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <[email protected]>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <[email protected]>

* Bump mcore commit. This is commit from tot, not any release.

Signed-off-by: Jan Baczek <[email protected]>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <[email protected]>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <[email protected]>

* Update test

Signed-off-by: Jan Baczek <[email protected]>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Remove unused imports

Signed-off-by: Abhishree <[email protected]>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <[email protected]>

* Add back resetting of num val samples

Signed-off-by: Abhishree <[email protected]>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <[email protected]>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Add missing mock_dataset line

Signed-off-by: Abhishree <[email protected]>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <[email protected]>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
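
The commit titles above describe scaling the validation length by a float `limit_val_batches` and keeping it a multiple of the number of microbatches, except when the value is 1.0. One way this could look is sketched below. The function name, its parameters, and the round-down choice are assumptions made for illustration and are not taken from the NeMo source.

```python
def scaled_val_batches(num_batches, limit_val_batches, num_micro_batches):
    """Hypothetical helper: how many validation batches to run.

    num_batches       -- length of the validation dataloader
    limit_val_batches -- int (absolute cap) or float (fraction of the dataloader)
    num_micro_batches -- microbatches per global batch
    """
    # An integer limit is treated as an absolute cap on the number of batches.
    if isinstance(limit_val_batches, int):
        return min(limit_val_batches, num_batches)
    # A limit of 1.0 means "use everything", so no rounding is applied.
    if limit_val_batches == 1.0:
        return num_batches
    # A fractional limit scales the dataloader length, then rounds down
    # to the nearest multiple of the microbatch count (assumed policy).
    scaled = int(num_batches * limit_val_batches)
    return scaled - scaled % num_micro_batches


print(scaled_val_batches(100, 0.25, 8))
print(scaled_val_batches(100, 1.0, 8))
print(scaled_val_batches(100, 10, 8))
```

Rounding down rather than up is a design choice in this sketch: it never asks for more batches than the fraction allows, at the cost of dropping a few when the scaled length is not already aligned.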

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <[email protected]>

* add lora SP CI test

Signed-off-by: Chen Cui <[email protected]>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <[email protected]>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <[email protected]>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <[email protected]>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <[email protected]>

* copyright header fix

Signed-off-by: Travis Bartley <[email protected]>

* yet another merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <[email protected]>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <[email protected]>

* merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <[email protected]>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <[email protected]>

* adding example scripts

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <[email protected]>

* adding unit tests

Signed-off-by: Travis Bartley <[email protected]>

* fixing unit tests

Signed-off-by: Travis Bartley <[email protected]>

* cfg name change

Signed-off-by: Travis Bartley <[email protected]>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <[email protected]>

* removing print script

Signed-off-by: Travis Bartley <[email protected]>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <[email protected]>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <[email protected]>

* docstring deleted by accident

Signed-off-by: Travis Bartley <[email protected]>

* plurals in namespace

Signed-off-by: Travis Bartley <[email protected]>

* changing example script

Signed-off-by: Travis Bartley <[email protected]>

---------

Signed-off-by: Travis Bartley <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: jbaczek <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Naga Venkatesh Gavini <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
yaoyu-33 added a commit that referenced this pull request Feb 26, 2024
* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
yaoyu-33 added a commit that referenced this pull request Feb 26, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* support/fix cpu initialization

Signed-off-by: Chen Cui <[email protected]>

* add example usage

Signed-off-by: Chen Cui <[email protected]>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <[email protected]>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <[email protected]>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* fix import

Signed-off-by: Jiaqi Zeng <[email protected]>

---------

Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* remove try-catches; these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <[email protected]>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <[email protected]>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <[email protected]>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <[email protected]>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <[email protected]>

* Bump mcore commit. This is a commit from tot, not from any release.

Signed-off-by: Jan Baczek <[email protected]>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <[email protected]>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <[email protected]>

* Update test

Signed-off-by: Jan Baczek <[email protected]>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Remove unused imports

Signed-off-by: Abhishree <[email protected]>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <[email protected]>

* Add back resetting of num val samples

Signed-off-by: Abhishree <[email protected]>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <[email protected]>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Add missing mock_dataset line

Signed-off-by: Abhishree <[email protected]>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <[email protected]>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <[email protected]>

* add lora SP CI test

Signed-off-by: Chen Cui <[email protected]>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <[email protected]>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <[email protected]>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <[email protected]>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <[email protected]>

* copyright header fix

Signed-off-by: Travis Bartley <[email protected]>

* yet another merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <[email protected]>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <[email protected]>

* merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <[email protected]>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <[email protected]>

* adding example scripts

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <[email protected]>

* adding unit tests

Signed-off-by: Travis Bartley <[email protected]>

* fixing unit tests

Signed-off-by: Travis Bartley <[email protected]>

* cfg name change

Signed-off-by: Travis Bartley <[email protected]>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <[email protected]>

* removing print script

Signed-off-by: Travis Bartley <[email protected]>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <[email protected]>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <[email protected]>

* docstring deleted by accident

Signed-off-by: Travis Bartley <[email protected]>

* plurals in namespace

Signed-off-by: Travis Bartley <[email protected]>

* changing example script

Signed-off-by: Travis Bartley <[email protected]>

---------

Signed-off-by: Travis Bartley <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: jbaczek <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Naga Venkatesh Gavini <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
zpx01 pushed a commit to zpx01/NeMo that referenced this pull request Mar 8, 2024
…models (NVIDIA#8357) (NVIDIA#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Zeeshan Patel <[email protected]>
zpx01 pushed a commit to zpx01/NeMo that referenced this pull request Mar 8, 2024
* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (NVIDIA#8483)

* coldfix (NVIDIA#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (NVIDIA#8416) (NVIDIA#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357) (NVIDIA#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (NVIDIA#8314)

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (NVIDIA#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279) (NVIDIA#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (NVIDIA#8302) (NVIDIA#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334) (NVIDIA#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354) (NVIDIA#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400) (NVIDIA#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (NVIDIA#8427) (NVIDIA#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (NVIDIA#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421) (NVIDIA#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (NVIDIA#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749) (NVIDIA#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (NVIDIA#8315) (NVIDIA#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (NVIDIA#8283) (NVIDIA#8385)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

* add values to en tts dict (NVIDIA#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390) (NVIDIA#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (NVIDIA#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (NVIDIA#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (NVIDIA#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353) (NVIDIA#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336) (NVIDIA#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (NVIDIA#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models  (NVIDIA#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Zeeshan Patel <[email protected]>
zpx01 pushed a commit to zpx01/NeMo that referenced this pull request Mar 8, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* support/fix cpu initialization

Signed-off-by: Chen Cui <[email protected]>

* add example usage

Signed-off-by: Chen Cui <[email protected]>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <[email protected]>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <[email protected]>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* fix import

Signed-off-by: Jiaqi Zeng <[email protected]>

---------

Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* remove try-catches, these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <[email protected]>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <[email protected]>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <[email protected]>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <[email protected]>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <[email protected]>

* Bump mcore commit. This is a commit from ToT, not from any release.

Signed-off-by: Jan Baczek <[email protected]>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <[email protected]>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <[email protected]>

* Update test

Signed-off-by: Jan Baczek <[email protected]>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Remove unused imports

Signed-off-by: Abhishree <[email protected]>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <[email protected]>

* Add back resetting of num val samples

Signed-off-by: Abhishree <[email protected]>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <[email protected]>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Add missing mock_dataset line

Signed-off-by: Abhishree <[email protected]>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <[email protected]>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <[email protected]>

* add lora SP CI test

Signed-off-by: Chen Cui <[email protected]>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <[email protected]>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <[email protected]>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <[email protected]>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <[email protected]>

* copyright header fix

Signed-off-by: Travis Bartley <[email protected]>

* yet another merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <[email protected]>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <[email protected]>

* merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <[email protected]>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <[email protected]>

* adding example scripts

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <[email protected]>

* adding unit tests

Signed-off-by: Travis Bartley <[email protected]>

* fixing unit tests

Signed-off-by: Travis Bartley <[email protected]>

* cfg name change

Signed-off-by: Travis Bartley <[email protected]>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <[email protected]>

* removing print script

Signed-off-by: Travis Bartley <[email protected]>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <[email protected]>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <[email protected]>

* docstring deleted by accident

Signed-off-by: Travis Bartley <[email protected]>

* plurals in namespace

Signed-off-by: Travis Bartley <[email protected]>

* changing example script

Signed-off-by: Travis Bartley <[email protected]>

---------

Signed-off-by: Travis Bartley <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>
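
The effect of `ensure_ascii=False` noted above can be illustrated with the Python standard library alone. This is a minimal sketch, assuming the tarred manifests are serialized as JSON with `json.dumps` (the commit message does not show the actual call site); the manifest entry and its field names are hypothetical.

```python
import json

# Hypothetical manifest entry; the field names are illustrative only.
entry = {"audio_filepath": "utt1.wav", "text": "Żelasko grüßt"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are written
# as \uXXXX escape sequences in the serialized string.
escaped = json.dumps(entry)

# With ensure_ascii=False the characters are kept as-is, so the serialized
# line contains the same UTF-8 text that was passed in.
preserved = json.dumps(entry, ensure_ascii=False)

print(escaped)
print(preserved)

# Both forms decode back to the same object; only the on-disk text differs.
assert json.loads(escaped) == json.loads(preserved) == entry
```

Both strings are valid JSON, so a JSON parser recovers identical text either way. The difference only matters to tools that read the serialized lines as raw text, which would then see escape sequences instead of the original characters.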

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: jbaczek <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Naga Venkatesh Gavini <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Zeeshan Patel <[email protected]>
JRD971000 added a commit that referenced this pull request Mar 15, 2024
* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
JRD971000 added a commit that referenced this pull request Mar 15, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* support/fix cpu initialization

Signed-off-by: Chen Cui <[email protected]>

* add example usage

Signed-off-by: Chen Cui <[email protected]>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <[email protected]>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <[email protected]>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* fix import

Signed-off-by: Jiaqi Zeng <[email protected]>

---------

Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* remove try-catches; these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <[email protected]>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <[email protected]>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <[email protected]>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <[email protected]>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <[email protected]>

* Bump mcore commit. This is commit from tot, not any release.

Signed-off-by: Jan Baczek <[email protected]>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <[email protected]>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <[email protected]>

* Update test

Signed-off-by: Jan Baczek <[email protected]>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* remove try-catches; these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <[email protected]>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <[email protected]>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <[email protected]>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <[email protected]>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <[email protected]>

* Bump mcore commit. This is commit from tot, not any release.

Signed-off-by: Jan Baczek <[email protected]>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <[email protected]>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <[email protected]>

* Update test

Signed-off-by: Jan Baczek <[email protected]>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Remove unused imports

Signed-off-by: Abhishree <[email protected]>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <[email protected]>

* Add back resetting of num val samples

Signed-off-by: Abhishree <[email protected]>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <[email protected]>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Add missing mock_dataset line

Signed-off-by: Abhishree <[email protected]>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <[email protected]>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <[email protected]>

* add lora SP CI test

Signed-off-by: Chen Cui <[email protected]>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <[email protected]>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <[email protected]>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <[email protected]>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <[email protected]>

* copyright header fix

Signed-off-by: Travis Bartley <[email protected]>

* yet another merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <[email protected]>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <[email protected]>

* merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <[email protected]>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <[email protected]>

* adding example scripts

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <[email protected]>

* adding unit tests

Signed-off-by: Travis Bartley <[email protected]>

* fixing unit tests

Signed-off-by: Travis Bartley <[email protected]>

* cfg name change

Signed-off-by: Travis Bartley <[email protected]>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <[email protected]>

* removing print script

Signed-off-by: Travis Bartley <[email protected]>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <[email protected]>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <[email protected]>

* docstring deleted by accident

Signed-off-by: Travis Bartley <[email protected]>

* plurals in namespace

Signed-off-by: Travis Bartley <[email protected]>

* changing example script

Signed-off-by: Travis Bartley <[email protected]>

---------

Signed-off-by: Travis Bartley <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: jbaczek <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Naga Venkatesh Gavini <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
pablo-garay pushed a commit that referenced this pull request Mar 19, 2024
…models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Pablo Garay <[email protected]>
pablo-garay added a commit that referenced this pull request Mar 19, 2024
* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Pablo Garay <[email protected]>
pablo-garay added a commit that referenced this pull request Mar 19, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* support/fix cpu initialization

Signed-off-by: Chen Cui <[email protected]>

* add example usage

Signed-off-by: Chen Cui <[email protected]>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <[email protected]>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <[email protected]>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* fix import

Signed-off-by: Jiaqi Zeng <[email protected]>

---------

Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* remove try-catches, these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <[email protected]>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <[email protected]>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <[email protected]>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <[email protected]>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <[email protected]>

* Bump mcore commit. This is commit from tot, not any release.

Signed-off-by: Jan Baczek <[email protected]>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <[email protected]>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <[email protected]>

* Update test

Signed-off-by: Jan Baczek <[email protected]>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Remove unused imports

Signed-off-by: Abhishree <[email protected]>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <[email protected]>

* Add back resetting of num val samples

Signed-off-by: Abhishree <[email protected]>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <[email protected]>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Add missing mock_dataset line

Signed-off-by: Abhishree <[email protected]>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <[email protected]>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
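
The float `limit_val_batches` handling listed above can be illustrated with a minimal sketch. This is not the NeMo implementation: the function name `effective_val_batches`, its parameters, and the round-down rule are assumptions made for illustration, based only on the commit titles (a float is treated as a fraction of the validation batches, the result is kept a multiple of the micro-batch count, and 1.0 is exempt from that adjustment).

```python
def effective_val_batches(num_batches, limit_val_batches, num_micro_batches=1):
    """Return how many validation batches would run (hypothetical helper).

    num_batches:        length of the validation dataloader
    limit_val_batches:  int (absolute cap) or float (fraction of num_batches)
    num_micro_batches:  micro-batches per step; assumed rounding granularity
    """
    if isinstance(limit_val_batches, float):
        # A float is interpreted as a fraction of the available batches.
        limit = int(num_batches * limit_val_batches)
        if limit_val_batches != 1.0:
            # Assumption: round down to the nearest multiple of the
            # micro-batch count; 1.0 (the full set) is left untouched.
            limit -= limit % num_micro_batches
        return limit
    # An int is an absolute cap and cannot exceed what the dataloader has.
    return min(num_batches, limit_val_batches)


quarter = effective_val_batches(100, 0.25, num_micro_batches=4)
full = effective_val_batches(100, 1.0, num_micro_batches=8)
capped = effective_val_batches(100, 50, num_micro_batches=4)
```

With these inputs, a quarter of 100 batches is 25, which the assumed rule trims to 24 so that it divides evenly into 4 micro-batches.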

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <[email protected]>

* add lora SP CI test

Signed-off-by: Chen Cui <[email protected]>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <[email protected]>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <[email protected]>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <[email protected]>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <[email protected]>

* copyright header fix

Signed-off-by: Travis Bartley <[email protected]>

* yet another merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <[email protected]>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <[email protected]>

* merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <[email protected]>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <[email protected]>

* adding example scripts

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <[email protected]>

* adding unit tests

Signed-off-by: Travis Bartley <[email protected]>

* fixing unit tests

Signed-off-by: Travis Bartley <[email protected]>

* cfg name change

Signed-off-by: Travis Bartley <[email protected]>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <[email protected]>

* removing print script

Signed-off-by: Travis Bartley <[email protected]>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <[email protected]>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <[email protected]>

* docstring deleted by accident

Signed-off-by: Travis Bartley <[email protected]>

* plurals in namespace

Signed-off-by: Travis Bartley <[email protected]>

* changing example script

Signed-off-by: Travis Bartley <[email protected]>

---------

Signed-off-by: Travis Bartley <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
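
The `ensure_ascii=False` change mentioned above relies on standard `json` behaviour, shown in the sketch below. The manifest entry and its field names are hypothetical, and the commit does not spell out how escaping affected the tokenizers; the example only shows that the default setting rewrites non-ASCII characters as `\uXXXX` escapes in the serialized line, while `ensure_ascii=False` keeps them as written.

```python
import json

# Hypothetical manifest entry; the field names are illustrative only.
entry = {"audio_filepath": "utt_0001.wav", "text": "Żelasko grüßt"}

escaped = json.dumps(entry)                         # default: ensure_ascii=True
preserved = json.dumps(entry, ensure_ascii=False)   # keep UTF-8 characters

# Both strings decode back to the same Python object...
assert json.loads(escaped) == json.loads(preserved) == entry

# ...but only the second keeps the original characters in the raw line,
# which matters for any consumer that reads the serialized text directly.
has_raw_in_escaped = "Ż" in escaped
has_raw_in_preserved = "Ż" in preserved
```

When writing such lines to disk, the file should also be opened with `encoding="utf-8"` so the preserved characters are stored consistently.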

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: jbaczek <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Naga Venkatesh Gavini <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Pablo Garay <[email protected]>
ericharper added a commit that referenced this pull request Mar 19, 2024
* Refactor conversion scripts one in all

Signed-off-by: yaoyu-33 <[email protected]>

* Move bert converter

Signed-off-by: yaoyu-33 <[email protected]>

* [TTS] Add modules for mel spectrogram codec (#8238)

* [TTS] Add modules for mel spectrogram codec

Signed-off-by: Ryan <[email protected]>

* [TTS] Add mel band validation

Signed-off-by: Ryan <[email protected]>

* [TTS] Add fullband mel encoder and more documentation

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py




---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining



* Additional args



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last



* Some neva fixes



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers



* [tutorial] fixed missing RIR scripts file. (#8257)



* fix imports



* imports fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook



* revert asr notebook



---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu



* ddpm config guard



* Fix ddpm edit api



* Fix insert_image_token cfg issue



* neva updates



* reformat



* Add back jenkins



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs



* Update default neva template



---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)



* add values to en tts dict (#7879)



* mcore ds fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore



* revert asr files



* add comments



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset



* update mcore version



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg



* update mcore commit



* fix Bert unit tests



* update bert tests



* fix bert mcore test



* fix gpt jenkins tests



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits



* revert apex installation



* turn off the fusion for jenkins



---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer



* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.



---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add Neva Template for NV-DPO Models  (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>

* Account for mpirun use case in get_rank (#8429)

Signed-off-by: Jan Lasek <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481) (#8482)

* Add settings to suppress bf16 compile errors in CI on V100

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix canary chunk infer bug (#8449)

* fix chunk infer bug

Signed-off-by: stevehuang52 <[email protected]>

* add support for duration=None, add lhotse support for relative audio path

Signed-off-by: stevehuang52 <[email protected]>

* add tests

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: stevehuang52 <[email protected]>

* Add Baichuan2 support (#8282)

* Add Baichuan2 support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 (#7920)

* Initial commit of reworked MegatronPretrainingRandomBatchSampler

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed small length based bug

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Daniel Egert <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Euynaheh <[email protected]>

* Add Baichuan2 support

Signed-off-by: Euynaheh <[email protected]>

* Add NeMo to HF conversion

* fix code format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix code format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add Baichuan jenkins test

* add_BOS bug fix

* Update Jenkinsfile

Signed-off-by: Euynaheh <[email protected]>

---------

Signed-off-by: Daniel Egert <[email protected]>
Signed-off-by: Euynaheh <[email protected]>
Signed-off-by: Euynaheh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: trias702 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* support/fix cpu initialization

Signed-off-by: Chen Cui <[email protected]>

* add example usage

Signed-off-by: Chen Cui <[email protected]>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <[email protected]>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <[email protected]>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* fix import

Signed-off-by: Jiaqi Zeng <[email protected]>

---------

Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* remove try-catches; these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <[email protected]>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <[email protected]>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <[email protected]>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <[email protected]>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <[email protected]>

* Bump mcore commit. This is a commit from ToT (top of tree), not from any release.

Signed-off-by: Jan Baczek <[email protected]>

* Remove the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <[email protected]>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <[email protected]>

* Update test

Signed-off-by: Jan Baczek <[email protected]>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Remove unused imports

Signed-off-by: Abhishree <[email protected]>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <[email protected]>

* Add back resetting of num val samples

Signed-off-by: Abhishree <[email protected]>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <[email protected]>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Add missing mock_dataset line

Signed-off-by: Abhishree <[email protected]>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <[email protected]>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <[email protected]>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <[email protected]>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <[email protected]>

* add lora SP CI test

Signed-off-by: Chen Cui <[email protected]>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <[email protected]>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <[email protected]>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <[email protected]>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <[email protected]>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <[email protected]>

* copyright header fix

Signed-off-by: Travis Bartley <[email protected]>

* yet another merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <[email protected]>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <[email protected]>

* merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <[email protected]>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <[email protected]>

* adding example scripts

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <[email protected]>

* adding unit tests

Signed-off-by: Travis Bartley <[email protected]>

* fixing unit tests

Signed-off-by: Travis Bartley <[email protected]>

* cfg name change

Signed-off-by: Travis Bartley <[email protected]>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <[email protected]>

* removing print script

Signed-off-by: Travis Bartley <[email protected]>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <[email protected]>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <[email protected]>

* docstring deleted by accident

Signed-off-by: Travis Bartley <[email protected]>

* plurals in namespace

Signed-off-by: Travis Bartley <[email protected]>

* changing example script

Signed-off-by: Travis Bartley <[email protected]>

---------

Signed-off-by: Travis Bartley <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix  (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add  sbert to IR (#8445)

* add  sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the  auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana G…
huvunvidia added a commit that referenced this pull request Apr 16, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (#8242) (#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuation

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix (#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (#8283)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (#8298)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (#8478) (#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile; edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit in Jenkinsfile to match r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflicts

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* update Jenkinstest for new RETRO to run faster

* fix isort

* fix whitespace

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
pablo-garay added a commit that referenced this pull request Apr 17, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks  (#8242) (#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuation

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix  (#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (#8283)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (#8298)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (#8478) (#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass the CI test bf16 problem (following PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflict

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* update Jenkinstest for new RETRO to run faster

* fix isort

* adding RETRO tests to cicd-main.yml action tests

* update ipa_cmudict-0.7b_nv23.01.txt

* remove quotes for model.data for legacy RETRO action tests

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
xingyaoww pushed a commit to xingyaoww/NeMo that referenced this pull request Apr 23, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflicts

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* update Jenkinstest for new RETRO to run faster

* fix isort

* fix whitespace

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
xingyaoww pushed a commit to xingyaoww/NeMo that referenced this pull request Apr 23, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable; training curves match between RETRO mcore and NeMo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix  (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(); in the future this can be fixed with the correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile; edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass the CI test bf16 problem (following this PR: https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflict

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* update Jenkinstest for new RETRO to run faster

* fix isort

* adding RETRO tests to cicd-main.yml action tests

* update ipa_cmudict-0.7b_nv23.01.txt

* remove quotes for model.data for legacy RETRO action tests

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
huvunvidia added a commit that referenced this pull request Apr 23, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (#8242) (#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix (#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (#8283)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (#8298)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (#8478) (#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* runnable for inference

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* cleaning inference code

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflicts

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* update Jenkins and _legacy.py

* update new RETRO jenkinstest to run faster

* fixing errors from GitHub Advanced Security / CodeQL

* fixing errors from GitHub Advanced Security / CodeQL

* update manually branch to huvu/mcore_retro

* remove DEBUGGING markers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy paste scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt

* update code to fix GitHub warnings; adding cicd-main.yml action tests

* cleaning code, addressing Shanmugam's comments

* saving before pulling from main

* cleaning code

* adding deprecations note

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: root <[email protected]>
ericharper added a commit that referenced this pull request Apr 26, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (#8242) (#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix (#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (#8283)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (#8298)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (#8478) (#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile; edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflict

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* huvu/mcore_retro_docs first commit

* update with main

* update RETRO docs

* fix scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt

* update docs

* update docs

* update RETRO docs

* update with Jennifer's comments

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
alxzhang-amazon pushed a commit to alxzhang-amazon/NeMo that referenced this pull request Apr 26, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks  (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix  (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflicts

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* update Jenkinstest for new RETRO to run faster

* fix isort

* fix whitespace

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
alxzhang-amazon pushed a commit to alxzhang-amazon/NeMo that referenced this pull request Apr 26, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks  (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuation

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix  (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile; edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit in Jenkinsfile to match r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflicts

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* update Jenkinstest for new RETRO to run faster

* fix isort

* adding RETRO tests to cicd-main.yml action tests

* update ipa_cmudict-0.7b_nv23.01.txt

* remove quotes for model.data for legacy RETRO action tests

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
alxzhang-amazon pushed a commit to alxzhang-amazon/NeMo that referenced this pull request Apr 26, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* runnable for inference

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* cleaning inference code

* fix to bypass the CI test bf16 problem (following PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merges conflict

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* update Jenkins and _legacy.py

* update new RETRO jenkinstest to run faster

* fixing errors from GitHub Advanced Security / CodeQL

* fixing errors from GitHub Advanced Security / CodeQL

* manually update branch to huvu/mcore_retro

* remove DEBUGGING markers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy paste scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt

* update code to fix GitHub warnings; add cicd-main.yml action tests

* cleaning code, addressing Shanmugam's comments

* saving before pulling from main

* cleaning code

* adding deprecations note

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: root <[email protected]>
alxzhang-amazon pushed a commit to alxzhang-amazon/NeMo that referenced this pull request Apr 26, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks  (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test; otherwise training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix  (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflicts

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* huvu/mcore_retro_docs first commit

* update with main

* update RETRO docs

* fix scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt

* update docs

* update docs

* update RETRO docs

* update with Jennifer's comments

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
galv pushed a commit to galv/NeMo that referenced this pull request Apr 29, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks  (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix  (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* runnable for inference

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* cleaning inference code

* fix to bypass the CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflict

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* update Jenkins and _legacy.py

* update new RETRO jenkinstest to run faster

* fixing errors from GitHub Advanced Security / CodeQL

* fixing errors from GitHub Advanced Security / CodeQL

* update manually branch to huvu/mcore_retro

* remove DEBUGGING markers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy paste scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt

* update code to fix GitHub warnings; adding cicd-main.yml action tests

* cleaning code, addressing Shanmugam's comments

* saving before pulling from main

* cleaning code

* adding deprecation note

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: root <[email protected]>
Victor49152 added a commit that referenced this pull request May 1, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (#8242) (#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* Multimodal r1.23.0 bug fix (#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (#8371)

Signed-off-by: smajumdar <[email protected]>

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (#8283)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* Add Finetuning tutorial with HF Datasets (#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (#8298)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (#8478) (#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* Add back fp8 support

* SD-FP8: fix the bug of normalization location

Signed-off-by: Mingyuan Ma <[email protected]>

* map potential FP8 ckpt to FP16

Signed-off-by: Mingyuan Ma <[email protected]>

* Add TE fp8 training

Signed-off-by: Mingyuan Ma <[email protected]>

* Only overwrite unet precision when self.megatron_amp_O2 is true

Signed-off-by: Mingyuan Ma <[email protected]>

* New structure is now compatible with old ckpts

Signed-off-by: Mingyuan Ma <[email protected]>

* Add support on mapping old unet checkpoint to new structure and FP8 structure

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sync with main branch

Signed-off-by: Mingyuan Ma <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Mengdi Wang <[email protected]>
suiyoubi pushed a commit that referenced this pull request May 2, 2024

* Attention encoder-decoder models for multiple speech-to-text tasks (#8242) (#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix (#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (#8283)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (#8298)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (#8478) (#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass the CI test bf16 problem (following PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflict

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* update Jenkinstest for new RETRO to run faster

* fix isort

* fix whitespace

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Signed-off-by: Ao Tang <[email protected]>
suiyoubi pushed a commit that referenced this pull request May 2, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (#8242) (#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix (#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (#8283)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (#8298)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (#8478) (#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile; edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflicts

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* update Jenkinstest for new RETRO to run faster

* fix isort

* adding RETRO tests to cicd-main.yml action tests

* update ipa_cmudict-0.7b_nv23.01.txt

* remove quotes for model.data for legacy RETRO action tests

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Signed-off-by: Ao Tang <[email protected]>
suiyoubi pushed a commit that referenced this pull request May 2, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (#8242) (#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuation

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix (#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (#8283)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (#8298)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

* size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (#8478) (#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* runnable for inference

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* cleaning inference code

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflicts

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* update Jenkins and _legacy.py

* update new RETRO jenkinstest to run faster

* fixing errors from GitHub Advanced Security / CodeQL

* fixing errors from GitHub Advanced Security / CodeQL

* update manually branch to huvu/mcore_retro

* remove DEBUGGING markers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy paste scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt

* update code to fix GitHub warnings; adding cicd-main.yml action tests

* cleaning code, addressing Shanmugam's comments

* saving before pulling from main

* cleaning code

* adding deprecations note

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: root <[email protected]>
Signed-off-by: Ao Tang <[email protected]>
suiyoubi pushed a commit that referenced this pull request May 2, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks  (#8242) (#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix  (#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (#8283)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (#8298)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (#8478) (#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile; edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit in Jenkinsfile to match r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass the CI test bf16 problem (following PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflicts

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* huvu/mcore_retro_docs first commit

* update with main

* update RETRO docs

* fix scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt

* update docs

* update docs

* update RETRO docs

* update with Jennifer's comments

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Signed-off-by: Ao Tang <[email protected]>
suiyoubi pushed a commit that referenced this pull request May 2, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (#8242) (#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit d10726d)

Co-authored-by: Piotr Żelasko <[email protected]>

* Multimodal r1.23.0 bug fix (#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (#8371)

Signed-off-by: smajumdar <[email protected]>

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (#8283)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* Add Finetuning tutorial with HF Datasets (#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (#8298)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (#8478) (#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* Add back fp8 support

* SD-FP8: fix the bug of normalization location

Signed-off-by: Mingyuan Ma <[email protected]>

* map potential FP8 ckpt to FP16

Signed-off-by: Mingyuan Ma <[email protected]>

* Add TE fp8 training

Signed-off-by: Mingyuan Ma <[email protected]>

* Only overwrite unet precision when self.megatron_amp_O2 is true

Signed-off-by: Mingyuan Ma <[email protected]>

* New structure is now compatible with old ckpts

Signed-off-by: Mingyuan Ma <[email protected]>

* Add support for mapping old unet checkpoint to new structure and FP8 structure

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sync with main branch

Signed-off-by: Mingyuan Ma <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Mengdi Wang <[email protected]>
Signed-off-by: Ao Tang <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
…models (NVIDIA#8357) (NVIDIA#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (NVIDIA#8483)

* coldfix (NVIDIA#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (NVIDIA#8416) (NVIDIA#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357) (NVIDIA#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (NVIDIA#8314)

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (NVIDIA#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279) (NVIDIA#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (NVIDIA#8302) (NVIDIA#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334) (NVIDIA#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354) (NVIDIA#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400) (NVIDIA#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (NVIDIA#8427) (NVIDIA#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (NVIDIA#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421) (NVIDIA#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (NVIDIA#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749) (NVIDIA#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (NVIDIA#8315) (NVIDIA#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (NVIDIA#8283) (NVIDIA#8385)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

* add values to en tts dict (NVIDIA#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390) (NVIDIA#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (NVIDIA#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (NVIDIA#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (NVIDIA#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353) (NVIDIA#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336) (NVIDIA#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (NVIDIA#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models  (NVIDIA#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* support/fix cpu initialization

Signed-off-by: Chen Cui <[email protected]>

* add example usage

Signed-off-by: Chen Cui <[email protected]>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <[email protected]>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <[email protected]>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* fix import

Signed-off-by: Jiaqi Zeng <[email protected]>

---------

Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* remove try-catches, these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <[email protected]>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <[email protected]>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <[email protected]>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <[email protected]>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <[email protected]>

* Bump mcore commit. This is commit from tot, not any release.

Signed-off-by: Jan Baczek <[email protected]>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <[email protected]>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <[email protected]>

* Update test

Signed-off-by: Jan Baczek <[email protected]>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Remove unused imports

Signed-off-by: Abhishree <[email protected]>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <[email protected]>

* Add back resetting of num val samples

Signed-off-by: Abhishree <[email protected]>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <[email protected]>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Add missing mock_dataset line

Signed-off-by: Abhishree <[email protected]>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <[email protected]>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <[email protected]>

* add lora SP CI test

Signed-off-by: Chen Cui <[email protected]>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <[email protected]>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <[email protected]>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <[email protected]>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <[email protected]>

* copyright header fix

Signed-off-by: Travis Bartley <[email protected]>

* yet another merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <[email protected]>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <[email protected]>

* merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <[email protected]>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <[email protected]>

* adding example scripts

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <[email protected]>

* adding unit tests

Signed-off-by: Travis Bartley <[email protected]>

* fixing unit tests

Signed-off-by: Travis Bartley <[email protected]>

* cfg name change

Signed-off-by: Travis Bartley <[email protected]>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <[email protected]>

* removing print script

Signed-off-by: Travis Bartley <[email protected]>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <[email protected]>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <[email protected]>

* docstring deleted by accident

Signed-off-by: Travis Bartley <[email protected]>

* plurals in namespace

Signed-off-by: Travis Bartley <[email protected]>

* changing example script

Signed-off-by: Travis Bartley <[email protected]>

---------

Signed-off-by: Travis Bartley <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Naga Venkatesh Gavini <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: jbaczek <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Naga Venkatesh Gavini <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* Refactor conversion scripts one in all

Signed-off-by: yaoyu-33 <[email protected]>

* Move bert converter

Signed-off-by: yaoyu-33 <[email protected]>

* [TTS] Add modules for mel spectrogram codec (#8238)

* [TTS] Add modules for mel spectrogram codec

Signed-off-by: Ryan <[email protected]>

* [TTS] Add mel band validation

Signed-off-by: Ryan <[email protected]>

* [TTS] Add fullband mel encoder and more documentation

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers



* [tutorial] fixed missing RIR scripts file. (#8257)



* fix imports



* imports fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook



* revert asr notebook



---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu



* ddpm config guard



* Fix ddpm edit api



* Fix insert_image_token cfg issue



* neva updates



* reformat



* Add back jenkins



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs



* Update default neva template



---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)



* add values to en tts dict (#7879)



* mcore ds fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore



* revert asr files



* add comments



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset



* update mcore version



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg



* update mcore commit



* fix Bert unit tests



* update bert tests



* fix bert mcore test



* fix gpt jenkins tests



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits



* revert apex installation



* turn off the fusion for jenkins



---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer



* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.



---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile



* Update Jenkinsfile



* Update Jenkinsfile



---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>

* Account for mpirun use case in get_rank (#8429)

Signed-off-by: Jan Lasek <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481) (#8482)

* Add settings to suppress bf16 compile errors in CI on V100



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix canary chunk infer bug (#8449)

* fix chunk infer bug

Signed-off-by: stevehuang52 <[email protected]>

* add support for duration=None, add lhotse support for relative audio path

Signed-off-by: stevehuang52 <[email protected]>

* add tests

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: stevehuang52 <[email protected]>

* Add Baichuan2 support (#8282)

* Add Baichuan2 support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 (#7920)

* Initial commit of reworked MegatronPretrainingRandomBatchSampler

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed small length based bug

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Daniel Egert <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Euynaheh <[email protected]>

* Add Baichuan2 support

Signed-off-by: Euynaheh <[email protected]>

* Add NeMo to HF conversion

* fix code format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix code format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add Baichuan jenkins test

* add_BOS bug fix

* Update Jenkinsfile

Signed-off-by: Euynaheh <[email protected]>

---------

Signed-off-by: Daniel Egert <[email protected]>
Signed-off-by: Euynaheh <[email protected]>
Signed-off-by: Euynaheh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: trias702 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* support/fix cpu initialization

Signed-off-by: Chen Cui <[email protected]>

* add example usage

Signed-off-by: Chen Cui <[email protected]>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <[email protected]>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <[email protected]>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <[email protected]>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <[email protected]>

* fix import

Signed-off-by: Jiaqi Zeng <[email protected]>

---------

Signed-off-by: jiaqi zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* remove try-catches, these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <[email protected]>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <[email protected]>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <[email protected]>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <[email protected]>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <[email protected]>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: jbaczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <[email protected]>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <[email protected]>

* Bump mcore commit. This is a commit from ToT, not from any release.

Signed-off-by: Jan Baczek <[email protected]>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <[email protected]>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <[email protected]>

* Update test

Signed-off-by: Jan Baczek <[email protected]>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: jbaczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Remove unused imports

Signed-off-by: Abhishree <[email protected]>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <[email protected]>

* Add back resetting of num val samples

Signed-off-by: Abhishree <[email protected]>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <[email protected]>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <[email protected]>

* Add missing mock_dataset line

Signed-off-by: Abhishree <[email protected]>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <[email protected]>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <[email protected]>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <[email protected]>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <[email protected]>

* add lora SP CI test

Signed-off-by: Chen Cui <[email protected]>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <[email protected]>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <[email protected]>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <[email protected]>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <[email protected]>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <[email protected]>

* copyright header fix

Signed-off-by: Travis Bartley <[email protected]>

* yet another merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <[email protected]>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <[email protected]>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <[email protected]>

* merge conflict

Signed-off-by: Travis Bartley <[email protected]>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <[email protected]>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <[email protected]>

* adding example scripts

Signed-off-by: Travis Bartley <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <[email protected]>

* adding unit tests

Signed-off-by: Travis Bartley <[email protected]>

* fixing unit tests

Signed-off-by: Travis Bartley <[email protected]>

* cfg name change

Signed-off-by: Travis Bartley <[email protected]>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <[email protected]>

* removing print script

Signed-off-by: Travis Bartley <[email protected]>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <[email protected]>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <[email protected]>

* docstring deleted by accident

Signed-off-by: Travis Bartley <[email protected]>

* plurals in namespace

Signed-off-by: Travis Bartley <[email protected]>

* changing example script

Signed-off-by: Travis Bartley <[email protected]>

---------

Signed-off-by: Travis Bartley <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix  (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add  sbert to IR (#8445)

* add  sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models  (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

* revert accidental commit

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana G…
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks  (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit 86efc4e)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflict

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* update Jenkinstest for new RETRO to run faster

* fix isort

* fix whitespace

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuation

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit 86efc4e)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflict

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* update Jenkinstest for new RETRO to run faster

* fix isort

* adding RETRO tests to cicd-main.yml action tests

* update ipa_cmudict-0.7b_nv23.01.txt

* remove quotes for model.data for legacy RETRO action tests

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit 86efc4e)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* runnable for inference

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* cleaning inference code

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merges conflict

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* update Jenkins and _legacy.py

* update new RETRO jenkinstest to run faster

* fixing errors from GitHub Advanced Security / CodeQL

* fixing errors from GitHub Advanced Security / CodeQL

* update manually branch to huvu/mcore_retro

* remove DEBUGGING markers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy paste scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt

* update code to fix GitHub warnings; adding cicd-main.yml action tests

* cleaning code, addressing Shanmugam's comments

* saving before pulling from main

* cleaning code

* adding deprecations note

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: root <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks  (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit 86efc4e)

Co-authored-by: Piotr Żelasko <[email protected]>

* add code for calling mcore_retro in NeMo

* add code for calling mcore_retro in NeMo

* runnable, training curve match retro mcore and nemo

* working on retro inference

* working on megatron_retro_eval.py and megatron_retro_inference.yaml

* refactoring text_generation_utils code and retro inference relevant files

* clean PR

* resolving quick hacks (reading number of train/valid samples from workdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)

* clean repository

* revert changes to inference/eval code to original in main

* clean code

* runnable training code, with already implemented eval code

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* revert to original eval code files

* revert to original eval code files 2

* revert to original eval code files 3

* revert to original eval code files 4

* clean code

* clean code

* update my code to support changes from latest main

* commit before rebase r1.23.0

* Multimodal r1.23.0 bug fix  (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* copy paste files from r1.23.0

* clean PR

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* revert changes for tts and asr

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* implement retro's own fwd_bwd_step() and validation_step() to not have argument first_val_step, which the MLM commit doesn't support

* adding megatron compile_helpers(), in future can be fixed with correct MLM commit

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* addressing Eric's reviews

* adding existing implementation RETRO files

* adding existing implementation RETRO files

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* before update branch with latest r1.23.0

* update to run with MLM ae2817b3dde4efb1515061a5311d01d8f85bd99c (runnable training and saving checkpoint)

* remove compile_helpers

* reverse changes from main branch to r1.23.0

* adding *_legacy files

* update MLM commit in Jenkinsfile to latest

* debugging Jenkinstest: test different mcore import in retro_dataset

* update Jenkinsfile edit megatron_retro_mutransfer_pretrain_legacy.py

* removing all mcore RETRO to pass the Jenkinstest

* fixing import legacy problem for tests/collections/nlp/test_indexed_retrieval_dataset.py

* update Jenkinsfile file to use TE v0.7

* update NeMo to work with latest mcore RETRO (solving TE problems)

* update TE commit Jenkinsfile to be the same with r1.23.0's Jenkinsfile

* update commit for MLM

* jenkinstest debugging

* temporary fix RETRO's __init__ for jenkinstest

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* edit splits_string in jenkinsfile to correct format; put RETRO test in front to test faster

* add model.data.dataloader_type=cyclic to jenkinsfile

* update code to work with latest megatron-lm main 81dab6067

* update M-LM commit in Jenkinsfile to latest main M-LM 81dab6067

* fix to bypass CI test bf16 problem (following this PR https://github.com/NVIDIA/NeMo/pull/8481/files)

* isort and black

* adjusting model.micro_batch_size to 1

* fix BRANCH = 'r1.23.0'

* replace tutorials dir from main branch to huvu/mcore_retro

* fix minor merge conflicts

* update Jenkinsfile

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* runnable with a temporary fix from Jacek (unfound -unfinished problem)

* modified nlp_overrides.py back to original

* fix checkpoint from Jacek Bieniusiewicz

* config Jenkinsfile test

* set RETRO Jenkins MBS to 1

* black fix

* isort fix

* update TE commit

* update to latest Jenkinsfile with latest container and commits

* remove new RETRO jenkinstest

* merge latest main

* put RETRO Jenkinstest to the right place

* update code for megatron_retro_pretraining_legacy.py

* untrack ipa_cmudict-0.7b_nv23.01.txt

* untrack ipa_cmudict-0.7b_nv23.01.txt

* set config in megatron_retro_pretraining_legacy.py to megatron_retro_config_legacy

* update new RETRO jenkinstest to run faster

* merging latest main, and edit Jenkinstest

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* huvu/mcore_retro_docs first commit

* update with main

* update RETRO docs

* fix scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt

* update docs

* update docs

* update RETRO docs

* update with Jennifer's comments

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: Huy Vu2 <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* update branch

Signed-off-by: eharper <[email protected]>

* Add dist ckpt support for regular optimizers (NVIDIA#7749)

* Add dist ckpt support for regular optimizers

Signed-off-by: Mikołaj Błaż <[email protected]>

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* fix imports

Signed-off-by: dimapihtar <[email protected]>

* imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

* revert asr notebook

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pin lhotse=1.19.2 in r1.23.0 (NVIDIA#8303)

Signed-off-by: Piotr Żelasko <[email protected]>

* Cache Aware Streaming tutorial notebook (NVIDIA#8296)

* add notebook

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename old notebook to Buffered_Streaming

Signed-off-by: Elena Rastorgueva <[email protected]>

* call setup_streaming_params in set_default_att_context_size method

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* update links to tutorials in docs

Signed-off-by: Elena Rastorgueva <[email protected]>

* remove hard-coding

Signed-off-by: Elena Rastorgueva <[email protected]>

* rename var

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* fix path location and branch (NVIDIA#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* add deallocate pipeline output optimization (NVIDIA#8279)

* add deallocate pipeline output optimization

Signed-off-by: Jimmy Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix memory leak caused by context parallelism hanging references by omegaconf (NVIDIA#8299)

* save cp_size to self

Signed-off-by: Jimmy Zhang <[email protected]>

* use parallel_state instead of self

Signed-off-by: Jimmy Zhang <[email protected]>

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* remove assertion (NVIDIA#8302)

Signed-off-by: dimapihtar <[email protected]>

* Update PEFT Doc (NVIDIA#8262)

* update peft doc

Signed-off-by: Chen Cui <[email protected]>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* fix table

Signed-off-by: Chen Cui <[email protected]>

* Merge branch 'r1.23.0' into chcui/update_peft_doc

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

* revert accidental changes

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks  (NVIDIA#8242) (NVIDIA#8324)

* Rebasing canary changes at current main

Signed-off-by: Piotr Żelasko <[email protected]>

* Move the changes from asr transformer to nlp transformer as originally intended

Signed-off-by: Piotr Żelasko <[email protected]>

* update eval to strip spaces before punctuations

Signed-off-by: stevehuang52 <[email protected]>

* update pc strip

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (NVIDIA#8247)

* Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.

Signed-off-by: Piotr Żelasko <[email protected]>

* [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (NVIDIA#8252)

* [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit

Signed-off-by: Piotr Żelasko <[email protected]>

* Move tokenization into `prompt_format_fn`, fix usage, add docs

Signed-off-by: Piotr Żelasko <[email protected]>

* Backward-compatible utterance validation

Signed-off-by: Piotr Żelasko <[email protected]>

* Improve type annotations

Signed-off-by: Piotr Żelasko <[email protected]>

* config and prompt_fn registration changes from review

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* fix transcribe config

Signed-off-by: stevehuang52 <[email protected]>

* Refactor Canary to follow schema of remaining ASR models (NVIDIA#8260)

* Initial draft of multi task beam decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Stabilize inference

Signed-off-by: smajumdar <[email protected]>

* Update AED Multi Task model to mostly conform to Archetype-Type format. Update config

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add change decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Remove redundant imports

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* Cleanup

Signed-off-by: smajumdar <[email protected]>

* remove asr transformer dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* copy token_classifier from nlp to asr

Signed-off-by: stevehuang52 <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Add typing to beam decoding

Signed-off-by: smajumdar <[email protected]>

* Make prompt format configurable

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* drop asr dependency on nlp

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stevehuang52 <[email protected]>

* fix transcribe, update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* Extend the docs for the canary prompt_fn

Signed-off-by: Piotr Żelasko <[email protected]>

* Incorporate changes from Nithin's code review

Signed-off-by: Piotr Żelasko <[email protected]>

* training bug fix and adding launch script for speech_multitask (NVIDIA#8270)

* bug fix and adding launch script for speech_multitask

Signed-off-by: Krishna Puvvada <[email protected]>

* update launch script example in speech_to_text_aed.py

Signed-off-by: Krishna Puvvada <[email protected]>

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Fix: drop_last must be true in validation/test otherwise the training will hang

Signed-off-by: Piotr Żelasko <[email protected]>

* revert to current transcribe API

Signed-off-by: stevehuang52 <[email protected]>

* revert changes to NLP, update docs

Signed-off-by: stevehuang52 <[email protected]>

* update eval utils

Signed-off-by: stevehuang52 <[email protected]>

* update docs

Signed-off-by: stevehuang52 <[email protected]>

* Remove DALI; rename compute_audio_loss to compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* set default use_model_transcribe=False

Signed-off-by: stevehuang52 <[email protected]>

* change os.path.dirname to pathlib

Signed-off-by: stevehuang52 <[email protected]>

* [canary] Test for CanaryTokenizer + refactoring (NVIDIA#8285)

* Test for CanaryTokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Attempt at refactor...

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>

* Update config for AED models (NVIDIA#8294)

Signed-off-by: smajumdar <[email protected]>

* set default calculate_wer=False in transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 1

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review, part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Document compute_loss

Signed-off-by: Piotr Żelasko <[email protected]>

* update transcribe_speech.py

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Attention encoder-decoder models for multiple speech-to-text tasks

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: stevehuang52 <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
(cherry picked from commit 86efc4e)

Co-authored-by: Piotr Żelasko <[email protected]>

* Multimodal r1.23.0 bug fix  (NVIDIA#8315)

* Rename quick-gelu

Signed-off-by: yaoyu-33 <[email protected]>

* ddpm config guard

Signed-off-by: yaoyu-33 <[email protected]>

* Fix ddpm edit api

Signed-off-by: yaoyu-33 <[email protected]>

* Fix insert_image_token cfg issue

Signed-off-by: yaoyu-33 <[email protected]>

* neva updates

Signed-off-by: yaoyu-33 <[email protected]>

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add back jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Update default neva template

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixes for MoE parameter passing & use of AutoTokenizer/Model for mistral. (NVIDIA#8272)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (NVIDIA#8334)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Remove asr webapp (NVIDIA#8347)

Signed-off-by: smajumdar <[email protected]>

* remove _target_ at model level in aed config (NVIDIA#8351)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (NVIDIA#8357)

* Add change_vocabulary and save_tokenizers() support

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>

* Change default (NVIDIA#8371)

Signed-off-by: smajumdar <[email protected]>

* bug fix in fast-conformer-aed.yaml and adding jenkins test for speech_to_text_aed model (NVIDIA#8368)

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Enable megatron core loggers for GPT pretraining (NVIDIA#8354)

* Logging changes tested for gpt_pretraining

Signed-off-by: Aishwarya Bhandare <[email protected]>

* Additional args

Signed-off-by: Aishwarya Bhandare <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* mcore ds fix (NVIDIA#8283)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

Signed-off-by: dimapihtar <[email protected]>

* revert apex installation

Signed-off-by: dimapihtar <[email protected]>

* turn off the fusion for jenkins

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* Add Finetuning tutorial with HF Datasets (NVIDIA#8356)

* Add Finetuning tutorial with HF Datasets

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* update on Som comments

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* release updates (NVIDIA#8378)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <[email protected]>

* revert asr files

Signed-off-by: dimapihtar <[email protected]>

* add comments

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <[email protected]>

* update mcore version

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <[email protected]>

* update mcore commit

Signed-off-by: dimapihtar <[email protected]>

* fix Bert unit tests

Signed-off-by: dimapihtar <[email protected]>

* update bert tests

Signed-off-by: dimapihtar <[email protected]>

* fix bert mcore test

Signed-off-by: dimapihtar <[email protected]>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* add mock ds test

Signed-off-by: dimapihtar <[email protected]>

* add test for dict data input type

Signed-off-by: dimapihtar <[email protected]>

* mcore ds fix

Signed-off-by: dimapihtar <[email protected]>

* data input fix

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>

* MCore dataset compatibility for tokenizers (NVIDIA#8390)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

Signed-off-by: Valerie Sarge <[email protected]>

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

Signed-off-by: Valerie Sarge <[email protected]>

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Mcore customization doc (NVIDIA#8298)

* [tutorial] fixed missing RIR scripts file. (NVIDIA#8257)

Signed-off-by: Xuesong Yang <[email protected]>

* add values to en tts dict (NVIDIA#7879)

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* Add Bert HF checkpoint converter (NVIDIA#8088)

* Add Bert HF checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reformat

Signed-off-by: yaoyu-33 <[email protected]>

* Add BERT ONNX export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add NeMo BERT to HF BERT script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean code

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update argument names

Signed-off-by: yaoyu-33 <[email protected]>

* Update build_transformer_config in Bert

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>

* initial placeholder

Signed-off-by: Huiying Li <[email protected]>

* add to intro/index.rst

Signed-off-by: Huiying Li <[email protected]>

* initial content update

Signed-off-by: Huiying Li <[email protected]>

* add diff images

Signed-off-by: Huiying Li <[email protected]>

size

Signed-off-by: Huiying Li <[email protected]>

* minor fixes

* minor language change

Signed-off-by: Chen Cui <[email protected]>

* clean changes

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* wer fix (NVIDIA#8404)

Signed-off-by: Travis Bartley <[email protected]>

* updated link to pubmed (NVIDIA#8402)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Update NFA video download link (NVIDIA#8406)

* update nfa nasa video link

Signed-off-by: Elena Rastorgueva <[email protected]>

* update link in markdown

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>

* revert changes (NVIDIA#8410)

Signed-off-by: Chen Cui <[email protected]>

* Fix dreambooth data sampler issue (NVIDIA#8400)

* Turn on drop last

Signed-off-by: yaoyu-33 <[email protected]>

* Some neva fixes

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed errors in the CTM gen functions (NVIDIA#8416)

Signed-off-by: Taejin Park <[email protected]>

* add ensemble decoding fix (NVIDIA#8427)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* SDE bugfix log (NVIDIA#8430)

Signed-off-by: George <[email protected]>

* mcore customization doc minor fix (NVIDIA#8421)

Signed-off-by: Huiying Li <[email protected]>

* NeMo-Mistral to HF converter bugfix. (NVIDIA#8353)

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Fixing mcore bert for TP, PP and SP (NVIDIA#8336)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

* Update Jenkinsfile

Signed-off-by: Shanmugam Ramasamy <[email protected]>

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add settings to suppress bf16 compile errors in CI on V100 (NVIDIA#8481)

* Add settings to suppress bf16 compile errors in CI on V100

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* MoE parameter passing (NVIDIA#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* CI fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update k2 version (NVIDIA#8478) (NVIDIA#8492)

Signed-off-by: Vladimir Bataev <[email protected]>

* Add fp8 support for SD/Update notebook paths (NVIDIA#8489)

* Add fp8 support for SD/Update notebook paths

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* pin to 0.5.0 (NVIDIA#8465)

Signed-off-by: eharper <[email protected]>

* Update NeMo Multimodal Requirements (NVIDIA#8515)

* Update requirements_multimodal.txt

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update github raw content link (NVIDIA#8517)

Signed-off-by: Chen Cui <[email protected]>

* Add dep notice for notebooks (NVIDIA#8522)

* add dep notice

Signed-off-by: eharper <[email protected]>

* revert

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Revert FP8 integration (NVIDIA#8520)

* Revert FP8 integration

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update data prep notebook (NVIDIA#8532)

Signed-off-by: Mingyuan Ma <[email protected]>

* Add back fp8 support

* SD-FP8: fix the bug of normalization location

Signed-off-by: Mingyuan Ma <[email protected]>

* map potential FP8 ckpt to FP16

Signed-off-by: Mingyuan Ma <[email protected]>

* Add TE fp8 training

Signed-off-by: Mingyuan Ma <[email protected]>

* Only overwrite unet precision when self.megatron_amp_O2 is true

Signed-off-by: Mingyuan Ma <[email protected]>

* New structure is now compatible with old ckpts

Signed-off-by: Mingyuan Ma <[email protected]>

* Add support on mapping old unet checkpoint to new structure and FP8 structure

Signed-off-by: Mingyuan Ma <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sync with main branch

Signed-off-by: Mingyuan Ma <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Krishna Puvvada <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Travis Bartley <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: George <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mingyuan Ma <[email protected]>
Co-authored-by: eharper <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Mengdi Wang <[email protected]>