add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <[email protected]>

* add alpha

Signed-off-by: arendu <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <[email protected]>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <[email protected]>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>

* add inference via script

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* requested changes

Signed-off-by: Pratyush Muthukumar <[email protected]>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <[email protected]>

---------

Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <[email protected]>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <[email protected]>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <[email protected]>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix test

Signed-off-by: Vladimir Bataev <[email protected]>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <[email protected]>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <[email protected]>

* Clean up

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <[email protected]>

* add doc

Signed-off-by: ataghibakhsh <[email protected]>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <[email protected]>

* addressed bot comments

Signed-off-by: ataghibakhsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <[email protected]>

* Update readme (#8440)

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <[email protected]>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <[email protected]>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* try to fix ci

Signed-off-by: Chen Cui <[email protected]>

* add cfg default

Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Add Neva Template for NV-DPO Models  (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <[email protected]>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <[email protected]>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <[email protected]>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <[email protected]>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <[email protected]>

* fix code format

Signed-off-by: Huiying Li <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

* default for alpha

Signed-off-by: arendu <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <[email protected]>

---------

Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: arendu <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: George Zelenfroynd <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Pratyush Muthukumar <[email protected]>
Signed-off-by: Huiying Li <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: Valerie Sarge <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: Shanmugam Ramasamy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Jiaqi Zeng <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: George <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Piotr Żelasko <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: JimmyZhang12 <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ashbhandare <[email protected]>
Co-authored-by: Aishwarya Bhandare <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Pratyush Muthukumar <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Valerie Sarge <[email protected]>
Co-authored-by: Ali Taghibakhshi <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Tugrul Konuk <[email protected]>
Co-authored-by: Jiaqi Zeng <[email protected]>
Co-authored-by: HeyyyyyyG <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
1 parent 56b8bed commit fe65607
Showing 3 changed files with 8 additions and 0 deletions.
@@ -96,6 +96,7 @@ model:
   lora_tuning:
     target_modules: ['attention_qkv'] # this can either be 'attention_qkv','attention_dense','mlp_fc1','mlp_fc2', attention (qkv & dense), mlp (fc1 & fc2)
     adapter_dim: 32
+    alpha: ${model.peft.lora_tuning.adapter_dim}
     adapter_dropout: 0.0
     column_init_method: 'xavier' # IGNORED if linear_adapter is used, options: xavier, zero or normal
     row_init_method: 'zero' # IGNORED if linear_adapter is used, options: xavier, zero or normal
@@ -139,6 +139,7 @@ def __init__(
         input_is_parallel: bool = False,  # NOTE: (@ertkonuk) we need this for LoRA adapters that are applied to RowParallelLinear layers
         dropout: float = 0.0,
         model_parallel_config: Optional[ModelParallelConfig] = None,
+        alpha: float | None = None,
         **kwargs,
     ):
         super().__init__()
@@ -151,7 +152,9 @@ def __init__(
         self.activation = activation_registry[activation]()
         self.norm_position = norm_position
         self.dim = dim
+        self.alpha = alpha if alpha is not None else self.dim
         self.input_is_parallel = input_is_parallel
+
         # megatron_gpt_peft_models will provide this arg, but deprecated ones do not.
         # in case this arg is not provided, use the dummy default config.
         if model_parallel_config is None:
@@ -274,6 +277,8 @@ def forward(self, x):
         if self.dropout is not None:
             x = self.dropout(x)

+        x = x * (self.alpha / self.dim)
+
         return x
@@ -290,6 +295,7 @@ class ParallelLinearAdapterConfig(AdapterConfig):
     gather_output: bool = True
     input_is_parallel: bool = False
     dropout: float = 0.0
+    alpha: float | None = None
     network_alpha: int | None = None
     _target_: str = "{0}.{1}".format(ParallelLinearAdapter.__module__, ParallelLinearAdapter.__name__)
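Taken together, these hunks make `alpha` the standard LoRA scaling knob: the adapter output is multiplied by alpha / dim, so the strength of the low-rank update can be tuned independently of the rank `dim`. Below is a condensed sketch of the idea, a simplified stand-in for `ParallelLinearAdapter` that omits the tensor parallelism, normalization, and dropout of the real class:

import torch
import torch.nn as nn

class LoRAAdapterSketch(nn.Module):
    """Simplified LoRA adapter illustrating the alpha / dim output scaling."""

    def __init__(self, in_features: int, out_features: int, dim: int = 32, alpha: float | None = None):
        super().__init__()
        self.dim = dim
        # Mirrors the patch: an unset alpha falls back to dim, so the
        # scale alpha / dim is 1.0 and previous behavior is preserved.
        self.alpha = alpha if alpha is not None else dim
        self.linear_in = nn.Linear(in_features, dim, bias=False)
        self.linear_out = nn.Linear(dim, out_features, bias=False)
        nn.init.zeros_(self.linear_out.weight)  # LoRA update starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.linear_out(self.linear_in(x))
        return x * (self.alpha / self.dim)  # the scaling step added by this commit

adapter = LoRAAdapterSketch(1024, 1024, dim=32, alpha=64.0)  # scale = 64 / 32 = 2.0
out = adapter(torch.randn(4, 1024))
print(out.shape)  # torch.Size([4, 1024])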
1 change: 1 addition & 0 deletions nemo/collections/nlp/parts/peft_config.py
@@ -182,6 +182,7 @@ def _create_lora_config(self, cfg, lora_cfg, in_features, out_features, adapter_
             "row_init_method": lora_cfg.get("row_init_method", "zero"),
             "gather_output": False,
             "dropout": lora_cfg.adapter_dropout,
+            "alpha": lora_cfg.get("alpha", lora_cfg.adapter_dim),
         }

         if lora_cfg.weight_tying:
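On the plumbing side, the fallback `lora_cfg.get("alpha", lora_cfg.adapter_dim)` means a config written before this change still yields a scale of 1.0. A rough illustration of that fallback, using a plain dict as a hypothetical stand-in for the real OmegaConf `lora_cfg`:

# Stand-in for lora_cfg (the real object is an OmegaConf DictConfig);
# shows that omitting "alpha" reproduces the old scaling exactly.
lora_cfg = {"adapter_dim": 32, "adapter_dropout": 0.0}  # no "alpha" key set
alpha = lora_cfg.get("alpha", lora_cfg["adapter_dim"])  # -> 32
scale = alpha / lora_cfg["adapter_dim"]                 # -> 1.0 (unchanged behavior)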
