Add PP support in NeVA along with a few bug fixes (#11170)
* evian3 update

Signed-off-by: yaoyu-33 <[email protected]>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <[email protected]>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <[email protected]>

* clean up

Signed-off-by: yaoyu-33 <[email protected]>

* add aspect ratio in model

* support energon dataloader

* some pp update

Signed-off-by: yaoyu-33 <[email protected]>

* fixes

Signed-off-by: yaoyu-33 <[email protected]>

* fix kv merging

Signed-off-by: yaoyu-33 <[email protected]>

* fix get_key_value_tensors

Signed-off-by: yaoyu-33 <[email protected]>

* rename files

Signed-off-by: yaoyu-33 <[email protected]>

* update to HF style position embedding

Signed-off-by: yaoyu-33 <[email protected]>

* fix energon dataloader and support batching

* update forward args

Signed-off-by: yaoyu-33 <[email protected]>

* clean up and move to aspect_ratio_ids

Signed-off-by: yaoyu-33 <[email protected]>

* rename back to language.py

Signed-off-by: yaoyu-33 <[email protected]>

* fix loss function

Signed-off-by: yaoyu-33 <[email protected]>

* update and fix energon

Signed-off-by: yaoyu-33 <[email protected]>

* Add hf import

* Fix type

* Change config

* update energon pretrain

Signed-off-by: yaoyu-33 <[email protected]>

* clean up

* clean up

* reformat

Signed-off-by: yaoyu-33 <[email protected]>

* update inference files for new code

* update to instruct

* update to instruct

* update few names

Signed-off-by: yaoyu-33 <[email protected]>

* update generation

Signed-off-by: yaoyu-33 <[email protected]>

* fix importer embedding.weight

* few fixes

Signed-off-by: yaoyu-33 <[email protected]>

* add hf script

Signed-off-by: yaoyu-33 <[email protected]>

* fix kv import

* remove interleaved

* fixes and updates

Signed-off-by: yaoyu-33 <[email protected]>

* lora fixes

Signed-off-by: yaoyu-33 <[email protected]>

* some code clean ups

Signed-off-by: yaoyu-33 <[email protected]>

* update training scripts

Signed-off-by: yaoyu-33 <[email protected]>

* refactors

Signed-off-by: yaoyu-33 <[email protected]>

* add LoRA finetuning

* fixes and nemo update

Signed-off-by: yaoyu-33 <[email protected]>

* fix importer registering issue by adding 11B and 90B configs

* update `decoder_seq_len`

Signed-off-by: yaoyu-33 <[email protected]>

* science vqa script

Signed-off-by: yaoyu-33 <[email protected]>

* clean up script name

Signed-off-by: yaoyu-33 <[email protected]>

* fix ckpt save serialization issue

* fix predefined config classes

* add num_chunks in input

Signed-off-by: yaoyu-33 <[email protected]>

* fix format

Signed-off-by: yaoyu-33 <[email protected]>

* update finetuning scripts for PEFT

* add 11b recipe (need #10645 to test)

* fix mask generation

Signed-off-by: yaoyu-33 <[email protected]>

* minor fix code style

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* Support no image inference

* add llama svqa eval

* fix masking

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix generation

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* add 90b recipe and revise 11b recipe

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* clean up typing

* add option to disable vision padding

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* base model finetuning (does not work yet)

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* fixed default conversation template config for MLLama

* Update svqa

* add multinode

* bot happy

* Apply isort and black reformatting

Signed-off-by: cuichenx <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Perf improvements. Mainly from XAttn mask calculation (#10901)

* Perf improvements. Mainly from XAttn mask calculation

* Apply isort and black reformatting

Signed-off-by: parthmannan <[email protected]>

---------

Signed-off-by: parthmannan <[email protected]>
Co-authored-by: parthmannan <[email protected]>

* fix existing issues

Signed-off-by: yaoyu-33 <[email protected]>

* fix scripts

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix lora

* few fixes for non image support

Signed-off-by: yaoyu-33 <[email protected]>

* update masking gen

Signed-off-by: yaoyu-33 <[email protected]>

* update lazy dataset

Signed-off-by: yaoyu-33 <[email protected]>

* fix data sampler and loading issue

Signed-off-by: yaoyu-33 <[email protected]>

* Add vlm generation

* Apply isort and black reformatting

Signed-off-by: meatybobby <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* generation update

Signed-off-by: yaoyu-33 <[email protected]>

* update lazy dataset

Signed-off-by: yaoyu-33 <[email protected]>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix warning

Signed-off-by: yaoyu-33 <[email protected]>

* hide vlm examples

Signed-off-by: yaoyu-33 <[email protected]>

* Revert "Add vlm generation"

This reverts commit 4711c75

Signed-off-by: yaoyu-33 <[email protected]>

* Fix VisionEncoder multi-batch bug

* update mcore parallelism initialization

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* Update megatron_init.py

Signed-off-by: Yu Yao <[email protected]>

* add encoder parallel default config

Signed-off-by: yaoyu-33 <[email protected]>

* Fix _strategy_lib.py

Signed-off-by: Yu Yao <[email protected]>

* llm.generate fixes (#10983)

* fix context path, disable optimizer init, add tp

Signed-off-by: HuiyingLi <[email protected]>

* format

Signed-off-by: HuiyingLi <[email protected]>

* address comments, require user to provide trainer

Signed-off-by: HuiyingLi <[email protected]>

* minor fix

Signed-off-by: HuiyingLi <[email protected]>

* minor fixes

Signed-off-by: HuiyingLi <[email protected]>

---------

Signed-off-by: HuiyingLi <[email protected]>

* use __dict__ in check (#11012)

* check is_hf_model in leaf module

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

* disable getattr alternative path

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* undo;

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* LoRA support for HF::AutoModelForCausalLM (#10982)

* add LinearAdapter

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add hf lora example

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove unused imports

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* subclass mixin

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* remove stale imports

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* undo

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix scale

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* regex selector for peft

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* move lora

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fmt

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* hf_auto_model_for_causal_lm finetune recipe

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Change default for always_save_context to True (#11014)

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>

* Add a build option to load_context (#10713)

* Add a build option to load_context

Signed-off-by: Marc Romeijn <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Adding test

Signed-off-by: Marc Romeijn <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Trying to fix failing CPU test

Signed-off-by: Marc Romeijn <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* cherry-pick fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Marc Romeijn <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>

* Fix pip install (#11026)

* Move AutoTokenizer inline

Signed-off-by: Marc Romeyn <[email protected]>

* Move einops to common requirements

Signed-off-by: Marc Romeyn <[email protected]>

* Move AutoTokenizer import to top-level again in fine_tuning

Signed-off-by: Marc Romeyn <[email protected]>

* Move megatron init inside nemo.lightning

Signed-off-by: Marc Romeyn <[email protected]>

* Make megatron_lazy_init_context work when transformer-engine is not installed

Signed-off-by: Marc Romeyn <[email protected]>

* Only import get_nmt_tokenizer when needed

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>

---------

Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Co-authored-by: marcromeyn <[email protected]>

* [WIP] Add docs for NEST SSL (#10804)

* add docs

Signed-off-by: stevehuang52 <[email protected]>

* update doc and fix missing param

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: stevehuang52 <[email protected]>

* Change dist ckpt defaults (#10913)

* Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min

Signed-off-by: Shriya Palsamudram <[email protected]>

* fix ssm tests

Signed-off-by: Shriya Palsamudram <[email protected]>

* Make note that ckpt_async_save is disabled for SSMs

Signed-off-by: Shriya Palsamudram <[email protected]>

* Enable async ckpt for SSMs with fix

Signed-off-by: Shriya Palsamudram <[email protected]>

* Disable async ckpt in the peft test as it is a known bug, add note.

Signed-off-by: Shriya Palsamudram <[email protected]>

* Fix failing unit tests

Signed-off-by: Shriya Palsamudram <[email protected]>

* Ashors/peft async ckpt (#11010)

* [WIP] prototype for supporting async checkpointing with peft

Signed-off-by: ashors1 <[email protected]>
Signed-off-by: Shriya Palsamudram <[email protected]>

* Enable async ckpt for the peft test

Signed-off-by: Shriya Palsamudram <[email protected]>

* Fix peft setup test

Signed-off-by: Shriya Palsamudram <[email protected]>

---------

Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Co-authored-by: ataghibakhsh <[email protected]>

* Akoumparouli/mixtral recipe fix r2.0.0 (#10994)

* Mixtral TP8 EP1

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>

* Fix _strategy_lib tests (#11033)

* fix world size and don't mock

Signed-off-by: Maanu Grover <[email protected]>

* cleanup global state

Signed-off-by: Maanu Grover <[email protected]>

* check app state instead

Signed-off-by: Maanu Grover <[email protected]>

* fix syntax nemo logger test

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

* Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (#11016)

* Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (#10383)"

This reverts commit b5798de.

* make megatron sampler return the total number of batches in the dataset

Signed-off-by: ashors1 <[email protected]>

---------

Signed-off-by: ashors1 <[email protected]>

* PTQ example for NeMo 2.0 (#10642)

* initial commit

Signed-off-by: Piotr Kaminski <[email protected]>

* create Quantizer for NeMo 2.0

Signed-off-by: Piotr Kaminski <[email protected]>

* refactor

Signed-off-by: Piotr Kaminski <[email protected]>

* Call quantize on an unwrapped mcore model

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Add tests, adjust unwrapping

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* fix export

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* Fix output_path argument for HF import

Signed-off-by: Piotr Kamiński <[email protected]>

* fix fabric ckpt loading

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* code review suggestions

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* remove unused import

Signed-off-by: Piotr Kaminski <[email protected]>

* use cnn dataset in github ci

Signed-off-by: Piotr Kaminski <[email protected]>

* applied code review

Signed-off-by: Piotr Kaminski <[email protected]>

* code review changes

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* simplify interface for data iterator

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

* (partial) PP fix

Signed-off-by: Piotr Kaminski <[email protected]>

* Apply isort and black reformatting

Signed-off-by: Laplasjan107 <[email protected]>

---------

Signed-off-by: Piotr Kaminski <[email protected]>
Signed-off-by: Laplasjan107 <[email protected]>
Signed-off-by: Piotr Kamiński <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: Piotr Kaminski <[email protected]>
Co-authored-by: Laplasjan107 <[email protected]>
Co-authored-by: artbataev <[email protected]>

* TDT compute timestamps option and Extra Whitespace handling for SPE (#10875)

* add token duration

Signed-off-by: monica-sekoyan <[email protected]>

* revert rnnt change

Signed-off-by: monica-sekoyan <[email protected]>

* add remove_extra_whitespaces arg to spe tokenizer

Signed-off-by: monica-sekoyan <[email protected]>

* add token duration retrieval

Signed-off-by: monica-sekoyan <[email protected]>

* add ignore_extra_whitespace to spe

Signed-off-by: monica-sekoyan <[email protected]>

* add compute_timestamp support for tdt

Signed-off-by: monica-sekoyan <[email protected]>

* fix config field name

Signed-off-by: monica-sekoyan <[email protected]>

* add refinement for tdt timestamps

Signed-off-by: monica-sekoyan <[email protected]>

* add segments timestamp support and refinement for ctc

Signed-off-by: monica-sekoyan <[email protected]>

* modify tests for ctc decoding timestamps

Signed-off-by: monica-sekoyan <[email protected]>

* add rnnt timestamp tests

Signed-off-by: monica-sekoyan <[email protected]>

* updated doc

Signed-off-by: monica-sekoyan <[email protected]>

* fix in test

Signed-off-by: monica-sekoyan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <[email protected]>

* fix of unicode char

Signed-off-by: monica-sekoyan <[email protected]>

* fix rnnt_decoding test

Signed-off-by: monica-sekoyan <[email protected]>

* workaround for test tokenizer

Signed-off-by: monica-sekoyan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <[email protected]>

* modify segments formation

Signed-off-by: monica-sekoyan <[email protected]>

* modify segments for ctc

Signed-off-by: monica-sekoyan <[email protected]>

* fix in ctc refinement

Signed-off-by: monica-sekoyan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <[email protected]>

* minor changes

Signed-off-by: monica-sekoyan <[email protected]>

* reverse offset change

Signed-off-by: monica-sekoyan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <[email protected]>

* warning mode=once

Signed-off-by: monica-sekoyan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <[email protected]>

* make ignore_extrawhitespaces false

Signed-off-by: monica-sekoyan <[email protected]>

* minor changes

Signed-off-by: monica-sekoyan <[email protected]>

* adjust changes to the tests

Signed-off-by: monica-sekoyan <[email protected]>

* modify prompt_formatter tests

Signed-off-by: monica-sekoyan <[email protected]>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <[email protected]>

---------

Signed-off-by: monica-sekoyan <[email protected]>
Signed-off-by: monica-sekoyan <[email protected]>
Co-authored-by: monica-sekoyan <[email protected]>

* Basic online dynamic FP8 quantization with vLLM (#10904)

* Basic online dynamic quantization with vLLM

Signed-off-by: Jan Lasek <[email protected]>

* Apply isort and black reformatting

Signed-off-by: janekl <[email protected]>

* vllm 0.6.3 updates

Signed-off-by: Jan Lasek <[email protected]>

* Pass quantization param in deploy_vllm_triton.py script

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: janekl <[email protected]>
Co-authored-by: janekl <[email protected]>

* ci: Improve VM maintenance (#10758)

* ci: Improve VM maintenance

Signed-off-by: Oliver Koenig <[email protected]>

* rename stuff

Signed-off-by: Oliver Koenig <[email protected]>

* title

Signed-off-by: Oliver Koenig <[email protected]>

* use team

Signed-off-by: Oliver Koenig <[email protected]>

* run on failure too

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* yrdy

Signed-off-by: Oliver Koenig <[email protected]>

* f

Signed-off-by: Oliver Koenig <[email protected]>

* test

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* f

Signed-off-by: Oliver Koenig <[email protected]>

* f

Signed-off-by: Oliver Koenig <[email protected]>

* f

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: Oliver Koenig <[email protected]>

* neva update

Signed-off-by: yaoyu-33 <[email protected]>

* Add comment for vision transpose

* update megatron_init.py inside lightning

Signed-off-by: yaoyu-33 <[email protected]>

* Fix PP

Signed-off-by: yaoyu-33 <[email protected]>

* add examples

Signed-off-by: yaoyu-33 <[email protected]>

* fix test

Signed-off-by: yaoyu-33 <[email protected]>

* try fix test

Signed-off-by: yaoyu-33 <[email protected]>

* try fix test

Signed-off-by: yaoyu-33 <[email protected]>

* Fix megatron megatron_init.py dp

Signed-off-by: Yu Yao <[email protected]>

* Update lightning megatron_init.py dp

Signed-off-by: Yu Yao <[email protected]>

* make it possible to update pre_process and post_process for llm, required in vlm

Signed-off-by: yaoyu-33 <[email protected]>

* Fixes for neva to run with PP

Signed-off-by: yaoyu-33 <[email protected]>

* Add mcore vit support, and checkpoint conversion

Signed-off-by: yaoyu-33 <[email protected]>

* fix checkpoint loading for epp

Signed-off-by: yaoyu-33 <[email protected]>

* update script

Signed-off-by: yaoyu-33 <[email protected]>

* rename llama to mllama folder name

Signed-off-by: yaoyu-33 <[email protected]>

* update to attention bias

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* added datamodule for llava-next

* modified state dict transform

* neva model changes to support  llava-next

* remove accidentally checked in files

Signed-off-by: Yashaswi Karnati <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <[email protected]>

* remove unused imports

* added io_init to not save task_encoder and image_processor

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <[email protected]>

* added scripts for pretrain and finetune

Signed-off-by: Yashaswi Karnati <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <[email protected]>

* generation example

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <[email protected]>

* small change in llava next example

* llava next end-end train

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <[email protected]>

* finetune changes

* Apply isort and black reformatting

Signed-off-by: yashaswikarnati <[email protected]>

* finetune debug changes

* update dropout to 0

Signed-off-by: yaoyu-33 <[email protected]>

* added example generation script

* added docstrings and formatting, removed debug statements and unused imports

* remove example scripts

* fix attention bias

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* remove disable_vision_padding since we now have a fix

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* Update init for mllama

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* Address comments

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix copyright title

Signed-off-by: yaoyu-33 <[email protected]>

* multiple fixes

Signed-off-by: yaoyu-33 <[email protected]>

* bug fix

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix code scan

Signed-off-by: yaoyu-33 <[email protected]>

* Fix for SP

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* update vision code

Signed-off-by: yaoyu-33 <[email protected]>

* revert attention bias changes until latest MLM code got merged

Signed-off-by: yaoyu-33 <[email protected]>

* fix warning

Signed-off-by: yaoyu-33 <[email protected]>

* Turn off system message check, as it's "" now

Signed-off-by: yaoyu-33 <[email protected]>

* Update layer spec and add siglip support

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* update pretrain script

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* Fix scripts

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* add neva training recipes

Signed-off-by: yaoyu-33 <[email protected]>

* fix mllama mock ds

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix recipe

Signed-off-by: yaoyu-33 <[email protected]>

* fix pp

Signed-off-by: yaoyu-33 <[email protected]>

* scripts update

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* scripts update

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* update config api

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* few updates

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* update 70b

Signed-off-by: yaoyu-33 <[email protected]>

* hide examples for pr

Signed-off-by: yaoyu-33 <[email protected]>

* fix few issues

Signed-off-by: yaoyu-33 <[email protected]>

* add docstring layer spec

Signed-off-by: yaoyu-33 <[email protected]>

* add docstring to vit config

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix copyright

Signed-off-by: yaoyu-33 <[email protected]>

* fix

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: artbataev <[email protected]>
Signed-off-by: parthmannan <[email protected]>
Signed-off-by: meatybobby <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Marc Romeijn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: Maanu Grover <[email protected]>
Signed-off-by: Piotr Kaminski <[email protected]>
Signed-off-by: Laplasjan107 <[email protected]>
Signed-off-by: Piotr Kamiński <[email protected]>
Signed-off-by: monica-sekoyan <[email protected]>
Signed-off-by: monica-sekoyan <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: janekl <[email protected]>
Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: Yashaswi Karnati <[email protected]>
Signed-off-by: yashaswikarnati <[email protected]>
Signed-off-by: Yashaswi Karnati <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: cuichenx <[email protected]>
Co-authored-by: Yashaswi Karnati <[email protected]>
Co-authored-by: artbataev <[email protected]>
Co-authored-by: Parth Mannan <[email protected]>
Co-authored-by: parthmannan <[email protected]>
Co-authored-by: meatybobby <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: marcromeyn <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Shriya Rishab <[email protected]>
Co-authored-by: ataghibakhsh <[email protected]>
Co-authored-by: Maanu Grover <[email protected]>
Co-authored-by: Anna Shors <[email protected]>
Co-authored-by: Piotr Kamiński <[email protected]>
Co-authored-by: Piotr Kaminski <[email protected]>
Co-authored-by: Laplasjan107 <[email protected]>
Co-authored-by: monica-sekoyan <[email protected]>
Co-authored-by: monica-sekoyan <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>
Co-authored-by: janekl <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: ykarnati <[email protected]>
Co-authored-by: Yashaswi Karnati <[email protected]>
Co-authored-by: yashaswikarnati <[email protected]>
1 parent 34f7408 commit 773590c
Showing 23 changed files with 1,128 additions and 457 deletions.
6 changes: 6 additions & 0 deletions nemo/collections/llm/fn/activation.py
@@ -13,6 +13,7 @@
# limitations under the License.

import torch
from megatron.core.jit import jit_fuser


@torch.jit.script
@@ -25,6 +26,11 @@ def openai_gelu(x):
return gelu_impl(x)


@jit_fuser
def quick_gelu(x: torch.Tensor) -> torch.Tensor:
return x * torch.sigmoid(1.702 * x)


# @torch.jit.script # remove until we have serialization
def squared_relu(x):
"""Squared ReLU activation function."""
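For context, quick_gelu is the sigmoid approximation of GELU commonly used by CLIP-style vision encoders. A small sanity check, purely illustrative and not part of the commit:

import torch
from nemo.collections.llm.fn.activation import quick_gelu

x = torch.randn(4, 8)
# x * sigmoid(1.702 * x) tracks the exact GELU closely; the max gap is on the order of 1e-2
print((quick_gelu(x) - torch.nn.functional.gelu(x)).abs().max())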
6 changes: 3 additions & 3 deletions nemo/collections/llm/gpt/model/base.py
@@ -179,7 +179,7 @@ class GPTConfig(TransformerConfig, io.IOMixin):
forward_step_fn: Callable = gpt_forward_step
data_step_fn: Callable = gpt_data_step

- def configure_model(self, tokenizer) -> "MCoreGPTModel":
+ def configure_model(self, tokenizer, pre_process=None, post_process=None) -> "MCoreGPTModel":
vp_size = self.virtual_pipeline_model_parallel_size
if vp_size:
p_size = self.pipeline_model_parallel_size
@@ -214,8 +214,8 @@ def configure_model(self, tokenizer) -> "MCoreGPTModel":
rotary_percent=self.rotary_percent,
rotary_base=self.rotary_base,
seq_len_interpolation_factor=self.seq_len_interpolation_factor,
- pre_process=parallel_state.is_pipeline_first_stage(),
- post_process=parallel_state.is_pipeline_last_stage(),
+ pre_process=pre_process or parallel_state.is_pipeline_first_stage(),
+ post_process=post_process or parallel_state.is_pipeline_last_stage(),
)

# If using full TE layer, need to set TP, CP group since the module call
4 changes: 2 additions & 2 deletions nemo/collections/llm/gpt/model/llama.py
@@ -115,8 +115,8 @@ class Llama31Config(Llama3Config):
old_context_len: int = 8192
init_method_std: float = 0.02

- def configure_model(self, tokenizer) -> "MCoreGPTModel":
- model = super().configure_model(tokenizer)
+ def configure_model(self, tokenizer, pre_process=None, post_process=None) -> "MCoreGPTModel":
+ model = super().configure_model(tokenizer, pre_process, post_process)
# Apply rope scaling for Llama3.1 model
model.rotary_pos_emb.inv_freq = apply_rope_scaling(
model.rotary_pos_emb.inv_freq,
6 changes: 3 additions & 3 deletions nemo/collections/llm/gpt/model/ssm.py
@@ -86,7 +86,7 @@ class SSMConfig(TransformerConfig, io.IOMixin):
data_step_fn: Callable = gpt_data_step
tokenizer_model_path: str = None

- def configure_model(self, tokenizer) -> "MCoreMambaModel":
+ def configure_model(self, tokenizer, pre_process=None, post_process=None) -> "MCoreMambaModel":

return MCoreMambaModel(
self,
@@ -101,8 +101,8 @@ def configure_model(self, tokenizer) -> "MCoreMambaModel":
rotary_percent=self.rotary_percent,
rotary_base=self.rotary_base,
seq_len_interpolation_factor=self.seq_len_interpolation_factor,
- pre_process=parallel_state.is_pipeline_first_stage(),
- post_process=parallel_state.is_pipeline_last_stage(),
+ pre_process=pre_process or parallel_state.is_pipeline_first_stage(),
+ post_process=post_process or parallel_state.is_pipeline_last_stage(),
)


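Taken together, these configure_model changes let a caller decide where the embedding stage (pre_process) and the output head (post_process) live instead of always deriving them from the global pipeline stage, which is what NeVA needs once a vision encoder occupies earlier pipeline ranks. Note that the defaults are applied with "or", so only truthy overrides take effect; passing None (or False) falls back to the previous behaviour. A minimal usage sketch, with the wrapper rule hypothetical:

from megatron.core import parallel_state

def configure_language_module(config, tokenizer, first_decoder_stage: bool):
    # Hypothetical wrapper logic: the first *decoder* stage may not be the global first
    # pipeline stage when an encoder occupies earlier ranks, so force pre_process there.
    return config.configure_model(
        tokenizer,
        pre_process=True if first_decoder_stage else None,
        post_process=parallel_state.is_pipeline_last_stage(),
    )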
11 changes: 8 additions & 3 deletions nemo/collections/vlm/__init__.py
@@ -29,6 +29,7 @@
DataConfig,
ImageDataConfig,
ImageToken,
LlavaNextTaskEncoder,
MultiModalToken,
NevaLazyDataModule,
NevaMockDataModule,
@@ -42,7 +43,8 @@
NevaConfig,
NevaModel,
)
- from nemo.collections.vlm.neva.model.llava import Llava1_5Config7B, Llava1_5Config13B, LlavaConfig, LlavaModel
+ from nemo.collections.vlm.neva.model.llava import Llava15Config7B, Llava15Config13B, LlavaConfig, LlavaModel
+ from nemo.collections.vlm.neva.model.vit_config import CLIPViTL_14_336_Config, SigLIPViT400M_14_384_Config
from nemo.collections.vlm.peft import LoRA
from nemo.collections.vlm.recipes import *

@@ -59,13 +61,16 @@
"VideoToken",
"CLIPViTConfig",
"HFCLIPVisionConfig",
"CLIPViTL_14_336_Config",
"SigLIPViT400M_14_384_Config",
"MultimodalProjectorConfig",
"NevaConfig",
"NevaModel",
"LlavaConfig",
"Llava1_5Config7B",
"Llava1_5Config13B",
"Llava15Config7B",
"Llava15Config13B",
"LlavaModel",
"LlavaNextTaskEncoder",
"MLlamaModel",
"MLlamaModelConfig",
"CrossAttentionTextConfig",
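Net effect on the public API: the LLaVA configs are renamed from Llava1_5Config* to Llava15Config*, and the ViT configs plus the LLaVA-NeXT task encoder are now importable from the package root. Illustrative usage (model construction shown only as a sketch, following the config-first NeMo 2.0 pattern):

from nemo.collections.vlm import (
    CLIPViTL_14_336_Config,
    Llava15Config7B,
    LlavaModel,
    LlavaNextTaskEncoder,
    SigLIPViT400M_14_384_Config,
)

model = LlavaModel(Llava15Config7B())  # config passed as the first argument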
131 changes: 131 additions & 0 deletions nemo/collections/vlm/layer_specs.py
@@ -0,0 +1,131 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from megatron.core.fusions.fused_bias_dropout import get_bias_dropout_add
from megatron.core.tensor_parallel.layers import ColumnParallelLinear, RowParallelLinear
from megatron.core.transformer.attention import SelfAttention, SelfAttentionSubmodules
from megatron.core.transformer.dot_product_attention import DotProductAttention
from megatron.core.transformer.enums import AttnMaskType
from megatron.core.transformer.identity_op import IdentityOp
from megatron.core.transformer.mlp import MLP, MLPSubmodules
from megatron.core.transformer.spec_utils import ModuleSpec
from megatron.core.transformer.transformer_layer import TransformerLayer, TransformerLayerSubmodules

try:
from megatron.core.extensions.transformer_engine import (
TEColumnParallelLinear,
TEDotProductAttention,
TELayerNormColumnParallelLinear,
TENorm,
TERowParallelLinear,
)

HAVE_TE = True
except ImportError:
HAVE_TE = False

try:
from megatron.core.fusions.fused_layer_norm import FusedLayerNorm

HAVE_APEX = True
LNImpl = FusedLayerNorm
except ImportError:
import warnings

from megatron.core.transformer.torch_layer_norm import WrappedTorchLayerNorm

warnings.warn(f'Apex is not installed. Falling back to Torch LayerNorm')
LNImpl = WrappedTorchLayerNorm


def get_layer_spec(is_vit, normalization) -> ModuleSpec:
"""Transformer Layer Spec"""
attn_mask_type = AttnMaskType.no_mask if is_vit else AttnMaskType.causal
if normalization == "LayerNorm":
norm = LNImpl
elif normalization == "RMSNorm":
norm = TENorm
else:
raise RuntimeError("unknown normalization", normalization)

mlp = get_mlp_module_spec(use_te=False) # doesn't include norm.

return ModuleSpec(
module=TransformerLayer,
submodules=TransformerLayerSubmodules(
input_layernorm=norm,
self_attention=ModuleSpec(
module=SelfAttention,
params={"attn_mask_type": attn_mask_type},
submodules=SelfAttentionSubmodules(
linear_qkv=ColumnParallelLinear,
core_attention=DotProductAttention,
linear_proj=RowParallelLinear,
q_layernorm=IdentityOp,
k_layernorm=IdentityOp,
),
),
self_attn_bda=get_bias_dropout_add,
pre_mlp_layernorm=norm,
mlp=mlp,
mlp_bda=get_bias_dropout_add,
),
)


def get_layer_spec_te(is_vit=False) -> ModuleSpec:
"""Transformer Layer Spec w/ TE Modules"""
attn_mask_type = AttnMaskType.no_mask if is_vit else AttnMaskType.causal

mlp = get_norm_mlp_module_spec_te()
return ModuleSpec(
module=TransformerLayer,
submodules=TransformerLayerSubmodules(
self_attention=ModuleSpec(
module=SelfAttention,
params={"attn_mask_type": attn_mask_type},
submodules=SelfAttentionSubmodules(
linear_qkv=TELayerNormColumnParallelLinear,
core_attention=TEDotProductAttention,
linear_proj=TERowParallelLinear,
q_layernorm=IdentityOp,
k_layernorm=IdentityOp,
),
),
self_attn_bda=get_bias_dropout_add,
pre_mlp_layernorm=IdentityOp,
mlp=mlp,
mlp_bda=get_bias_dropout_add,
),
)


def get_mlp_module_spec(use_te: bool = True) -> ModuleSpec:
"""MLP Submodule Spec"""
# Dense MLP w/ or w/o TE modules.
return ModuleSpec(
module=MLP,
submodules=MLPSubmodules(
linear_fc1=TEColumnParallelLinear if use_te else ColumnParallelLinear,
linear_fc2=TERowParallelLinear if use_te else RowParallelLinear,
),
)


def get_norm_mlp_module_spec_te() -> ModuleSpec:
"""Norm + MLP Submodule Spec"""
return ModuleSpec(
module=MLP,
submodules=MLPSubmodules(linear_fc1=TELayerNormColumnParallelLinear, linear_fc2=TERowParallelLinear),
)
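The new layer_specs module centralizes the transformer layer specs shared by the vision and language towers. A small selector, sketched here as intended usage rather than code from the commit:

from nemo.collections.vlm import layer_specs

def pick_layer_spec(is_vit: bool, normalization: str = "LayerNorm"):
    # Prefer Transformer Engine modules when available; otherwise fall back to the
    # ColumnParallel/RowParallel + DotProductAttention spec defined above.
    if layer_specs.HAVE_TE:
        return layer_specs.get_layer_spec_te(is_vit=is_vit)
    return layer_specs.get_layer_spec(is_vit=is_vit, normalization=normalization)

vit_spec = pick_layer_spec(is_vit=True)   # AttnMaskType.no_mask for ViT blocks
llm_spec = pick_layer_spec(is_vit=False)  # AttnMaskType.causal for the language model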
8 changes: 6 additions & 2 deletions nemo/collections/vlm/mllama/data/mock.py
@@ -34,6 +34,8 @@ def __init__(
micro_batch_size: int = 4,
global_batch_size: int = 8,
rampup_batch_size: Optional[List[int]] = None,
tokenizer: Optional = None,
image_processor: Optional = None,
num_train_samples: int = 10_000,
num_val_samples: int = 10_000,
num_test_samples: int = 10_000,
@@ -52,6 +54,8 @@ def __init__(
self.persistent_workers = persistent_workers
self.vocab_size = vocab_size
self.crop_size = crop_size
self.tokenizer = tokenizer
self.image_processor = image_processor

self.data_sampler = MegatronDataSampler(
seq_len=self.seq_length,
@@ -142,8 +146,8 @@ def __getitem__(self, idx) -> Dict[str, torch.Tensor]:

return {
"images": images,
"masks": [[5, 512]],
"num_chunks": [4],
"masks": torch.tensor([[5, 512]]),
"num_chunks": torch.tensor([4]),
"tokens": tokens,
"aspect_ratio_ids": aspect_ratio_ids,
"loss_mask": self.loss_mask,
30 changes: 2 additions & 28 deletions nemo/collections/vlm/mllama/model/base.py
@@ -40,6 +40,7 @@
from nemo.collections.vlm.mllama.model.language import CrossAttentionTextModel
from nemo.collections.vlm.mllama.model.utils import _generate_cross_attention_mask, _pad_attention_masks
from nemo.collections.vlm.mllama.model.vision import VisionEncoder
from nemo.collections.vlm.neva.model.base import MODEL_CONFIG_ATTR
from nemo.lightning import get_vocab_size, io
from nemo.lightning.megatron_parallel import MaskedTokenLossReduction
from nemo.lightning.pytorch.optim import MegatronOptimizerModule, OptimizerModule
@@ -240,35 +241,8 @@ class MLlamaModelConfig(TransformerConfig, io.IOMixin):
data_step_fn: Callable = llama_data_step

def __post_init__(self):
- model_config_attr = [
- 'num_layers',
- 'hidden_size',
- 'num_attention_heads',
- 'num_query_groups',
- 'ffn_hidden_size',
- 'kv_channels',
- 'hidden_dropout',
- 'attention_dropout',
- 'fp32_residual_connection',
- 'apply_residual_connection_post_layernorm',
- 'layernorm_epsilon',
- 'layernorm_zero_centered_gamma',
- 'add_bias_linear',
- 'add_qkv_bias',
- 'gated_linear_unit',
- 'activation_func',
- 'activation_func_fp8_input_store',
- 'num_moe_experts',
- 'rotary_interleaved',
- 'window_size',
- 'normalization',
- 'qk_layernorm',
- 'test_mode',
- 'calculate_per_token_loss',
- ]

if self.language_model_config is not None:
- for attr in model_config_attr:
+ for attr in MODEL_CONFIG_ATTR:
setattr(self, attr, getattr(self.language_model_config, attr))

def configure_model(self, tokenizer) -> "MLlamaBaseModel":
2 changes: 2 additions & 0 deletions nemo/collections/vlm/neva/data/__init__.py
@@ -14,6 +14,7 @@

from nemo.collections.vlm.neva.data.config import DataConfig, ImageDataConfig, VideoDataConfig
from nemo.collections.vlm.neva.data.lazy import NevaLazyDataModule
from nemo.collections.vlm.neva.data.llava_next_energon import LlavaNextTaskEncoder
from nemo.collections.vlm.neva.data.mock import MockDataModule as NevaMockDataModule
from nemo.collections.vlm.neva.data.multimodal_tokens import ImageToken, MultiModalToken, VideoToken

@@ -26,4 +27,5 @@
"MultiModalToken",
"ImageToken",
"VideoToken",
"LlavaNextTaskEncoder",
]
4 changes: 2 additions & 2 deletions nemo/collections/vlm/neva/data/conversation.py
@@ -77,7 +77,6 @@ def process_chat_template(self, tokenizer_name_or_path, messages):

def get_prompt(self):
messages = self.messages
- messages = self.process_prompt_with_images(messages)

if self.sep_style == SeparatorStyle.SINGLE:
ret = self.system + self.sep
@@ -100,6 +99,8 @@ def get_prompt(self):
if type(message) is tuple:
message, _, _ = message
ret += role + ": " + message + seps[i % 2]
# Add space to make sure the labels can be correctly generated.
self.messages[i][1] = " " + self.messages[i][1]
else:
ret += role + ":"

@@ -155,7 +156,6 @@ def get_prompt(self):
ret = self.process_chat_template(tokenizer_name_or_path, messages)

elif self.sep_style == SeparatorStyle.MLLAMA:
""" """
tokenizer_name_or_path = self.tokenizer_name_or_path or "meta-llama/Llama-3.2-11B-Vision-Instruct"
ret = self.process_chat_template(tokenizer_name_or_path, messages)

8 changes: 6 additions & 2 deletions nemo/collections/vlm/neva/data/lazy.py
@@ -251,7 +251,7 @@ def __init__(
data_config,
tokenizer,
image_processor,
- sequence_length,
+ sequence_length=None,
):
super().__init__()
if data_path is not None:
@@ -497,6 +497,7 @@ def __init__(
weights: Optional[List[float]] = None,
data_config: Optional[DataConfig] = ImageDataConfig,
seq_length: int = 2048,
decoder_seq_length: Optional[int] = None,
tokenizer: Optional = None,
image_processor: Optional = None,
micro_batch_size: int = 4,
@@ -523,6 +524,7 @@ def __init__(
self.weights = weights
self.data_config = data_config
self.seq_length = seq_length
self.decoder_seq_length = decoder_seq_length
self.tokenizer = tokenizer
self.image_processor = image_processor
self.num_train_samples = num_train_samples
@@ -538,13 +540,15 @@ def __init__(
if tokenizer is None or image_processor is None:
logging.warning(f"Processor and tokenizer are not provided! Fall back to `llava-hf/llava-1.5-7b-hf`.")
from transformers import AutoProcessor
from nemo.collections.common.tokenizers.huggingface.auto_tokenizer import AutoTokenizer

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
- self.tokenizer = tokenizer or processor.tokenizer
+ self.tokenizer = tokenizer or AutoTokenizer("llava-hf/llava-1.5-7b-hf")
self.image_processor = image_processor or processor.image_processor

self.data_sampler = MegatronDataSampler(
seq_len=self.seq_length,
decoder_seq_len=self.decoder_seq_length,
micro_batch_size=micro_batch_size,
global_batch_size=global_batch_size,
dataloader_type="cyclic",
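The lazy data module now threads an optional decoder_seq_length through to MegatronDataSampler and falls back to NeMo's AutoTokenizer wrapper for llava-hf/llava-1.5-7b-hf when no tokenizer is supplied. A construction sketch (the data path, batch sizes, and any keyword names not visible in this diff are assumptions):

from nemo.collections.vlm import NevaLazyDataModule

data = NevaLazyDataModule(
    paths="/data/llava_instruct_150k.json",  # placeholder dataset path
    seq_length=4096,
    decoder_seq_length=None,  # only needed when the sampler tracks a separate decoder length
    micro_batch_size=4,
    global_batch_size=8,
)
# tokenizer/image_processor are omitted on purpose: the module logs a warning and
# falls back to the llava-hf/llava-1.5-7b-hf defaults, as shown in the diff above.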