Yash/dev llava next #10749

Closed

yashaswikarnati wants to merge 99 commits into main from yash/dev_llava_next

Collaborator

yashaswikarnati commented Oct 3, 2024

What does this PR do?

Supports training of the LLaVA NeXT model.

Collection: Multimodal

Changelog

Added the task encoders the Energon data module needs to support training LLaVA NeXT with the NeVA model.
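A task encoder's core job is to turn each raw sample (for example an Energon VQASample holding an image, a question and an answer) into token IDs plus training labels. One common convention in LLaVA-style training is to mask the prompt positions with PyTorch's default cross-entropy ignore index (-100), so the loss covers only the answer tokens. The helper below is a hypothetical sketch of that convention, not the LlavaNextTaskEncoder implementation; it also assumes labels are shifted by one position for next-token prediction.

```python
from typing import List, Tuple

# Default ignore_index of torch.nn.CrossEntropyLoss
IGNORE_INDEX = -100


def build_labels(prompt_ids: List[int], answer_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: concatenate prompt and answer, supervise only the answer.

    Returns (input_ids, labels), where labels[i] is the target token for
    input_ids[i] (i.e. the targets are shifted left by one position).
    """
    tokens = prompt_ids + answer_ids
    # Prompt positions are masked so they contribute nothing to the loss
    targets = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    input_ids = tokens[:-1]
    labels = targets[1:]
    return input_ids, labels


input_ids, labels = build_labels([1, 2, 3], [4, 5, 6])
print(input_ids)  # [1, 2, 3, 4, 5]
print(labels)     # [-100, -100, 4, 5, 6]
```

With this layout, the model is only asked to predict 4 after seeing the full prompt, then 5 and 6; the two earlier prompt positions are ignored by the loss.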

Usage

The only change required is to use the Energon-based data module with the existing NeVA model, as shown below:

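import argparse

from transformers import AutoProcessor

from nemo.collections.multimodal.data.energon import SimpleMultiModalDataModule
from nemo.collections.multimodal.data.energon.config import MultiModalSampleConfig
from nemo.collections.vlm import LlavaNextTaskEncoder

parser = argparse.ArgumentParser()
parser.add_argument("--data_path", required=True, help="Path to the Energon-prepared dataset")
args = parser.parse_args()

# Global and micro batch sizes (values from the example training script)
gbs = 32
mbs = 4

# Reuse the Hugging Face LLaVA NeXT processor for image preprocessing and tokenization
processor = AutoProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
data_path = args.data_path
image_processor = processor.image_processor
tokenizer = processor.tokenizer

multimodal_sample_config = MultiModalSampleConfig()

# Task encoder that converts Energon samples into LLaVA NeXT model inputs
task_encoder = LlavaNextTaskEncoder(
    tokenizer=tokenizer, image_processor=image_processor, multimodal_sample_config=multimodal_sample_config
)

# Energon data module, used as-is with the existing NeVA model
data = SimpleMultiModalDataModule(
    path=data_path,
    tokenizer=tokenizer,
    image_processor=image_processor,
    num_workers=8,
    micro_batch_size=mbs,
    global_batch_size=gbs,
    multimodal_sample_config=multimodal_sample_config,
    task_encoder=task_encoder,
)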
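One key difference between LLaVA NeXT and LLaVA 1.5 is "anyres" image handling: each image is resized to the best-fitting resolution from a fixed grid of candidate sizes and then split into tiles, so the number of image tokens varies from sample to sample. In the usage above this step is delegated to the Hugging Face image_processor. The sketch below illustrates the resolution-selection step, modeled on the logic in Hugging Face's LLaVA NeXT image processing; it is not NeMo's LlavaNextTaskEncoder code, and the PINPOINTS list is an assumption (check the checkpoint's config for the real grid).

```python
from typing import List, Tuple


def select_best_resolution(
    original_size: Tuple[int, int], candidates: List[Tuple[int, int]]
) -> Tuple[int, int]:
    """Pick the candidate (height, width) that preserves the most image detail.

    The image is conceptually resized to fit inside each candidate while keeping
    its aspect ratio. The winner maximizes the effective (non-upscaled) pixel
    count and, on ties, minimizes the padded (wasted) area.
    """
    orig_h, orig_w = original_size
    best = candidates[0]
    best_effective = -1
    best_wasted = float("inf")
    for cand_h, cand_w in candidates:
        scale = min(cand_w / orig_w, cand_h / orig_h)
        down_w, down_h = int(orig_w * scale), int(orig_h * scale)
        # Upscaling adds no real detail, so cap the effective area at the original
        effective = min(down_w * down_h, orig_w * orig_h)
        wasted = cand_w * cand_h - effective
        if effective > best_effective or (effective == best_effective and wasted < best_wasted):
            best, best_effective, best_wasted = (cand_h, cand_w), effective, wasted
    return best


# Assumed (height, width) grid pinpoints for a 336-px vision tower
PINPOINTS = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]

print(select_best_resolution((336, 672), PINPOINTS))    # (336, 672): wide image, no padding
print(select_best_resolution((1344, 1344), PINPOINTS))  # (672, 672): square image
```

Preferring the least-padded candidate on ties means a wide image is not pushed into a square canvas when a wide grid cell holds it at the same effective resolution.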

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you have read and followed the Contributor guidelines
Did you write any necessary new tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list the specific people who can review PRs in each area.

Additional Information

Related to # (issue)

yashaswikarnati requested a review from yaoyu-33

October 3, 2024 20:33

Collaborator Author

yashaswikarnati commented Oct 3, 2024

Fine-tuning convergence run on the LLaVA dataset: https://wandb.ai/joc/llava_next_energon/runs/70551i0j?nw=nwuserykarnati

github-advanced-security bot found potential problems


nemo/collections/vlm/neva/data/llava_next_energon.py: 6 findings, all marked Fixed

ashors1 and others added 11 commits

October 15, 2024 12:57


          locate weights path within MegatronCheckpointIO

dcb38f3

Signed-off-by: ashors1 <[email protected]>


          small refactor

Signed-off-by: ashors1 <[email protected]>


          remove another instance of ckpt_to_weights_subdir

Signed-off-by: ashors1 <[email protected]>


          move ckpt_to_weights_subdir

eed4bad

Signed-off-by: ashors1 <[email protected]>


          Apply isort and black reformatting

52c0ad3

Signed-off-by: ashors1 <[email protected]>


          Apply isort and black reformatting

e5dbd61

Signed-off-by: artbataev <[email protected]>


          add weights path in save_checkpoint

45df47d

Signed-off-by: ashors1 <[email protected]>


          fix circular import

c49e2a6

Signed-off-by: ashors1 <[email protected]>


          Apply isort and black reformatting

d3ffd5d

Signed-off-by: ashors1 <[email protected]>


          handle saving in ckpt_to_weights_subdir

ea49e20

Signed-off-by: ashors1 <[email protected]>


          fix minor typo

c4c3fd5

Signed-off-by: ashors1 <[email protected]>

github-actions bot added the Multi Modal label

github-advanced-security bot found potential problems


examples/vlm/llava_next_energon_training.py

@@ -0,0 +1,161 @@
+              import argparse
+              import os

Check notice

Code scanning / CodeQL

Unused import

Import of 'os' is not used.

examples/vlm/llava_next_energon_training.py

@@ -0,0 +1,161 @@
+              import argparse
+              import os
+              import sys

Check notice

Code scanning / CodeQL

Unused import

Import of 'sys' is not used.

examples/vlm/llava_next_energon_training.py

+              import os
+              import sys
+              import requests

Check notice

Code scanning / CodeQL

Unused import

Import of 'requests' is not used.

examples/vlm/llava_next_energon_training.py

+              import requests
+              import torch
+              from megatron.core.optimizer import OptimizerConfig
+              from megatron.energon import VQASample

Check notice

Code scanning / CodeQL

Unused import

Import of 'VQASample' is not used.

examples/vlm/llava_next_energon_training.py

+              import torch
+              from megatron.core.optimizer import OptimizerConfig
+              from megatron.energon import VQASample
+              from PIL import Image

Check notice

Code scanning / CodeQL

Unused import

Import of 'Image' is not used.

examples/vlm/llava_next_energon_training.py

+              from nemo.collections import llm, vlm
+              from nemo.collections.multimodal.data.energon import SimpleMultiModalDataModule
+              from nemo.collections.multimodal.data.energon.config import MultiModalSampleConfig
+              from nemo.collections.vlm import ImageDataConfig, Llava1_5Config7B, LlavaModel, LlavaNextTaskEncoder

Check notice

Code scanning / CodeQL

Unused import

Import of 'ImageDataConfig' is not used. Import of 'Llava1_5Config7B' is not used. Import of 'LlavaModel' is not used.

examples/vlm/llava_next_energon_training.py

+              from nemo.collections.vlm import ImageDataConfig, Llava1_5Config7B, LlavaModel, LlavaNextTaskEncoder
+              from nemo.lightning.pytorch.optim import CosineAnnealingScheduler
+              from nemo.lightning.pytorch.optim.megatron import MegatronOptimizerModule
+              from nemo.utils import logging

Check notice

Code scanning / CodeQL

Unused import

Import of 'logging' is not used.
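CodeQL performs its own dataflow analysis; the following is only a rough illustration, using Python's standard ast module, of how an unused-import check can work. It is applied to a snippet shaped like the imports flagged in examples/vlm/llava_next_energon_training.py.

```python
import ast
from typing import List


def find_unused_imports(source: str) -> List[str]:
    """Return names bound by import statements that are never read.

    A rough approximation: it ignores __all__, re-exports, string annotations,
    and names accessed only dynamically (e.g. via getattr or eval).
    """
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports cannot be checked name by name
                # "import a.b" binds "a"; "from x import y as z" binds "z"
                imported.add(alias.asname or alias.name.split(".")[0])
    # Attribute access such as "os.path" still loads the base Name "os"
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
    return sorted(imported - used)


snippet = (
    "import argparse\n"
    "import os\n"
    "import sys\n"
    "from PIL import Image\n"
    "parser = argparse.ArgumentParser()\n"
)
print(find_unused_imports(snippet))  # ['Image', 'os', 'sys']
```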

examples/vlm/llava_next_energon_training.py

+                  # Global and micro batch sizes
+                  gbs = 32
+                  mbs = 4
+                  seq_length = 256

Check notice

Code scanning / CodeQL

Unused local variable

Variable seq_length is not used.
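In Megatron-style training the global batch is split across data-parallel ranks and then into microbatches whose gradients are accumulated before each optimizer step, so the global batch size must be divisible by micro batch size × data-parallel size. A small sketch of that arithmetic, using gbs = 32 and mbs = 4 from the example script; num_microbatches is a hypothetical helper, not a NeMo API.

```python
def num_microbatches(global_batch_size: int, micro_batch_size: int, data_parallel_size: int) -> int:
    """Microbatches (gradient-accumulation steps) each data-parallel rank runs per global batch."""
    per_step = micro_batch_size * data_parallel_size
    if global_batch_size % per_step != 0:
        raise ValueError(
            f"global batch {global_batch_size} is not divisible by "
            f"micro batch {micro_batch_size} x data-parallel size {data_parallel_size}"
        )
    return global_batch_size // per_step


# gbs = 32, mbs = 4 as in the example script
for dp in (1, 2, 8):
    print(dp, num_microbatches(32, 4, dp))
# 1 8
# 2 4
# 8 1
```

Raising rather than rounding keeps the effective batch size exactly equal to the configured gbs.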

examples/vlm/llava_next_finetune_energon.py

+              from nemo.collections import llm, vlm
+              from nemo.collections.multimodal.data.energon import SimpleMultiModalDataModule
+              from nemo.collections.multimodal.data.energon.config import MultiModalSampleConfig
+              from nemo.collections.vlm import ImageDataConfig, LlavaNextTaskEncoder

Check notice

Code scanning / CodeQL

Unused import

Import of 'ImageDataConfig' is not used.

examples/vlm/llava_next_finetune_energon.py

+                  # Global and micro batch sizes
+                  gbs = 128
+                  mbs = 4
+                  seq_length = 4096

Check notice

Code scanning / CodeQL

Unused local variable

Variable seq_length is not used.
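The seq_length notices come from a similar check on local variables. A rough ast-based sketch (again not CodeQL's algorithm) that reports function-local names assigned but never read, mirroring the gbs/mbs/seq_length pattern flagged above:

```python
import ast
from typing import List, Tuple


def find_unused_locals(source: str) -> List[Tuple[str, str]]:
    """Return (function, variable) pairs for names stored but never loaded.

    Simplified: names used inside nested functions also count toward the
    enclosing function, and dynamic access (locals(), eval) is not detected.
    """
    results = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        stored, loaded = set(), set()
        for node in ast.walk(func):
            if isinstance(node, ast.Name):
                # Store context = assignment target; anything else counts as a use
                (stored if isinstance(node.ctx, ast.Store) else loaded).add(node.id)
        for name in sorted(stored - loaded):
            results.append((func.name, name))
    return results


example = (
    "def main():\n"
    "    gbs = 128\n"
    "    mbs = 4\n"
    "    seq_length = 4096\n"
    "    print(gbs, mbs)\n"
)
print(find_unused_locals(example))  # [('main', 'seq_length')]
```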

ashors1 and others added 13 commits

October 16, 2024 14:58


          bug fixes

3ae933e

Signed-off-by: ashors1 <[email protected]>


          fix undefined variable

f1fbec5

Signed-off-by: ashors1 <[email protected]>


          move function

Signed-off-by: ashors1 <[email protected]>


          Apply isort and black reformatting

994719e

Signed-off-by: ashors1 <[email protected]>


          fix adapter meta file path

ea51ab2

Signed-off-by: Chen Cui <[email protected]>


          Apply isort and black reformatting

871ac85

Signed-off-by: cuichenx <[email protected]>


          Merge branch 'refs/heads/main' into ashors/ckpt-subdirs

f5889ca


          Merge remote-tracking branch 'origin/ashors/ckpt-subdirs' into ashors/ckpt-subdirs

df2c4b1


          fix mixtral test

5aec05b

Signed-off-by: ashors1 <[email protected]>


          fix mixtral test

2df54e3

Signed-off-by: ashors1 <[email protected]>


          use function for weights subdir

440a244

Signed-off-by: Chen Cui <[email protected]>


          address comments

b2883a1

Signed-off-by: ashors1 <[email protected]>


          move asserts

26a8d8d

Signed-off-by: ashors1 <[email protected]>

yashaswikarnati requested a review from pablo-garay as a code owner

October 20, 2024 22:02

akoumpa and others added 16 commits

October 24, 2024 14:42


          Akoumparouli/mixtral recipe fix r2.0.0 (#10994)

cde2e02

* Mixtral TP8 EP1

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>


          added datamodule for llava-next

e2db0be


          modified state dict transform

5eb00b0


          neva model changes to support llava-next

d263a60


          remove accidentally checked in files

97025ee

Signed-off-by: Yashaswi Karnati <[email protected]>


          Apply isort and black reformatting

37c6c55

Signed-off-by: yashaswikarnati <[email protected]>


          remove unused imports

bac0f64


          added io_init to not save task_encoder and image_processor

da05cf1


          Apply isort and black reformatting

cfb521c

Signed-off-by: yashaswikarnati <[email protected]>


          added scripts for pretrain and finetune

d3a718f

Signed-off-by: Yashaswi Karnati <[email protected]>


          Apply isort and black reformatting

438c573

Signed-off-by: yashaswikarnati <[email protected]>


          [🤠]: Howdy folks, let's bump Dockerfile.ci to 73e7b58 ! (#10779)

29a2ed8

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>


          generation example

c93cda7


          Apply isort and black reformatting

b2689fd

Signed-off-by: yashaswikarnati <[email protected]>


          small change in llava next example

302afb7


          edited merge conflict

accc256

yashaswikarnati force-pushed the yash/dev_llava_next branch from 4ded144 to accc256

October 24, 2024 22:54

github-actions bot added core TTS ASR NLP CI common audio labels

Yashaswi Karnati and others added 5 commits

October 28, 2024 16:38


          llava next end-end train

590e7cd


          Apply isort and black reformatting

397aa80

Signed-off-by: yashaswikarnati <[email protected]>


          finetune changes

f6e9255


          Apply isort and black reformatting

e9d4b98

Signed-off-by: yashaswikarnati <[email protected]>


          finetune debug changes

9a58842

yashaswikarnati closed this


Labels

ASR audio CI common core Multi Modal NLP TTS

27 participants