From 2b2e62db3172491840deffb631914f9f8de67f59 Mon Sep 17 00:00:00 2001
From: yaoyu-33
Date: Wed, 10 Jul 2024 10:20:26 -0700
Subject: [PATCH] update docs

Signed-off-by: yaoyu-33
---
 docs/source/multimodal/mllm/checkpoint.rst | 114 ---------------------
 docs/source/multimodal/mllm/intro.rst      |   1 -
 docs/source/multimodal/vlm/checkpoint.rst  |  56 +++-------
 3 files changed, 17 insertions(+), 154 deletions(-)
 delete mode 100644 docs/source/multimodal/mllm/checkpoint.rst

diff --git a/docs/source/multimodal/mllm/checkpoint.rst b/docs/source/multimodal/mllm/checkpoint.rst
deleted file mode 100644
index d1fe7b651e66..000000000000
--- a/docs/source/multimodal/mllm/checkpoint.rst
+++ /dev/null
@@ -1,114 +0,0 @@
-Checkpoints
-===========
-
-In this section, we present four key functionalities of NVIDIA NeMo related to checkpoint management:
-
-1. **Checkpoint Loading**: Load local ``.nemo`` checkpoint files with the :code:`restore_from()` method.
-2. **Partial Checkpoint Conversion**: Convert partially-trained ``.ckpt`` checkpoints to the ``.nemo`` format.
-3. **Community Checkpoint Conversion**: Convert checkpoints from community sources, such as HuggingFace, into the ``.nemo`` format.
-4. **Model Parallelism Adjustment**: Modify model parallelism to efficiently train models that exceed the memory of a single GPU. NeMo employs both tensor (intra-layer) and pipeline (inter-layer) model parallelism. For background, see "Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM" (`link <https://arxiv.org/abs/2104.04473>`_). The adjustment tool accommodates users who need to redeploy a model across a different number of GPUs due to memory constraints.
-
-Understanding Checkpoint Formats
---------------------------------
-
-A ``.nemo`` checkpoint is fundamentally a tar file that bundles the model configuration (given as a YAML file), model weights, and other pertinent artifacts such as tokenizer models or vocabulary files. This consolidated design streamlines sharing, loading, tuning, evaluating, and inference.
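-
-Because a ``.nemo`` file is just a tar archive, you can inspect one directly with standard tools. The following is a minimal sanity check; the member names in the comments are illustrative and vary by model:
-
-.. code-block:: bash
-
-    # List the contents of a .nemo checkpoint without extracting it.
-    tar -tvf model.nemo
-
-    # Typical members include a config YAML, weight files, and tokenizer artifacts, e.g.:
-    #   model_config.yaml    (model configuration)
-    #   model_weights.ckpt   (model weights)
-    #   tokenizer.model      (tokenizer artifacts)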
-
-A ``.ckpt`` file, on the other hand, is a product of PyTorch Lightning training. It stores model weights and optimizer states, and is generally used to resume training.
-
-Subsequent sections delve into each of the functionalities listed above, emphasizing the loading of fully trained checkpoints for evaluation or additional fine-tuning.
-
-
-Loading Local Checkpoints
--------------------------
-
-NeMo inherently saves any model's checkpoints in the ``.nemo`` format. To manually save a model at any stage:
-
-.. code-block:: python
-
-    model.save_to("<checkpoint_path>.nemo")
-
-To load a local ``.nemo`` checkpoint:
-
-.. code-block:: python
-
-    import nemo.collections.multimodal as nemo_multimodal
-
-    model = nemo_multimodal.models.<MODEL_BASE_CLASS>.restore_from(restore_path="<path/to/checkpoint/file.nemo>")
-
-Replace ``<MODEL_BASE_CLASS>`` with the appropriate MM model class.
-
-Converting Local Checkpoints
-----------------------------
-
-The training script auto-converts only the final checkpoint into the ``.nemo`` format. To evaluate intermediate training checkpoints, convert them to ``.nemo`` first:
-
-.. code-block:: bash
-
-    python -m torch.distributed.launch --nproc_per_node=<tensor_model_parallel_size> * <pipeline_model_parallel_size> \
-        examples/multimodal/convert_ckpt_to_nemo.py \
-        --checkpoint_folder <path_to_PTL_checkpoints_folder> \
-        --checkpoint_name <checkpoint_name> \
-        --nemo_file_path <path_to_output_nemo_file> \
-        --tensor_model_parallel_size <tensor_model_parallel_size> \
-        --pipeline_model_parallel_size <pipeline_model_parallel_size>
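-
-Here, ``--nproc_per_node`` should equal the product of the tensor and pipeline model parallel sizes. As a concrete, hypothetical example, a checkpoint trained with tensor parallelism 4 and pipeline parallelism 2 would be converted on 8 GPUs; all paths and the checkpoint name below are illustrative:
-
-.. code-block:: bash
-
-    # Hypothetical conversion of a TP=4, PP=2 checkpoint: 4 * 2 = 8 processes.
-    python -m torch.distributed.launch --nproc_per_node=8 \
-        examples/multimodal/convert_ckpt_to_nemo.py \
-        --checkpoint_folder /results/checkpoints \
-        --checkpoint_name megatron_multimodal--step=10000.ckpt \
-        --nemo_file_path /results/multimodal.nemo \
-        --tensor_model_parallel_size 4 \
-        --pipeline_model_parallel_size 2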
-
-Converting Community Checkpoints
---------------------------------
-
-NeVA Checkpoints
-^^^^^^^^^^^^^^^^
-
-Currently, the conversion mainly supports LLaVA checkpoints based on "llama-2 chat" checkpoints. As a reference, we'll consider the checkpoint `llava-llama-2-13b-chat-lightning-preview <https://huggingface.co/liuhaotian/llava-llama-2-13b-chat-lightning-preview>`_.
-
-After downloading this checkpoint and saving it at ``/path/to/llava-llama-2-13b-chat-lightning-preview``, complete the following steps:
-
-Modifying the Tokenizer
-"""""""""""""""""""""""
-
-NeMo requires specific tokens to be added to the tokenizer model for peak performance. To modify an existing tokenizer located in ``/path/to/llava-llama-2-13b-chat-lightning-preview/tokenizer``, execute the following in the NeMo container:
-
-.. code-block:: bash
-
-    cd /opt/sentencepiece/src/
-    protoc --python_out=/opt/NeMo/scripts/tokenizers/ sentencepiece_model.proto
-    python /opt/NeMo/scripts/tokenizers/add_special_tokens_to_sentencepiece.py \
-        --input_file /path/to/llava-llama-2-13b-chat-lightning-preview/tokenizer.model \
-        --output_file /path/to/llava-llama-2-13b-chat-lightning-preview/tokenizer_neva.model \
-        --is_userdefined \
-        --tokens "<extra_id_0>" "<extra_id_1>" "<extra_id_2>" "<extra_id_3>" \
-                 "<extra_id_4>" "<extra_id_5>" "<extra_id_6>" "<extra_id_7>"
-
-Checkpoint Conversion
-"""""""""""""""""""""
-
-For conversion:
-
-.. code-block:: bash
-
-    python examples/multimodal/mllm/neva/convert_hf_llava_to_neva.py \
-        --in-file /path/to/llava-llama-2-13b-chat-lightning-preview \
-        --out-file /path/to/neva-llava-llama-2-13b-chat-lightning-preview.nemo \
-        --tokenizer-model /path/to/llava-llama-2-13b-chat-lightning-preview/tokenizer_neva.model \
-        --conv-template llama_2
-
-
-Model Parallelism Adjustment
-----------------------------
-
-NeVA Checkpoints
-^^^^^^^^^^^^^^^^
-
-Adjust model parallelism with:
-
-.. code-block:: bash
-
-    python examples/nlp/language_modeling/megatron_change_num_partitions.py \
-        --model_file=/path/to/source.nemo \
-        --target_file=/path/to/target.nemo \
-        --tensor_model_parallel_size=??? \
-        --target_tensor_model_parallel_size=??? \
-        --pipeline_model_parallel_size=??? \
-        --target_pipeline_model_parallel_size=??? \
-        --model_class="nemo.collections.multimodal.models.multimodal_llm.neva.neva_model.MegatronNevaModel" \
-        --precision=32 \
-        --tokenizer_model_path=/path/to/tokenizer.model \
-        --tp_conversion_only
diff --git a/docs/source/multimodal/mllm/intro.rst b/docs/source/multimodal/mllm/intro.rst
index 0e76a9737a0f..48bfd56f9ae1 100644
--- a/docs/source/multimodal/mllm/intro.rst
+++ b/docs/source/multimodal/mllm/intro.rst
@@ -8,7 +8,6 @@ The endeavor to extend Language Models (LLMs) into multimodal domains by integra
 
    datasets
    configs
-   checkpoint
   neva
   video_neva
   sequence_packing
diff --git a/docs/source/multimodal/vlm/checkpoint.rst b/docs/source/multimodal/vlm/checkpoint.rst
index 996d9828f5aa..d984f1453510 100644
--- a/docs/source/multimodal/vlm/checkpoint.rst
+++ b/docs/source/multimodal/vlm/checkpoint.rst
@@ -35,58 +35,36 @@ To load a local ``.nemo`` checkpoint:
 
 Replace ``<MODEL_BASE_CLASS>`` with the appropriate MM model class.
 
-Converting Local Checkpoints
-----------------------------
-
-Only the last checkpoint is automatically saved in the ``.nemo`` format. To evaluate intermediate training checkpoints, a ``.nemo`` conversion might be necessary; for this, use the conversion script ``examples/multimodal/convert_ckpt_to_nemo.py``:
-
-.. code-block:: python
-
-    python -m torch.distributed.launch --nproc_per_node=<tensor_model_parallel_size> * <pipeline_model_parallel_size> \
-        examples/multimodal/convert_ckpt_to_nemo.py \
-        --checkpoint_folder <path_to_PTL_checkpoints_folder> \
-        --checkpoint_name <checkpoint_name> \
-        --nemo_file_path <path_to_output_nemo_file> \
-        --tensor_model_parallel_size <tensor_model_parallel_size> \
-        --pipeline_model_parallel_size <pipeline_model_parallel_size>
-
 Converting Community Checkpoints
 --------------------------------
 
 CLIP Checkpoints
 ^^^^^^^^^^^^^^^^
 
-To migrate community checkpoints:
-
-.. code-block:: python
-
-    python examples/multimodal/foundation/clip/convert_external_clip_to_nemo.py \
-        --arch=ViT-H-14 \
-        --version=laion2b_s32b_b79k \
-        --hparams_file=path/to/saved.yaml \
-        --nemo_file_path=open_clip.nemo
+To migrate community checkpoints, use the following command:
+
+.. code-block:: bash
+
+    torchrun --nproc-per-node=1 /opt/NeMo/scripts/checkpoint_converters/convert_clip_hf_to_nemo.py \
+        --input_name_or_path=openai/clip-vit-large-patch14 \
+        --output_path=openai_clip.nemo \
+        --hparams_file=/opt/NeMo/examples/multimodal/vision_language_foundation/clip/conf/megatron_clip_VIT-L-14.yaml
 
 Ensure the NeMo hparams file has the correct model architectural parameters, placed at `path/to/saved.yaml`. An example can be found in `examples/multimodal/foundation/clip/conf/megatron_clip_config.yaml`.
 
-For OpenCLIP migrations, provide the architecture (`arch`) and version (`version`) according to the OpenCLIP `model list <https://github.com/mlfoundations/open_clip>`_. For Hugging Face conversions, set the version to `huggingface` and the architecture (`arch`) to the specific Hugging Face model identifier, e.g., `yuvalkirstain/PickScore_v1`.
-
-Model Parallelism Adjustment
-----------------------------
-
-CLIP Checkpoints
-^^^^^^^^^^^^^^^^
-
-To adjust model parallelism from the original model parallelism size to a new size (Note: NeMo CLIP currently only supports `pipeline_model_parallel_size=1`):
-
-.. code-block:: python
-
-    python examples/nlp/language_modeling/megatron_change_num_partitions.py \
-    --model_file=/path/to/source.nemo \
-    --target_file=/path/to/target.nemo \
-    --tensor_model_parallel_size=??? \
-    --target_tensor_model_parallel_size=??? \
-    --pipeline_model_parallel_size=-1 \
-    --target_pipeline_model_parallel_size=1 \
-    --precision=32 \
-    --model_class="nemo.collections.multimodal.models.clip.megatron_clip_models.MegatronCLIPModel" \
-    --tp_conversion_only
+After conversion, you can verify the model with the following command:
+
+.. code-block:: bash
+
+    wget https://upload.wikimedia.org/wikipedia/commons/0/0f/1665_Girl_with_a_Pearl_Earring.jpg
+    torchrun --nproc-per-node=1 /opt/NeMo/examples/multimodal/vision_language_foundation/clip/megatron_clip_infer.py \
+        model.restore_from_path=./openai_clip.nemo \
+        image_path=./1665_Girl_with_a_Pearl_Earring.jpg \
+        texts='["a dog", "a boy", "a girl"]'
+
+It should generate a high probability for the "a girl" tag. For example:
+
+.. code-block:: text
+
+    Given image's CLIP text probability: [('a dog', 0.0049710185), ('a boy', 0.002258187), ('a girl', 0.99277073)]
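+
+The same converter can be pointed at other Hugging Face CLIP checkpoints, provided the hparams file matches the model architecture. Below is a hypothetical sketch for ``openai/clip-vit-base-patch32``; the YAML filename here is an assumption, so substitute the config that actually matches your architecture:
+
+.. code-block:: bash
+
+    # Hypothetical: convert openai/clip-vit-base-patch32 with a matching config file.
+    torchrun --nproc-per-node=1 /opt/NeMo/scripts/checkpoint_converters/convert_clip_hf_to_nemo.py \
+        --input_name_or_path=openai/clip-vit-base-patch32 \
+        --output_path=openai_clip_b32.nemo \
+        --hparams_file=/opt/NeMo/examples/multimodal/vision_language_foundation/clip/conf/megatron_clip_VIT-B-32.yaml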