Update PEFT Doc #8501

Merged 7 commits on Feb 25, 2024
12 changes: 6 additions & 6 deletions README.rst
@@ -57,19 +57,19 @@ such as FSDP, Mixture-of-Experts, and RLHF with TensorRT-LLM to provide speedups
Introduction
------------

-NVIDIA NeMo Framework is a generative AI framework built for researchers and pytorch developers
+NVIDIA NeMo Framework is a generative AI framework built for researchers and pytorch developers
working on large language models (LLMs), multimodal models (MM), automatic speech recognition (ASR),
and text-to-speech synthesis (TTS).
-The primary objective of NeMo is to provide a scalable framework for researchers and developers from industry and academia
+The primary objective of NeMo is to provide a scalable framework for researchers and developers from industry and academia
to more easily implement and design new generative AI models by being able to leverage existing code and pretrained models.

For technical documentation, please see the `NeMo Framework User Guide <https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html>`_.

All NeMo models are trained with `Lightning <https://github.com/Lightning-AI/lightning>`_ and
training is automatically scalable to 1000s of GPUs.

-When applicable, NeMo models take advantage of the latest possible distributed training techniques,
-including parallelism strategies such as
+When applicable, NeMo models take advantage of the latest possible distributed training techniques,
+including parallelism strategies such as

* data parallelism
* tensor parallelism
@@ -84,7 +84,7 @@ and mixed precision training recipes with bfloat16 and FP8 training.
NeMo's Transformer based LLM and Multimodal models leverage `NVIDIA Transformer Engine <https://github.com/NVIDIA/TransformerEngine>`_ for FP8 training on NVIDIA Hopper GPUs
and leverages `NVIDIA Megatron Core <https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core>`_ for scaling transformer model training.

-NeMo LLMs can be aligned with state of the art methods such as SteerLM, DPO and Reinforcement Learning from Human Feedback (RLHF),
+NeMo LLMs can be aligned with state of the art methods such as SteerLM, DPO and Reinforcement Learning from Human Feedback (RLHF),
see `NVIDIA NeMo Aligner <https://github.com/NVIDIA/NeMo-Aligner>`_ for more details.

NeMo LLM and Multimodal models can be deployed and optimized with `NVIDIA Inference Microservices (Early Access) <https://developer.nvidia.com/nemo-microservices-early-access>`_.
@@ -93,7 +93,7 @@ NeMo ASR and TTS models can be optimized for inference and deployed for producti

For scaling NeMo LLM and Multimodal training on Slurm clusters or public clouds, please see the `NVIDIA Framework Launcher <https://github.com/NVIDIA/NeMo-Megatron-Launcher>`_.
The NeMo Framework launcher has extensive recipes, scripts, utilities, and documentation for training NeMo LLMs and Multimodal models and also has an `Autoconfigurator <https://github.com/NVIDIA/NeMo-Megatron-Launcher#53-using-autoconfigurator-to-find-the-optimal-configuration>`_
-which can be used to find the optimal model parallel configuration for training on a specific cluster.
+which can be used to find the optimal model parallel configuration for training on a specific cluster.
To get started quickly with the NeMo Framework Launcher, please see the `NeMo Framework Playbooks <https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html>`_
The NeMo Framework Launcher does not currently support ASR and TTS training but will soon.

16 changes: 8 additions & 8 deletions docs/source/nlp/nemo_megatron/peft/landing_page.rst
@@ -12,14 +12,14 @@ fraction of the computational and storage costs.
NeMo supports four PEFT methods which can be used with various
transformer-based models.

-==================== ===== ===== ========= ==
-\ GPT 3 NvGPT LLaMa 1/2 T5
-==================== ===== ===== ========= ==
-Adapters (Canonical) ✅ ✅ ✅ ✅
-LoRA ✅ ✅
-IA3
-P-Tuning ✅ ✅
-==================== ===== ===== ========= ==
+==================== ===== ======== ========= ====== ==
+\ GPT 3 Nemotron LLaMa 1/2 Falcon T5
+==================== ===== ======== ========= ====== ==
+LoRA ✅ ✅
+P-Tuning
+Adapters (Canonical) ✅ ✅ ✅
+IA3 ✅ ✅
+==================== ===== ======== ========= ====== ==

Learn more about PEFT in NeMo with the :ref:`peftquickstart` which provides an overview on how PEFT works
in NeMo. Read about the supported PEFT methods
6 changes: 4 additions & 2 deletions docs/source/nlp/nemo_megatron/peft/quick_start.rst
@@ -62,7 +62,7 @@ Base model classes
PEFT in NeMo is built with a mix-in class that does not belong to any
model in particular. This means that the same interface is available to
different NeMo models. Currently, NeMo supports PEFT for GPT-style
-models such as GPT 3, NvGPT, LLaMa 1/2 (``MegatronGPTSFTModel``), as
+models such as GPT 3, Nemotron, LLaMa 1/2 (``MegatronGPTSFTModel``), as
well as T5 (``MegatronT5SFTModel``).

Full finetuning vs PEFT
@@ -78,11 +78,13 @@ PEFT.
trainer = MegatronTrainerBuilder(config).create_trainer()
model_cfg = MegatronGPTSFTModel.merge_cfg_with(config.model.restore_from_path, config)

### Training API ###
model = MegatronGPTSFTModel.restore_from(restore_path, model_cfg, trainer) # restore from pretrained ckpt
-+ peft_cfg = LoRAPEFTConfig(model_cfg)
++ peft_cfg = LoraPEFTConfig(model_cfg)
+ model.add_adapter(peft_cfg)
trainer.fit(model) # saves adapter weights only

### Inference API ###
# Restore from base then load adapter API
model = MegatronGPTSFTModel.restore_from(restore_path, trainer, model_cfg)
+ model.load_adapters(adapter_save_path, peft_cfg)
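
Note on the mix-in design described in quick_start.rst: the text says PEFT is provided by a mix-in class that does not belong to any one model, so GPT-style and T5 models share the same interface. The toy sketch below illustrates that pattern in plain Python. It is not NeMo's implementation: every class and method name here (``ToyAdapterMixin``, ``ToyGPTModel``, ``ToyT5Model``, ``adapter_state``) is hypothetical, and the "weights" are placeholder dictionaries.

```python
# Illustrative only: a toy mix-in, NOT NeMo's actual code.
# All class and method names below are hypothetical.


class ToyAdapterMixin:
    """Adds adapter bookkeeping to any model class that inherits from it."""

    def add_adapter(self, name, params):
        # Create the registry lazily so the mix-in does not require
        # the host class to call a special __init__.
        if not hasattr(self, "_adapters"):
            self._adapters = {}
        self._adapters[name] = dict(params)

    def adapter_state(self):
        # Return only adapter parameters, mirroring the idea of saving
        # adapter weights without the frozen base weights.
        adapters = getattr(self, "_adapters", {})
        return {name: dict(p) for name, p in adapters.items()}


class ToyGPTModel(ToyAdapterMixin):
    def __init__(self):
        self.base_weights = {"layer0": [0.1, 0.2]}


class ToyT5Model(ToyAdapterMixin):
    def __init__(self):
        self.base_weights = {"encoder0": [0.3], "decoder0": [0.4]}


gpt = ToyGPTModel()
t5 = ToyT5Model()

# The same call works on both model families because the behaviour
# lives in the mix-in rather than in either model class.
for model in (gpt, t5):
    model.add_adapter("lora", {"rank": 8})
```

Because the adapter logic sits in the mix-in, supporting another model family in this sketch only requires inheriting from ``ToyAdapterMixin``; the base weights of each model are never touched by ``add_adapter``.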