Skip to content

Commit

Permalink
Merge branch 'r1.23.0' into jenkins_for_mistral_mixtral/akoumparouli
Browse files Browse the repository at this point in the history
  • Loading branch information
akoumpa authored Feb 5, 2024
2 parents 568ca20 + a592517 commit 9df57ec
Show file tree
Hide file tree
Showing 48 changed files with 3,140 additions and 1,340 deletions.
4 changes: 2 additions & 2 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -101,7 +101,7 @@ Key Features
* Hybrid Transducer/CTC
* NeMo Original `Multi-blank Transducers <https://arxiv.org/abs/2211.03541>`_ and `Token-and-Duration Transducers (TDT) <https://arxiv.org/abs/2304.06795>`_
* Streaming/Buffered ASR (CTC/Transducer) - `Chunked Inference Examples <https://github.com/NVIDIA/NeMo/tree/stable/examples/asr/asr_chunked_inference>`_
* `Cache-aware Streaming Conformer <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html#cache-aware-streaming-conformer>`_ with multiple lookaheads.
* `Cache-aware Streaming Conformer <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html#cache-aware-streaming-conformer>`_ with multiple lookaheads (including microphone streaming `tutorial <https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Online_ASR_Microphone_Demo_Cache_Aware_Streaming.ipynb>`_).
* Beam Search decoding
* `Language Modelling for ASR (CTC and RNNT) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html>`_: N-gram LM in fusion with Beam Search decoding, Neural Rescoring with Transformer
* `Support of long audios for Conformer with memory efficient local attention <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/results.html#inference-on-long-audio>`_
Expand All @@ -125,7 +125,7 @@ Key Features
* `Information retrieval <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/information_retrieval.html>`_
* `Entity Linking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/entity_linking.html>`_
* `Dialogue State Tracking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/dialogue.html>`_
* `Prompt Learning <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/prompt_learning.html>`_
* `Parameter Efficient Finetuning (PEFT) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/peft/landing_page.html>`_
* `NGC collection of pre-trained NLP models. <https://ngc.nvidia.com/catalog/collections/nvidia:nemo_nlp>`_
* `Synthetic Tabular Data Generation <https://developer.nvidia.com/blog/generating-synthetic-data-with-transformers-a-solution-for-enterprise-data-challenges/>`_
* Text-to-Speech Synthesis (TTS):
Expand Down
4 changes: 1 addition & 3 deletions docs/source/asr/intro.rst
Original file line number Diff line number Diff line change
Expand Up @@ -108,9 +108,7 @@ See more information about LM decoding :doc:`here <./asr_language_modeling>`.
Use real-time transcription
---------------------------

It is possible to use NeMo to transcribe speech in real-time. You can find an example of how to do
this in the following `notebook tutorial <https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Online_ASR_Microphone_Demo.ipynb>`_.

It is possible to use NeMo to transcribe speech in real-time. We provide tutorial notebooks for `Cache Aware Streaming <https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Online_ASR_Microphone_Demo_Cache_Aware_Streaming.ipynb>`_ and `Buffered Streaming <https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Online_ASR_Microphone_Demo_Buffered_Streaming.ipynb>`_.

Try different ASR models
------------------------
Expand Down
2 changes: 2 additions & 0 deletions docs/source/asr/models.rst
Original file line number Diff line number Diff line change
Expand Up @@ -159,6 +159,8 @@ You may find more examples under ``<NeMo_git_root>/examples/asr/conf/fastconform
Cache-aware Streaming Conformer
-------------------------------

Try real-time ASR with the Cache-aware Streaming Conformer `tutorial notebook <https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Online_ASR_Microphone_Demo_Cache_Aware_Streaming.ipynb>`_.

Buffered streaming uses overlapping chunks to make an offline ASR model to be used for streaming with reasonable accuracy. However, it uses significant amount of duplication in computations due to the overlapping chunks.
Also there is a accuracy gap between the offline model and the streaming one as there is inconsistency between how we train the model and how we perform inference for streaming.
The Cache-aware Streaming Conformer models would tackle and address these disadvantages. These streaming Conformers are trained with limited right context that it would make it possible to match how the model is being used in both the training and inference.
Expand Down
16 changes: 8 additions & 8 deletions docs/source/nlp/nemo_megatron/peft/landing_page.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,14 +12,14 @@ fraction of the computational and storage costs.
NeMo supports four PEFT methods which can be used with various
transformer-based models.

==================== ===== ===== ========= ==
\ GPT 3 NvGPT LLaMa 1/2 T5
==================== ===== ===== ========= ==
Adapters (Canonical) ✅ ✅ ✅ ✅
LoRA ✅ ✅
IA3
P-Tuning ✅ ✅
==================== ===== ===== ========= ==
==================== ===== ======== ========= ====== ==
\ GPT 3 Nemotron LLaMa 1/2 Falcon T5
==================== ===== ======== ========= ====== ==
LoRA ✅ ✅
P-Tuning
Adapters (Canonical) ✅ ✅ ✅
IA3 ✅ ✅
==================== ===== ======== ========= ====== ==

Learn more about PEFT in NeMo with the :ref:`peftquickstart` which provides an overview on how PEFT works
in NeMo. Read about the supported PEFT methods
Expand Down
6 changes: 4 additions & 2 deletions docs/source/nlp/nemo_megatron/peft/quick_start.rst
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ Base model classes
PEFT in NeMo is built with a mix-in class that does not belong to any
model in particular. This means that the same interface is available to
different NeMo models. Currently, NeMo supports PEFT for GPT-style
models such as GPT 3, NvGPT, LLaMa 1/2 (``MegatronGPTSFTModel``), as
models such as GPT 3, Nemotron, LLaMa 1/2 (``MegatronGPTSFTModel``), as
well as T5 (``MegatronT5SFTModel``).

Full finetuning vs PEFT
Expand All @@ -78,11 +78,13 @@ PEFT.
trainer = MegatronTrainerBuilder(config).create_trainer()
model_cfg = MegatronGPTSFTModel.merge_cfg_with(config.model.restore_from_path, config)
### Training API ###
model = MegatronGPTSFTModel.restore_from(restore_path, model_cfg, trainer) # restore from pretrained ckpt
+ peft_cfg = LoRAPEFTConfig(model_cfg)
+ peft_cfg = LoraPEFTConfig(model_cfg)
+ model.add_adapter(peft_cfg)
trainer.fit(model) # saves adapter weights only
### Inference API ###
# Restore from base then load adapter API
model = MegatronGPTSFTModel.restore_from(restore_path, trainer, model_cfg)
+ model.load_adapters(adapter_save_path, peft_cfg)
Expand Down
Loading

0 comments on commit 9df57ec

Please sign in to comment.