Skip to content

NVIDIA Neural Modules 1.23.0

Compare
Choose a tag to compare
@ericharper ericharper released this 28 Feb 06:18
· 1509 commits to main since this release
d2283e3

Highlights

Models

Nvidia Starcoder 2 - 15B

NeMo Canary

Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-02-canary/

NeMo LLM

  • Falcon
  • Code Llama
  • StarCoder
  • GPT perf improvements
  • Context parallelism
  • Mistral
  • Mixtral (without expert parallelism)
  • Mcore GPT Dataset integration

NeMo MM

  • CLIP
  • Stable Diffusion (supporting LoRA)
  • Imagen
  • ControlNet (for SD)
  • Instruct pix2pix (for SD)
  • LLAVA
  • NeVA
  • DreamFusion++
  • NSFW filtering

NeMo ASR

  • Lhotse Dataloading support #7880
  • Canary: Multi task multi lingual ASR #8242
  • LongForm Audio for Diarization #7737
  • Faster algorithm for RNN-T Greedy #7926
  • Cache-Aware streaming notebook #8296

NeMo TTS

NeMo Vision

Known Issues

ASR

RNNT WER calculation when fused batch size > 1 during validation / test step()

Previously, the RNNT metric was stateful while the CTC one was not (r1.22.0, r1.23.0)

Therefore this calculation in the RNNT joint for fused operation worked properly. However with the unification of metrics in r1.23.0, a bug was introduced where only the last sub-batch of metrics calculates the scores and does not accumulate. This is patched via #8587 and will be fixed in the next release.

Workaround: Explicitly disable fused batch size during inference using the following command

from omegaconf import open_dict
model = ...
decoding_cfg = model.cfg.decoding
with open_dict(decoding_cfg):
  decoding_cfg.fused_batch_size = -1
model.change_decoding_strategy(decoding_cfg)

Note: This bug does not affect scores calculated via model.transcribe() (since it does not calculate metrics during inference, just text), or using the transcribe_speech.py or speech_to_text_eval.py in examples/asr.

Two failing unit tests due to a change in expected results, caused by lhotse version update.

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:24.01.speech

Detailed Changelogs

ASR

Changelog

TTS

Changelog
  • [TTS] Scale sampler steps by number of devices by @rlangman :: PR: #7947
  • Add All Multimodal Source Code Part 2: Text to image, x to nerf by @yaoyu-33 :: PR: #7970
  • [TTS] Add period discriminator and feature matching loss to codec recipe by @rlangman :: PR: #7884
  • Added VectorQuantizer base class by @anteju :: PR: #8011

LLMS

Changelog

NeMo Tools

Changelog

General Improvements

Changelog