diff --git a/docs/source/asr/api.rst b/docs/source/asr/api.rst index a5b3369177b9..b28fe2db1d88 100644 --- a/docs/source/asr/api.rst +++ b/docs/source/asr/api.rst @@ -1,4 +1,4 @@ -NeMo ASR collection API +NeMo ASR Collection API ======================= diff --git a/docs/source/asr/asr_all.bib b/docs/source/asr/asr_all.bib index 11998d30cd5e..6256864152ac 100644 --- a/docs/source/asr/asr_all.bib +++ b/docs/source/asr/asr_all.bib @@ -1034,7 +1034,7 @@ @misc{park2022multi copyright = {Creative Commons Attribution 4.0 International} } -@inproceedings{vaswani2017attention, +@inproceedings{vaswani2017aayn, title={Attention is all you need}, author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia}, booktitle={Advances in Neural Information Processing Systems}, diff --git a/docs/source/asr/asr_language_modeling_and_customization.rst b/docs/source/asr/asr_language_modeling_and_customization.rst index 013b31dd28cd..0761f60d2380 100644 --- a/docs/source/asr/asr_language_modeling_and_customization.rst +++ b/docs/source/asr/asr_language_modeling_and_customization.rst @@ -76,27 +76,27 @@ it is stored at the path specified by `kenlm_model_file`. The following is the list of the arguments for the training script: -+------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ -| **Argument** | **Type** | **Default** | **Description** | -+------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ -| nemo_model_file | str | Required | The path to `.nemo` file of the ASR model, or name of a pretrained NeMo model to extract a tokenizer. 
| -+------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ -| train_paths | List[str] | Required | List of training files or folders. Files can be a plain text file or ".json" manifest or ".json.gz". | -+------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ -| kenlm_model_file | str | Required | The path to store the KenLM binary model file. | -+------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ -| kenlm_bin_path | str | Required | The path to the bin folder of KenLM. It is a folder named `bin` under where KenLM is installed. | -+------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ -| ngram_length** | int | Required | Specifies order of N-gram LM. | -+------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ -| ngram_prune | List[int] | [0] | List of thresholds to prune N-grams. Example: [0,0,1]. See Pruning section on the https://kheafield.com/code/kenlm/estimation | -+------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ -| cache_path | str | "" | Cache path to save tokenized files. 
| -+------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ -| preserve_arpa | bool | ``False`` | Whether to preserve the intermediate ARPA file after construction of the BIN file. | -+------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ -| verbose | int | 1 | Verbose level. | -+------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ ++------------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ +| **Argument** | **Type** | **Default** | **Description** | ++------------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ +| nemo_model_file | str | Required | The path to `.nemo` file of the ASR model, or name of a pretrained NeMo model to extract a tokenizer. | ++------------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ +| train_paths | List[str] | Required | List of training files or folders. Files can be a plain text file or ".json" manifest or ".json.gz". | ++------------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ +| kenlm_model_file | str | Required | The path to store the KenLM binary model file. 
| ++------------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ +| kenlm_bin_path | str | Required | The path to the bin folder of KenLM. It is a folder named `bin` under where KenLM is installed. | ++------------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ +| ngram_length** | int | Required | Specifies order of N-gram LM. | ++------------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ +| ngram_prune | List[int] | [0] | List of thresholds to prune N-grams. Example: [0,0,1]. See Pruning section on the https://kheafield.com/code/kenlm/estimation | ++------------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ +| cache_path | str | ``""`` | Cache path to save tokenized files. | ++------------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ +| preserve_arpa | bool | ``False`` | Whether to preserve the intermediate ARPA file after construction of the BIN file. | ++------------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ +| verbose | int | 1 | Verbose level. | ++------------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+ ** Note: Recommend to use 6 as the order of the N-gram model for BPE-based models. Higher orders may need the re-compilation of KenLM to support it. 
@@ -184,7 +184,7 @@ The following is the list of the important arguments for the evaluation script: +--------------------------------------+----------+------------------+-------------------------------------------------------------------------+ | text_processing.do_lowercase | bool | ``False`` | Whether to make the training text all lower case. | +--------------------------------------+----------+------------------+-------------------------------------------------------------------------+ -| text_processing.punctuation_marks | str | "" | String with punctuation marks to process. Example: ".\,?" | +| text_processing.punctuation_marks | str | ``""`` | String with punctuation marks to process. Example: ".\,?" | +--------------------------------------+----------+------------------+-------------------------------------------------------------------------+ | text_processing.rm_punctuation | bool | ``False`` | Whether to remove punctuation marks from text. | +--------------------------------------+----------+------------------+-------------------------------------------------------------------------+ @@ -527,25 +527,25 @@ The following is the list of the arguments for the opengrm script: | kenlm_bin_path | str | Required | The path to the bin folder of KenLM library. It is a folder named `bin` under where KenLM is installed. | +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+ | ngram_bin_path | str | Required | The path to the bin folder of OpenGrm Ngram. It is a folder named `bin` under where OpenGrm Ngram is installed. 
| -+----------------------+--------+------------------+-------------------------------------------------------------------------+ -| arpa_a | str | Required | Path to the ARPA N-gram model file A | -+----------------------+--------+------------------+-------------------------------------------------------------------------+ -| alpha | float | Required | Weight of N-gram model A | -+----------------------+--------+------------------+-------------------------------------------------------------------------+ -| arpa_b | int | Required | Path to the ARPA N-gram model file B | -+----------------------+--------+------------------+-------------------------------------------------------------------------+ -| beta | float | Required | Weight of N-gram model B | -+----------------------+--------+------------------+-------------------------------------------------------------------------+ -| out_path | str | Required | Path for writing temporary and resulting files. | -+----------------------+--------+------------------+-------------------------------------------------------------------------+ -| test_file | str | None | Path to test file to count perplexity if provided. | -+----------------------+--------+------------------+-------------------------------------------------------------------------+ -| symbols | str | None | Path to symbols (.syms) file. Could be calculated if it is not provided.| -+----------------------+--------+------------------+-------------------------------------------------------------------------+ -| nemo_model_file | str | None | The path to '.nemo' file of the ASR model, or name of a pretrained NeMo model. 
| -+----------------------+--------+------------------+-------------------------------------------------------------------------+ -| force | bool | ``False`` | Whether to recompile and rewrite all files | -+----------------------+--------+------------------+-------------------------------------------------------------------------+ ++----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+ +| arpa_a | str | Required | Path to the ARPA N-gram model file A | ++----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+ +| alpha | float | Required | Weight of N-gram model A | ++----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+ +| arpa_b | int | Required | Path to the ARPA N-gram model file B | ++----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+ +| beta | float | Required | Weight of N-gram model B | ++----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+ +| out_path | str | Required | Path for writing temporary and resulting files. | ++----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+ +| test_file | str | None | Path to test file to count perplexity if provided. | ++----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+ +| symbols | str | None | Path to symbols (.syms) file. 
Could be calculated if it is not provided. | ++----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+ +| nemo_model_file | str | None | The path to '.nemo' file of the ASR model, or name of a pretrained NeMo model. | ++----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+ +| force | bool | ``False`` | Whether to recompile and rewrite all files | ++----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+ ****************** diff --git a/docs/source/asr/examples/kinyarwanda_asr.rst b/docs/source/asr/examples/kinyarwanda_asr.rst index f8057585b104..792d4f7aa446 100644 --- a/docs/source/asr/examples/kinyarwanda_asr.rst +++ b/docs/source/asr/examples/kinyarwanda_asr.rst @@ -429,7 +429,7 @@ Training from scratch and finetuning ASR models ########## -Our goal was to train two ASR models with different architectures: `Conformer-CTC `_ and `Conformer-Transducer `_, with around 120 million parameters. +Our goal was to train two ASR models with different architectures: :ref:`Conformer-CTC ` and :ref:`Conformer-Transducer `, with around 120 million parameters. The CTC model predicts output tokens for each timestep. The outputs are assumed to be independent of each other. As a result the CTC models work faster but they can produce outputs that are inconsistent with each other. CTC models are often combined with external language models in production. In contrast, the Transducer models contain the decoding part which generates the output tokens one by one and the next token prediction depends on this history. 
Due to autoregressive nature of decoding the inference speed is several times slower than that of CTC models, but the quality is usually better because it can incorporate language model information within the same model. Training scripts and configs @@ -604,7 +604,7 @@ Error analysis Still, even WER of 16% is not as good as we usually get for other languages trained with NeMo toolkit, so we may want to look at the errors that the model makes to better understand what's the problem. -We can use `Speech Data Explorer `_ to analyze the errors. +We can use :doc:`Speech Data Explorer <../../tools/speech_data_explorer>` to analyze the errors. If we run diff --git a/docs/source/asr/intro.rst b/docs/source/asr/intro.rst index d8fe1f105caf..7d1270af1267 100644 --- a/docs/source/asr/intro.rst +++ b/docs/source/asr/intro.rst @@ -103,7 +103,7 @@ After :ref:`training ` an N-gram LM, you can use it for transcri decoding_mode=beamsearch_ngram \ decoding_strategy="" -See more information about LM decoding :doc:`here <./asr_language_modeling>`. +See more information about LM decoding :doc:`here <./asr_language_modeling_and_customization>`. Use real-time transcription --------------------------- @@ -179,8 +179,8 @@ Preparing ASR datasets NeMo includes preprocessing scripts for several common ASR datasets. The :doc:`Datasets <./datasets>` section contains instructions on running those scripts. It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data. -Further information -------------------- +NeMo ASR Documentation +---------------------- For more information, see additional sections in the ASR docs on the left-hand-side menu or in the list below: .. 
toctree:: @@ -188,7 +188,7 @@ For more information, see additional sections in the ASR docs on the left-hand-s models datasets - asr_language_modeling + asr_language_modeling_and_customization results scores configs diff --git a/docs/source/asr/models.rst b/docs/source/asr/models.rst index 4f05cec410fa..f8b4ef72196e 100644 --- a/docs/source/asr/models.rst +++ b/docs/source/asr/models.rst @@ -24,7 +24,7 @@ Canary-1B is the latest ASR model from NVIDIA NeMo. It sits at the top of the `H You can `download the checkpoint `__ or try out Canary in action in this `HuggingFace Space `__. -Canary-1B is an encoder-decoder model with a :ref:`FastConformer Encoder ` and Transformer Decoder :cite:`asr-models-vaswani2017attention`. +Canary-1B is an encoder-decoder model with a :ref:`FastConformer Encoder ` and Transformer Decoder :cite:`asr-models-vaswani2017aayn`. It is a multi-lingual, multi-task model, supporting automatic speech-to-text recognition (ASR) in 4 languages (English, German, French, Spanish) as well as translation between English and the 3 other supported languages. 
diff --git a/docs/source/core/core.rst b/docs/source/core/core.rst index 7fe4a65cc32f..6e5efa56d5f0 100644 --- a/docs/source/core/core.rst +++ b/docs/source/core/core.rst @@ -174,9 +174,9 @@ via PyTorch Lightning `hooks ` -- :ref:`Natural Language Processing (NLP) <../nlp/models>` -- :ref:`Text-to-Speech Synthesis (TTS) <../tts/intro>` +- :doc:`Automatic Speech Recognition (ASR) <../asr/intro>` +- :doc:`Natural Language Processing (NLP) <../nlp/models>` +- :doc:`Text-to-Speech Synthesis (TTS) <../tts/intro>` PyTorch Lightning Trainer ~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/multimodal/mllm/configs.rst b/docs/source/multimodal/mllm/configs.rst index d54be2e1b3b6..6e9f9b2b8d10 100644 --- a/docs/source/multimodal/mllm/configs.rst +++ b/docs/source/multimodal/mllm/configs.rst @@ -1,7 +1,7 @@ Common Configuration Files ========================== -This section provides a detailed overview of the NeMo configuration file setup specific to models within the NeMo Multimodal Language Model collection. For foundational knowledge about setting up and executing experiments common to all NeMo models, such as the Experiment Manager and PyTorch Lightning trainer parameters, refer to the :doc:`../core/core` section. +This section provides a detailed overview of the NeMo configuration file setup specific to models within the NeMo Multimodal Language Model collection. For foundational knowledge about setting up and executing experiments common to all NeMo models, such as the Experiment Manager and PyTorch Lightning trainer parameters, refer to the :doc:`core <../../core/core>` documentation. Within the configuration files of the NeMo Multimodal Language Model, details concerning dataset(s), augmentation, optimization parameters, and model architectural specifications are central. This page explores each of these aspects. 
diff --git a/docs/source/multimodal/vlm/configs.rst b/docs/source/multimodal/vlm/configs.rst index 160ba05cd6d0..cc383cb64b62 100644 --- a/docs/source/multimodal/vlm/configs.rst +++ b/docs/source/multimodal/vlm/configs.rst @@ -1,7 +1,7 @@ Common Configuration Files ========================== -This section provides a detailed overview of the NeMo configuration file setup specific to models within the NeMo Multimodal Language Model collection. For foundational knowledge about setting up and executing experiments common to all NeMo models, such as the Experiment Manager and PyTorch Lightning trainer parameters, refer to the :doc:`../core/core` section. +This section provides a detailed overview of the NeMo configuration file setup specific to models within the NeMo Multimodal Language Model collection. For foundational knowledge about setting up and executing experiments common to all NeMo models, such as the Experiment Manager and PyTorch Lightning trainer parameters, refer to the :doc:`core <../../core/core>` documentation. Within the configuration files of the NeMo Multimodal Language Model, details concerning dataset(s), augmentation, optimization parameters, and model architectural specifications are central. This page explores each of these aspects. diff --git a/docs/source/nlp/nemo_megatron/intro.rst b/docs/source/nlp/nemo_megatron/intro.rst index faf315a40c04..6ddf008214cc 100644 --- a/docs/source/nlp/nemo_megatron/intro.rst +++ b/docs/source/nlp/nemo_megatron/intro.rst @@ -1,7 +1,7 @@ Large Language Models ===================== -To learn more about using NeMo to train Large Language Models at scale, please refer to the `NeMo Framework User Guide! `_. +To learn more about using NeMo to train Large Language Models at scale, please refer to the `NeMo Framework User Guide `_. 
* GPT-style models (decoder only) * T5/BART/UL2-style models (encoder-decoder) diff --git a/docs/source/nlp/punctuation_and_capitalization_lexical_audio.rst b/docs/source/nlp/punctuation_and_capitalization_lexical_audio.rst index d2a46af8c117..8314676e5c4c 100644 --- a/docs/source/nlp/punctuation_and_capitalization_lexical_audio.rst +++ b/docs/source/nlp/punctuation_and_capitalization_lexical_audio.rst @@ -15,7 +15,7 @@ Like in these examples: Yeah, they make you work. Yeah, over there you walk a lot? or Yeah, they make you work. Yeah, over there you walk a lot. -You can find more details on text only punctuation and capitalization in `Punctuation And Capitalization's page `_. In this document, we focus on model changes needed to use acoustic features. +You can find more details on text-only punctuation and capitalization in the :doc:`Punctuation And Capitalization page <./punctuation_and_capitalization>`. In this document, we focus on the model changes needed to use acoustic features. Quick Start Guide ----------------- @@ -35,7 +35,7 @@ Quick Start Guide Model Description ----------------- -In addition to `Punctuation And Capitalization model `_ we add audio encoder (e.g. Conformer's encoder) and attention based fusion of lexical and audio features. +In addition to the :doc:`Punctuation And Capitalization model <./punctuation_and_capitalization>`, we add an audio encoder (e.g., Conformer's encoder) and attention-based fusion of lexical and audio features. This model architecture is based on `Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech `__ :cite:`nlp-punct-sunkara20_interspeech`. .. note:: @@ -50,7 +50,7 @@ This model architecture is based on `Multimodal Semi-supervised Learning Framewo Raw Data Format --------------- -In addition to `Punctuation And Capitalization Raw Data Format `_ this model also requires audio data.
+In addition to :ref:`Punctuation And Capitalization Raw Data Format `, this model also requires audio data. You have to provide ``audio_train.txt`` and ``audio_dev.txt`` (and optionally ``audio_test.txt``) which contain one valid path to audio per row. Example of the ``audio_train.txt``/``audio_dev.txt`` file: @@ -100,14 +100,14 @@ Training Punctuation and Capitalization Model --------------------------------------------- The audio encoder is initialized with pretrained ASR model. You can use any of ``list_available_models()`` of ``EncDecCTCModel`` or your own checkpoints, either one should be provided in ``model.audio_encoder.pretrained_model``. -You can freeze audio encoder during training and add additional ``ConformerLayer`` on top of encoder to reduce compute with ``model.audio_encoder.freeze``. You can also add `Adapters `_ to reduce compute with ``model.audio_encoder.adapter``. Parameters of fusion module are stored in ``model.audio_encoder.fusion``. +You can freeze the audio encoder during training and add additional ``ConformerLayer`` on top of the encoder to reduce compute with ``model.audio_encoder.freeze``. You can also add :doc:`Adapters <../core/adapters/components>` to reduce compute with ``model.audio_encoder.adapter``. Parameters of the fusion module are stored in ``model.audio_encoder.fusion``. An example of a model configuration file for training the model can be found at: `NeMo/examples/nlp/token_classification/conf/punctuation_capitalization_lexical_audio_config.yaml `__. Configs ^^^^^^^^^^^^ .. note:: - This page contains only parameters specific to lexical and audio model. Others parameters can be found in `Punctuation And Capitalization's page `_. + This page contains only parameters specific to the lexical and audio model. Other parameters can be found in the :doc:`Punctuation And Capitalization page <./punctuation_and_capitalization>`.
Model config ^^^^^^^^^^^^ diff --git a/docs/source/nlp/text_normalization/wfst/wfst_resources.rst b/docs/source/nlp/text_normalization/wfst/wfst_resources.rst index 95e9748d7c83..fb11bf9b317e 100644 --- a/docs/source/nlp/text_normalization/wfst/wfst_resources.rst +++ b/docs/source/nlp/text_normalization/wfst/wfst_resources.rst @@ -10,7 +10,7 @@ Resources and Documentation - List of `TN/ITN issues `_, use `TN/ITN` label - TN/ITN related `discussions `_, use `TN/ITN` label -- Documentation on how to generate `.far files for deployment in Riva (via Sparrowhawk) `_ +- Documentation on how to generate :doc:`.far files for deployment in Riva (via Sparrowhawk) <./wfst_text_processing_deployment>` - Tutorial that provides an `Overview of NeMo-TN/ITN `_ - Tutorial on `how to write new grammars `_ in `Pynini `_ diff --git a/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst b/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst index 1210779363d9..7e1a34c3864e 100644 --- a/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst +++ b/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst @@ -138,9 +138,9 @@ Audio-based TN Additional Arguments: -* ``text`` - Input text or `JSON manifest file `_ with multiple audio paths. +* ``text`` - Input text or :ref:`JSON manifest file` with multiple audio paths. * ``audio_data`` - (Optional) Input audio. -* ``model`` - `Off-shelf NeMo CTC ASR model name `_ or path to local NeMo model checkpoint ending on .nemo +* ``model`` - :ref:`Off-the-shelf NeMo CTC ASR model name ` or path to a local NeMo model checkpoint ending in ``.nemo`` * ``n_tagged`` - number of normalization options to output. diff --git a/docs/source/starthere/intro.rst b/docs/source/starthere/intro.rst index 77a1ca0255a1..eaeab3c212d0 100644 --- a/docs/source/starthere/intro.rst +++ b/docs/source/starthere/intro.rst @@ -9,7 +9,7 @@ Introduction ..
_dummy_header: NVIDIA NeMo Framework is an end-to-end, cloud-native framework to build, customize, and deploy generative AI models anywhere. -To learn more about using NeMo in generative AI workflows, please refer to the `NeMo Framework User Guide! `_ +To learn more about using NeMo in generative AI workflows, please refer to the `NeMo Framework User Guide `_. `NVIDIA NeMo Framework `_ has separate collections for Large Language Models (LLMs), Multimodal (MM), Computer Vision (CV), Automatic Speech Recognition (ASR), diff --git a/docs/source/tools/nemo_forced_aligner.rst b/docs/source/tools/nemo_forced_aligner.rst index c977ad676181..a4ed90fa7f9f 100644 --- a/docs/source/tools/nemo_forced_aligner.rst +++ b/docs/source/tools/nemo_forced_aligner.rst @@ -6,7 +6,7 @@ NFA is hosted here: https://github.com/NVIDIA/NeMo/tree/main/tools/nemo_forced_a NFA is a tool for generating token-, word- and segment-level timestamps of speech in audio using NeMo's CTC-based Automatic Speech Recognition models. You can provide your own reference text, or use ASR-generated transcription. -You can use NeMo's ASR Model checkpoints out of the box in `14+ languages `_, or train your own model. +You can use NeMo's ASR Model checkpoints out of the box in :ref:`14+ languages `, or train your own model. NFA can be used on long audio files of 1+ hours duration (subject to your hardware and the ASR model used). Demos & Tutorials diff --git a/docs/source/tools/speech_data_explorer.rst b/docs/source/tools/speech_data_explorer.rst index 5e7a28812b39..a57cb442f468 100644 --- a/docs/source/tools/speech_data_explorer.rst +++ b/docs/source/tools/speech_data_explorer.rst @@ -21,7 +21,7 @@ Speech Data Explorer (SDE) is a `Dash `__-based web ap SDE Demo Instance ----------------- -To demonstrate both the `CTC-Segmentation `_ and Speech Data Explorer tools, we re-segmenting the development set as of `the LibriSpeech corpus `_. 
+To demonstrate both the :doc:`CTC-Segmentation <./ctc_segmentation>` and Speech Data Explorer tools, we re-segmented the development set of `the LibriSpeech corpus `_. We concatenated all audio files from the dev-clean split into a single file and set up the CTC-Segmentation tool to cut the long audio file into original utterances. We used the CTC-based `QuartzNet15x5Base-En ASR model `_. The segmented corpus has 3.82% WER and contains 300 out of the initial 323 minutes of audio. diff --git a/docs/source/tts/configs.rst b/docs/source/tts/configs.rst index 3a4b99226e2e..d720b37aa721 100644 --- a/docs/source/tts/configs.rst +++ b/docs/source/tts/configs.rst @@ -106,7 +106,7 @@ Text normalization (TN) converts text from written form into its verbalized form Tokenizer Configuration ------------------------ -Tokenization converts input text string to a list of integer tokens. It may pad leading and/or trailing whitespaces to a string. NeMo tokenizer supports grapheme-only inputs, phoneme-only inputs, or a mixer of grapheme and phoneme inputs to disambiguate pronunciations of heteronyms for English, German, and Spanish. It also utilizes a grapheme-to-phoneme (G2P) tool to transliterate out-of-vocabulary (OOV) words. Please refer to the Section :doc:`../text_processing/g2p/g2p` and `TTS tokenizer collection `_ for more details. Note that G2P integration to NeMo TTS tokenizers pipeline is upcoming soon. The following example sets up a ``EnglishPhonemesTokenizer`` with a mixer of grapheme and phoneme inputs where each word shown in the heteronym list is transliterated into graphemes or phonemes by a 50% chance. +Tokenization converts an input text string into a list of integer tokens. It may add leading and/or trailing whitespace to a string. NeMo tokenizers support grapheme-only inputs, phoneme-only inputs, or a mix of grapheme and phoneme inputs to disambiguate pronunciations of heteronyms for English, German, and Spanish. 
It also uses a grapheme-to-phoneme (G2P) tool to transliterate out-of-vocabulary (OOV) words. Please refer to the :doc:`G2P section <./g2p>` and `TTS tokenizer collection `_ for more details. Note that G2P integration into the NeMo TTS tokenizer pipeline is coming soon. The following example sets up an ``EnglishPhonemesTokenizer`` with a mix of grapheme and phoneme inputs, where each word in the heteronym list is transliterated into graphemes or phonemes with a 50% chance. .. code-block:: yaml