This repository contains code and resources for training a speech-to-text model on the Uzbek voice dataset with NVIDIA NeMo's Automatic Speech Recognition (ASR) toolkit.
Prerequisites:
- A machine with an NVIDIA GPU
- Conda environment manager
- Python 3.10
- PyTorch 1.13.1 or later
Download the dataset from here. You will get a clips.zip file and a voice_dataset.json file: voice_dataset.json contains metadata about the dataset, and clips.zip contains the audio files. Unzip the archive:

```bash
unzip clips.zip
```
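Before preprocessing, you can sanity-check the metadata. A minimal sketch, assuming voice_dataset.json is a top-level JSON array of records (the exact schema is not documented here):

```python
# Inspect the dataset metadata (assumes a top-level JSON array; adjust if
# the file turns out to be structured differently).
import json

with open("voice_dataset.json", encoding="utf-8") as f:
    records = json.load(f)

print(f"{len(records)} records")
print(records[0])  # show the fields of one entry
```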
- Clone the repository:

```bash
git clone https://github.com/KamoliddinS/UzbekvoiceAsrTextToSpeechNemo.git
cd UzbekvoiceAsrTextToSpeechNemo
```
  Then download the pre-trained model from here, unzip it, and place it in the cloned directory.
- Set up a Conda environment:

```bash
conda create --name nemo_asr_uzbek python==3.10.12
conda activate nemo_asr_uzbek
```
- Install prerequisites:

```bash
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```
- Install NeMo:

```bash
sudo apt-get update && sudo apt-get install -y libsndfile1 ffmpeg
pip install Cython
pip install nemo_toolkit['all']
```
Note: You might need to install additional dependencies based on your specific requirements.
- Install other dependencies:

```bash
pip install -r requirements.txt
```
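To confirm the environment is usable before moving on, a quick sanity check (a minimal sketch; it only assumes the packages above installed cleanly):

```python
# Verify that PyTorch sees the GPU and that NeMo's ASR collection imports.
import torch
import nemo.collections.asr as nemo_asr

print("CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```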
The following steps are required to train a speech-to-text model using the Uzbek voice dataset.
Script: clean_stage_1.py
- Input: voice_dataset.json
- Output: 1_stage_preprocessed_data.csv

Usage:

```bash
python clean_stage_1.py
```
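You can peek at the resulting CSV to confirm the cleaning ran; a minimal sketch using pandas (no column names are assumed, since the script's output schema isn't documented here):

```python
# Quick look at the first-stage preprocessing output.
import pandas as pd

df = pd.read_csv("1_stage_preprocessed_data.csv")
print(df.shape)             # rows x columns after cleaning
print(df.columns.tolist())  # available fields
print(df.head())            # first few cleaned records
```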
Script: pre_procecessing_auido.py
- Input: Folder path containing the audio files from the uzbekvoice dataset.
- Function: Converts .mp3 files to .wav format.

Usage:

```bash
python pre_procecessing_auido.py --folder_path /path/to/uzbekvoice/dataset
```
Note: Download the uzbekvoice dataset audio files and provide the path to the dataset.
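For reference, the conversion itself boils down to a few lines. A minimal sketch using pydub (an assumption: the actual script may use a different library), resampling to 16 kHz mono, which NeMo models typically expect:

```python
# Convert every .mp3 in a folder to 16 kHz mono .wav.
# pydub shells out to ffmpeg, which was installed earlier.
from pathlib import Path
from pydub import AudioSegment

folder = Path("/path/to/uzbekvoice/dataset")
for mp3_path in folder.glob("*.mp3"):
    audio = AudioSegment.from_mp3(mp3_path)
    audio = audio.set_frame_rate(16000).set_channels(1)
    audio.export(mp3_path.with_suffix(".wav"), format="wav")
```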
Script: levenshtein_clean.py
Usage:

```bash
python levenshtein_clean.py --input_csv 1_stage_preprocessed_data.csv --audio_files_dir /path/to/preprocessed/wav/files --output_csv output.csv --model_path /path/to/pretrained/model
```
Note:
- Download the pre-trained model from the provided link, unzip it, and place it in the repository's cloned directory.
- Provide the path to the folder of preprocessed .wav files.
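Conceptually, this step transcribes each clip with the pre-trained model and drops samples whose transcription is too far from the reference text. A minimal sketch of the distance calculation only (the script's exact logic and thresholds may differ):

```python
# Character error rate via Levenshtein distance -- the metric behind
# this filtering step (illustrative, not the script's exact code).
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

print(cer("salom dunyo", "salom dunya"))  # 1 edit / 11 chars ~= 0.09
```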
Script: nemo_asr_format.py
Usage:

```bash
python nemo_asr_format.py --csv_filepath output.csv --audio_files_path /path/to/audio/files --cer_threshold 0.18
```
Note: Provide the path to the audio files that were downloaded and preprocessed.
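The output manifests follow NeMo's standard format: one JSON object per line with audio_filepath, duration, and text fields. A minimal sketch of writing one (the rows list here is illustrative):

```python
# Write a NeMo-style JSON-lines manifest.
import json

rows = [  # illustrative; in practice these come from output.csv
    {"audio_filepath": "clips/0001.wav", "duration": 3.2, "text": "salom dunyo"},
]

with open("train.json", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```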
Script: train.py
Usage:

```bash
python train.py --train_json_path train.json --test_json_path test.json --model_name model_name --model_save_path /path/to/save/model --checkpoint True --num_epochs 10
```
Note:
- By default, nemo_asr_format.py outputs train.json and test.json.
- Provide the desired model name and the path where you want to save the trained model.
- The --checkpoint flag determines whether to evaluate the model or not.
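For orientation, fine-tuning with NeMo generally follows the pattern below. This is a minimal sketch, not necessarily what train.py does; the model class, config keys, and .nemo filename are assumptions:

```python
# Fine-tune a pre-trained NeMo CTC model on the prepared manifests
# (a sketch of the usual NeMo recipe, not train.py's exact code).
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecCTCModel.restore_from("pretrained_model.nemo")

common = dict(sample_rate=16000, labels=model.decoder.vocabulary, batch_size=16)
model.setup_training_data(dict(manifest_filepath="train.json", shuffle=True, **common))
model.setup_validation_data(dict(manifest_filepath="test.json", shuffle=False, **common))

trainer = pl.Trainer(max_epochs=10, accelerator="gpu", devices=1)
trainer.fit(model)
model.save_to("uzbek_asr_finetuned.nemo")
```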
By following the steps above, you can preprocess, clean, and train an Uzbek speech-to-text model using NVIDIA NeMo ASR. Ensure that all required datasets and pre-trained models are downloaded and placed in the appropriate directories before running the scripts.
If you'd like to contribute to this project, please fork the repository and submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.