ASR (Automatic Speech Recognition) Project

This repository contains Jupyter Notebooks and code for training and fine-tuning Automatic Speech Recognition (ASR) models on voice data in different languages, including Hindi and Urdu.

Project Overview

Automatic Speech Recognition (ASR) is a technology that converts spoken language into written text. ASR systems are crucial for various applications, including transcription services, voice assistants, and more. This project focuses on developing and fine-tuning ASR models for different languages.

Notebooks

  • ASRfromScratch.ipynb: This notebook is dedicated to building an ASR model from scratch. It likely covers data preprocessing, model architecture, training, and evaluation.

  • Pyannote_plays_and_Whisper_rhymes_v_1_0.ipynb: This notebook may involve using Pyannote for speaker diarization and Whisper for ASR. It could contain code for recognizing speech and identifying speakers in the audio.

  • Real_Time_Speech_Transcription_Gradio_ASR_Python_App.ipynb: This notebook might demonstrate a real-time speech transcription application using the ASR model. It could involve integrating the ASR model with a Gradio interface for interactive transcription.

  • Speaker_Diarization_Inference.ipynb: This notebook may focus on speaker diarization, the task of determining "who spoke when" in an audio recording. It could use an ASR model in combination with speaker diarization techniques.
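
As a rough illustration of how ASR output and speaker diarization can be combined, the sketch below labels each transcript segment with the speaker whose turn overlaps it the most. This is not taken from the notebooks: the `SpeakerTurn` and `TranscriptSegment` classes, the `assign_speakers` function, and the maximum-overlap rule are all hypothetical choices made for this example, and it uses plain Python rather than any Pyannote or Whisper API.

```python
from dataclasses import dataclass


@dataclass
class SpeakerTurn:
    """One diarization result: a speaker label and when they spoke (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class TranscriptSegment:
    """One ASR result: a piece of text and its time span (seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each transcript segment with the speaker of the most-overlapping turn.

    Segments that overlap no turn at all get the `unknown` label.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, seg.text))
    return labeled


# Hypothetical inputs, made up for illustration only.
turns = [SpeakerTurn(0.0, 2.0, "A"), SpeakerTurn(2.0, 5.0, "B")]
segments = [
    TranscriptSegment(0.2, 1.8, "hello"),
    TranscriptSegment(1.5, 4.0, "hi there"),
    TranscriptSegment(6.0, 7.0, "bye"),
]
for speaker, text in assign_speakers(segments, turns):
    print(f"{speaker}: {text}")
```

Maximum overlap is only one possible assignment rule; it is simple, but it attributes a segment to a single speaker even when two people talk over each other.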

Languages

  • Hindi
  • Urdu
  • and potentially others

Getting Started

To explore and utilize the ASR project notebooks, follow these steps:

  1. Clone this repository to your local machine.
  2. Open the Jupyter Notebooks of interest in your preferred Python environment.
  3. Review the code and documentation within the notebooks for detailed instructions and explanations.

Dependencies

The specific dependencies for each notebook may vary, but common libraries and tools used in ASR projects include:

  • PyTorch or TensorFlow for deep learning-based ASR models.
  • Speech processing libraries like torchaudio, librosa, or pyaudio for audio data handling.
  • Gradio for building interactive speech recognition applications.
  • Pyannote for speaker diarization and Whisper for speech recognition (transcription).

Please refer to each notebook for its specific dependencies and installation instructions.
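
Because the exact requirements differ per notebook, it can help to check which of the libraries listed above are already available before opening a notebook. The snippet below is a minimal sketch using only the Python standard library. The import names in `CANDIDATE_MODULES` are assumptions (for example, that Whisper is importable as `whisper` and Pyannote as `pyannote.audio`), and the helper names `is_installed` and `report` are hypothetical.

```python
from importlib.util import find_spec

# Assumed import names for the libraries mentioned above; adjust as needed.
CANDIDATE_MODULES = [
    "torch",
    "tensorflow",
    "torchaudio",
    "librosa",
    "pyaudio",
    "gradio",
    "pyannote.audio",
    "whisper",
]


def is_installed(module_name):
    """Return True if the module can be located, without fully importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised, for example, when the parent package of a dotted name is missing.
        return False


def report(modules=CANDIDATE_MODULES):
    """Map each module name to whether it appears to be installed."""
    return {name: is_installed(name) for name in modules}


if __name__ == "__main__":
    for name, ok in report().items():
        print(f"{name:16s} {'found' if ok else 'missing'}")
```

A module reported as missing only matters if the notebook you plan to run actually imports it.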

License

This project is licensed under the MIT License.

Acknowledgments

I would like to acknowledge the open-source ASR community, the developers of Pyannote, Whisper, and other related libraries, whose contributions have made this project possible.


Feel free to explore the provided Jupyter Notebooks to delve into the world of Automatic Speech Recognition in different languages. If you have any questions or encounter any issues, please reach out to the project maintainers.