Merge pull request #1 from avryhof/deepspeech
Deepspeech
Amos Vryhof authored Dec 6, 2020
2 parents c898560 + 8ba4281 commit ab4a5ea
Showing 4 changed files with 149 additions and 32 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -10,3 +10,4 @@ zh-CN.zip
it-IT.zip
pocketsphinx-python/
examples/TEST.py
speech_recognition/deepspeech-data/en-US
58 changes: 27 additions & 31 deletions README.rst
@@ -37,7 +37,7 @@ Speech recognition engine/API support:

**Quickstart:** ``pip install SpeechRecognition``. See the "Installing" section for more details.

To quickly try it out, run ``python -m speech_recognition`` after installing.
To quickly try it out, run ``python -m speech_recognition`` after installing (which additionally requires the ``pyaudio`` package).
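For a quick first test in your own code, a minimal sketch along these lines also works (assuming a working microphone via PyAudio and an internet connection for the default Google Web Speech API)::

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:   # requires the PyAudio backend
        print("Say something!")
        audio = r.listen(source)      # capture a single phrase from the microphone
    print(r.recognize_google(audio))  # transcribe it using the default API key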

Project links:

@@ -48,22 +48,22 @@ Project links:
Library Reference
-----------------

The `library reference <https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst>`__ documents every publicly accessible object in the library. This document is also included under ``reference/library-reference.rst``.
The `library reference <reference/library-reference.rst>`__ documents every publicly accessible object in the library.

See `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.
See `Notes on using PocketSphinx <reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources.

Examples
--------

See the ``examples/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/examples>`__ in the repository root for usage examples:
See the `examples directory <examples>`__ in the repository root for usage examples:

- `Recognize speech input from the microphone <https://github.com/Uberi/speech_recognition/blob/master/examples/microphone_recognition.py>`__
- `Transcribe an audio file <https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py>`__
- `Save audio data to an audio file <https://github.com/Uberi/speech_recognition/blob/master/examples/write_audio.py>`__
- `Show extended recognition results <https://github.com/Uberi/speech_recognition/blob/master/examples/extended_results.py>`__
- `Calibrate the recognizer energy threshold for ambient noise levels <https://github.com/Uberi/speech_recognition/blob/master/examples/calibrate_energy_threshold.py>`__ (see ``recognizer_instance.energy_threshold`` for details)
- `Listening to a microphone in the background <https://github.com/Uberi/speech_recognition/blob/master/examples/background_listening.py>`__
- `Various other useful recognizer features <https://github.com/Uberi/speech_recognition/blob/master/examples/special_recognizer_features.py>`__
- `Recognize speech input from the microphone <examples/microphone_recognition.py>`__
- `Transcribe an audio file <examples/audio_transcribe.py>`__
- `Save audio data to an audio file <examples/write_audio.py>`__
- `Show extended recognition results <examples/extended_results.py>`__
- `Calibrate the recognizer energy threshold for ambient noise levels <examples/calibrate_energy_threshold.py>`__ (see ``recognizer_instance.energy_threshold`` for details)
- `Listening to a microphone in the background <examples/background_listening.py>`__
- `Various other useful recognizer features <examples/special_recognizer_features.py>`__

Installing
----------
@@ -81,24 +81,19 @@ Requirements

To use all of the functionality of the library, you should have:

* **Python** 2.6, 2.7, or 3.3+ (required)
* **PyAudio** 0.2.11+ (required only if you need to use microphone input, ``Microphone``)
* **PocketSphinx** (required only if you need to use the Sphinx recognizer, ``recognizer_instance.recognize_sphinx``)
* **Google API Client Library for Python** (required only if you need to use the Google Cloud Speech API, ``recognizer_instance.recognize_google_cloud``)
* **Python** `2.6, 2.7, or 3.3+ <https://www.python.org/download/releases/>`__ (required)
* **PyAudio** 0.2.11+ (required only if you use microphone input, ``Microphone``)
* **PocketSphinx** (required only if you use the Sphinx recognizer, ``recognizer_instance.recognize_sphinx``)
* **Google API Client Library for Python** (required only if you use the Google Cloud Speech API, ``recognizer_instance.recognize_google_cloud``)
* **FLAC encoder** (required only if the system is not x86-based Windows/Linux/OS X)

The following requirements are optional, but can improve or extend functionality in some situations:

* On Python 2, and only on Python 2, some functions (like ``recognizer_instance.recognize_bing``) will run slower if you do not have **Monotonic for Python 2** installed.
* If using CMU Sphinx, you may want to `install additional language packs <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst#installing-other-languages>`__ to support languages like International French or Mandarin Chinese.
* If using CMU Sphinx, you may want to `install additional language packs <reference/pocketsphinx.rst#installing-other-languages>`__ to support languages like International French or Mandarin Chinese.

The following sections go over the details of each requirement.

Python
~~~~~~

The first software requirement is `Python 2.6, 2.7, or Python 3.3+ <https://www.python.org/download/releases/>`__. This is required to use the library.

PyAudio (for microphone users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -114,20 +109,20 @@ The installation instructions on the PyAudio website are quite good - for conven
* On OS X, install PortAudio using `Homebrew <http://brew.sh/>`__: ``brew install portaudio``. Then, install PyAudio using `Pip <https://pip.readthedocs.org/>`__: ``pip install pyaudio``.
* On other POSIX-based systems, install the ``portaudio19-dev`` and ``python-all-dev`` (or ``python3-all-dev`` if using Python 3) packages (or their closest equivalents) using a package manager of your choice, and then install PyAudio using `Pip <https://pip.readthedocs.org/>`__: ``pip install pyaudio`` (replace ``pip`` with ``pip3`` if using Python 3).

PyAudio `wheel packages <https://pypi.python.org/pypi/wheel>`__ for common 64-bit Python versions on Windows and Linux are included for convenience, under the ``third-party/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/third-party>`__ in the repository root. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the repository `root directory <https://github.com/Uberi/speech_recognition>`__.
PyAudio `wheel packages <https://pypi.python.org/pypi/wheel>`__ for common 64-bit Python versions on Windows and Linux are included for convenience, under the ``third-party/`` `directory <third-party>`__ in the repository root. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the repository root directory.

PocketSphinx-Python (for Sphinx users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`PocketSphinx-Python <https://github.com/bambocher/pocketsphinx-python>`__ is **required if and only if you want to use the Sphinx recognizer** (``recognizer_instance.recognize_sphinx``).

PocketSphinx-Python `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows are included for convenience, under the ``third-party/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/third-party>`__. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.
PocketSphinx-Python `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows are included for convenience, under the ``third-party/`` `directory <third-party>`__. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.

On Linux and other POSIX systems (such as OS X), follow the instructions under "Building PocketSphinx-Python from source" in `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for installation instructions.
On Linux and other POSIX systems (such as OS X), follow the instructions under "Building PocketSphinx-Python from source" in `Notes on using PocketSphinx <reference/pocketsphinx.rst>`__ for installation instructions.

Note that the versions available in most package repositories are outdated and will not work with the bundled language data. Using the bundled wheel packages or building from source is recommended.

See `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.
See `Notes on using PocketSphinx <reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.
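Once PocketSphinx-Python is installed, offline recognition is available through ``recognizer_instance.recognize_sphinx``. A minimal sketch, assuming an English WAV file named ``test.wav`` in the current directory::

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile("test.wav") as source:
        audio = r.record(source)      # read the whole file into an AudioData instance
    print(r.recognize_sphinx(audio))  # decoded entirely offline with the bundled en-US model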

Google Cloud Speech Library for Python (for Google Cloud Speech API users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -268,12 +263,12 @@ Developing
To hack on this library, first make sure you have all the requirements listed in the "Requirements" section.

- Most of the library code lives in ``speech_recognition/__init__.py``.
- Examples live under the ``examples/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/examples>`__, and the demo script lives in ``speech_recognition/__main__.py``.
- The FLAC encoder binaries are in the ``speech_recognition/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/speech_recognition>`__.
- Documentation can be found in the ``reference/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/reference>`__.
- Third-party libraries, utilities, and reference material are in the ``third-party/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/third-party>`__.
- Examples live under the ``examples/`` `directory <examples>`__, and the demo script lives in ``speech_recognition/__main__.py``.
- The FLAC encoder binaries are in the ``speech_recognition/`` `directory <speech_recognition>`__.
- Documentation can be found in the ``reference/`` `directory <reference>`__.
- Third-party libraries, utilities, and reference material are in the ``third-party/`` `directory <third-party>`__.

To install/reinstall the library locally, run ``python setup.py install`` in the project `root directory <https://github.com/Uberi/speech_recognition>`__.
To install/reinstall the library locally, run ``python setup.py install`` in the project root directory.

Before a release, the version number is bumped in ``README.rst`` and ``speech_recognition/__init__.py``. Version tags are then created using ``git config gpg.program gpg2 && git config user.signingkey DB45F6C431DE7C2DCD99FF7904882258A4063489 && git tag -s VERSION_GOES_HERE -m "Version VERSION_GOES_HERE"``.

@@ -368,7 +363,8 @@ Also check out the `Python Baidu Yuyin API <https://github.com/DelightRun/PyBaid
License
-------

Copyright 2014-2017 `Anthony Zhang (Uberi) <http://anthonyz.ca/>`__. The source code for this library is available online at `GitHub <https://github.com/Uberi/speech_recognition>`__.
Copyright 2014-2017 `Anthony Zhang (Uberi) <http://anthonyz.ca/>`__.
The source code for this library is available online at `GitHub <https://github.com/Uberi/speech_recognition>`__.

SpeechRecognition is made available under the 3-clause BSD license. See ``LICENSE.txt`` in the project's `root directory <https://github.com/Uberi/speech_recognition>`__ for more information.

110 changes: 109 additions & 1 deletion speech_recognition/__init__.py
@@ -22,7 +22,7 @@
import uuid

__author__ = "Anthony Zhang (Uberi)"
__version__ = "3.8.1"
__version__ = "3.8.1.fossasia-4"
__license__ = "BSD"

try: # attempt to use the Python 2 modules
@@ -496,6 +496,56 @@ def get_flac_data(self, convert_rate=None, convert_width=None):
        return flac_data


class DeepSpeechModel():
    # the loaded model and its parameters are cached in class attributes, so they are shared across instances
    ds = None
    model_file = None
    scorer_file = None
    beam_width = None
    lm_alpha = None
    lm_beta = None

    def __init__(self, model_file, scorer_file=None, beam_width=None, lm_alpha=None, lm_beta=None):
        """
        Creates a DeepSpeech model from ``model_file``.
        If ``scorer_file`` is given, the external scorer is initialized as well.
        If ``beam_width``, ``lm_alpha``, or ``lm_beta`` are given, the corresponding model and scorer parameters are set accordingly.
        """
        try:
            import deepspeech
        except ImportError:
            raise RequestError("missing DeepSpeech module: ensure that DeepSpeech is set up correctly.")
        except ValueError:
            raise RequestError("bad DeepSpeech installation; try reinstalling DeepSpeech version 0.7.0 or later.")

        # if the model is already loaded and all parameters agree, don't re-initialize it
        if DeepSpeechModel.ds is not None and DeepSpeechModel.model_file == model_file and DeepSpeechModel.scorer_file == scorer_file and DeepSpeechModel.beam_width == beam_width and DeepSpeechModel.lm_alpha == lm_alpha and DeepSpeechModel.lm_beta == lm_beta:
            return

        DeepSpeechModel.model_file = model_file
        DeepSpeechModel.scorer_file = scorer_file
        DeepSpeechModel.beam_width = beam_width
        DeepSpeechModel.lm_alpha = lm_alpha
        DeepSpeechModel.lm_beta = lm_beta
        DeepSpeechModel.ds = deepspeech.Model(model_file)
        if beam_width:
            DeepSpeechModel.ds.setModelBeamWidth(beam_width)
        if scorer_file:
            DeepSpeechModel.ds.enableExternalScorer(scorer_file)
        if lm_alpha and lm_beta:
            DeepSpeechModel.ds.setScorerAlphaBeta(lm_alpha, lm_beta)

    def sampleRate(self):
        # sample rate expected by the loaded model (16 kHz for the released models)
        return DeepSpeechModel.ds.sampleRate()

    def recognize(self, audio):
        # run speech-to-text and return both the plain transcript and the full metadata of the best candidate
        recognized_metadata = DeepSpeechModel.ds.sttWithMetadata(audio, 1).transcripts[0]
        recognized_string = ''.join(token.text for token in recognized_metadata.tokens)
        return recognized_string, recognized_metadata


class Recognizer(AudioSource):
    def __init__(self):
        """
@@ -843,6 +893,64 @@ def recognize_sphinx(self, audio_data, language="en-US", keyword_entries=None, g
        if hypothesis is not None: return hypothesis.hypstr
        raise UnknownValueError()  # no transcriptions available

    def recognize_deepspeech(self, audio_data, language="en-US", show_all=False, ds_beamwidth=None, ds_lm_alpha=None, ds_lm_beta=None, model_file=None, scorer_file=None, model_base_dir=None):
        """
        Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using Mozilla's DeepSpeech.

        The value of ``language`` (default ``"en-US"``) names a subdirectory of ``speech_recognition/deepspeech-data/`` containing the model files; alternatively, ``model_base_dir`` or explicit ``model_file``/``scorer_file`` paths can be given. ``ds_beamwidth``, ``ds_lm_alpha``, and ``ds_lm_beta`` tune the decoder and the external scorer.

        Returns the most likely transcription if ``show_all`` is false, otherwise the full DeepSpeech metadata of the best transcript.
        """
        assert isinstance(audio_data, AudioData), "``audio_data`` must be audio data"
        assert isinstance(language, str), "``language`` must be a string naming a directory under ``deepspeech-data``, such as \"en-US\""

        try:
            import numpy as np
        except ImportError:
            raise RequestError("missing numpy module: ensure that numpy is set up correctly.")

        try:
            import deepspeech
        except ImportError:
            raise RequestError("missing DeepSpeech module: ensure that DeepSpeech is set up correctly.")
        except ValueError:
            raise RequestError("bad DeepSpeech installation; try reinstalling DeepSpeech version 0.7.0 or later.")

        if model_file is None:
            used_base_dir = model_base_dir
            if used_base_dir is None:
                # directory containing the language data, e.g. speech_recognition/deepspeech-data/en-US
                language_directory = os.path.join(os.path.dirname(os.path.realpath(__file__)), "deepspeech-data", language)
                if not os.path.isdir(language_directory):
                    raise RequestError("missing DeepSpeech language data directory: \"{}\"".format(language_directory))
                used_base_dir = language_directory
            DSversion = deepspeech.version()
            # use the TensorFlow Lite model on ARM architectures
            if os.uname()[4][:3] == 'arm':
                model_file = os.path.join(used_base_dir, "deepspeech-{}-models.tflite".format(DSversion))
            else:
                model_file = os.path.join(used_base_dir, "deepspeech-{}-models.pbmm".format(DSversion))
            scorer_file = os.path.join(used_base_dir, "deepspeech-{}-models.scorer".format(DSversion))
        if not os.path.isfile(model_file):
            raise RequestError("missing DeepSpeech model file: \"{}\"".format(model_file))
        # scorer_file may be None if model_file was given without a scorer_file; in that case the scorer is simply not used
        if scorer_file is not None and not os.path.isfile(scorer_file):
            raise RequestError("missing DeepSpeech scorer file: \"{}\"".format(scorer_file))

        # this constructs a new DeepSpeechModel wrapper, but the underlying model is only loaded once and then reused
        ds = DeepSpeechModel(model_file, scorer_file, ds_beamwidth, ds_lm_alpha, ds_lm_beta)

        desired_sample_rate = ds.sampleRate()

        # obtain audio data
        # the released models require audio to be 16-bit mono 16 kHz in little-endian format
        raw_data = audio_data.get_raw_data(convert_rate=desired_sample_rate, convert_width=2)

        recognized_string, recognized_metadata = ds.recognize(np.frombuffer(raw_data, np.int16))

        if show_all: return recognized_metadata

        return recognized_string

    def recognize_google(self, audio_data, key=None, language="en-US", pfilter=0, show_all=False):
        """
        Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Speech Recognition API.
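The new ``recognize_deepspeech`` method is called like the existing ``recognize_*`` methods. A minimal sketch, assuming the v0.7.0 model files are in place under ``speech_recognition/deepspeech-data/en-US/`` (see the README added below) and that an English WAV file named ``test.wav`` exists:

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile("test.wav") as source:
        audio = r.record(source)

    # offline transcription with the bundled en-US DeepSpeech model
    print(r.recognize_deepspeech(audio, language="en-US"))

    # show_all=True returns the full DeepSpeech metadata of the best transcript instead of a plain string
    metadata = r.recognize_deepspeech(audio, show_all=True)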
12 changes: 12 additions & 0 deletions speech_recognition/deepspeech-data/README
@@ -0,0 +1,12 @@
Directory for DeepSpeech data

Download the following DeepSpeech model files:
https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.pbmm
https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.tflite
https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.scorer
and put them into the en-US directory.

Further languages can be added in their own directories, named after their language codes.
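A short Python sketch for fetching these files into this directory (the target path assumes the script is run from the repository root; adjust as needed, and note that the files are large):

    import os
    import urllib.request

    BASE_URL = "https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/"
    FILE_NAMES = [
        "deepspeech-0.7.0-models.pbmm",
        "deepspeech-0.7.0-models.tflite",
        "deepspeech-0.7.0-models.scorer",
    ]

    target_dir = os.path.join("speech_recognition", "deepspeech-data", "en-US")
    os.makedirs(target_dir, exist_ok=True)
    for name in FILE_NAMES:
        # download each model file next to the package's language data
        urllib.request.urlretrieve(BASE_URL + name, os.path.join(target_dir, name))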
