Merge pull request #1 from avryhof/deepspeech
Deepspeech
Amos Vryhof authored Dec 6, 2020
2 parents c898560 + 8ba4281 commit ab4a5ea
Showing 4 changed files with 149 additions and 32 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -10,3 +10,4 @@ zh-CN.zip
it-IT.zip
pocketsphinx-python/
examples/TEST.py
speech_recognition/deepspeech-data/en-US
58 changes: 27 additions & 31 deletions README.rst
@@ -37,7 +37,7 @@ Speech recognition engine/API support:

**Quickstart:** ``pip install SpeechRecognition``. See the "Installing" section for more details.

To quickly try it out, run ``python -m speech_recognition`` after installing.
To quickly try it out, run ``python -m speech_recognition`` after installing (which additionally requires the ``pyaudio`` package).
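For a quick first test in your own code, a minimal sketch along these lines also works (assuming a working microphone via PyAudio and an internet connection for the default Google Web Speech API)::

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:   # requires the PyAudio backend
        print("Say something!")
        audio = r.listen(source)      # capture a single phrase from the microphone
    print(r.recognize_google(audio))  # transcribe it using the default API key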

Project links:

@@ -48,22 +48,22 @@ Project links:
Library Reference
-----------------

The `library reference <https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst>`__ documents every publicly accessible object in the library. This document is also included under ``reference/library-reference.rst``.
The `library reference <reference/library-reference.rst>`__ documents every publicly accessible object in the library.

See `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.
See `Notes on using PocketSphinx <reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources.

Examples
--------

See the ``examples/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/examples>`__ in the repository root for usage examples:
See the `examples directory <examples>`__ in the repository root for usage examples:

- `Recognize speech input from the microphone <https://github.com/Uberi/speech_recognition/blob/master/examples/microphone_recognition.py>`__
- `Transcribe an audio file <https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py>`__
- `Save audio data to an audio file <https://github.com/Uberi/speech_recognition/blob/master/examples/write_audio.py>`__
- `Show extended recognition results <https://github.com/Uberi/speech_recognition/blob/master/examples/extended_results.py>`__
- `Calibrate the recognizer energy threshold for ambient noise levels <https://github.com/Uberi/speech_recognition/blob/master/examples/calibrate_energy_threshold.py>`__ (see ``recognizer_instance.energy_threshold`` for details)
- `Listening to a microphone in the background <https://github.com/Uberi/speech_recognition/blob/master/examples/background_listening.py>`__
- `Various other useful recognizer features <https://github.com/Uberi/speech_recognition/blob/master/examples/special_recognizer_features.py>`__
- `Recognize speech input from the microphone <examples/microphone_recognition.py>`__
- `Transcribe an audio file <examples/audio_transcribe.py>`__
- `Save audio data to an audio file <examples/write_audio.py>`__
- `Show extended recognition results <examples/extended_results.py>`__
- `Calibrate the recognizer energy threshold for ambient noise levels <examples/calibrate_energy_threshold.py>`__ (see ``recognizer_instance.energy_threshold`` for details)
- `Listening to a microphone in the background <examples/background_listening.py>`__
- `Various other useful recognizer features <examples/special_recognizer_features.py>`__

Installing
----------
@@ -81,24 +81,19 @@ Requirements

To use all of the functionality of the library, you should have:

* **Python** 2.6, 2.7, or 3.3+ (required)
* **PyAudio** 0.2.11+ (required only if you need to use microphone input, ``Microphone``)
* **PocketSphinx** (required only if you need to use the Sphinx recognizer, ``recognizer_instance.recognize_sphinx``)
* **Google API Client Library for Python** (required only if you need to use the Google Cloud Speech API, ``recognizer_instance.recognize_google_cloud``)
* **Python** `2.6, 2.7, or 3.3+ <https://www.python.org/download/releases/>`__ (required)
* **PyAudio** 0.2.11+ (required only if you use microphone input, ``Microphone``)
* **PocketSphinx** (required only if you use the Sphinx recognizer, ``recognizer_instance.recognize_sphinx``)
* **Google API Client Library for Python** (required only if you use the Google Cloud Speech API, ``recognizer_instance.recognize_google_cloud``)
* **FLAC encoder** (required only if the system is not x86-based Windows/Linux/OS X)

The following requirements are optional, but can improve or extend functionality in some situations:

* On Python 2, and only on Python 2, some functions (like ``recognizer_instance.recognize_bing``) will run slower if you do not have **Monotonic for Python 2** installed.
* If using CMU Sphinx, you may want to `install additional language packs <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst#installing-other-languages>`__ to support languages like International French or Mandarin Chinese.
* If using CMU Sphinx, you may want to `install additional language packs <reference/pocketsphinx.rst#installing-other-languages>`__ to support languages like International French or Mandarin Chinese.

The following sections go over the details of each requirement.

Python
~~~~~~

The first software requirement is `Python 2.6, 2.7, or Python 3.3+ <https://www.python.org/download/releases/>`__. This is required to use the library.

PyAudio (for microphone users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -114,20 +109,20 @@ The installation instructions on the PyAudio website are quite good - for conven
* On OS X, install PortAudio using `Homebrew <http://brew.sh/>`__: ``brew install portaudio``. Then, install PyAudio using `Pip <https://pip.readthedocs.org/>`__: ``pip install pyaudio``.
* On other POSIX-based systems, install the ``portaudio19-dev`` and ``python-all-dev`` (or ``python3-all-dev`` if using Python 3) packages (or their closest equivalents) using a package manager of your choice, and then install PyAudio using `Pip <https://pip.readthedocs.org/>`__: ``pip install pyaudio`` (replace ``pip`` with ``pip3`` if using Python 3).

PyAudio `wheel packages <https://pypi.python.org/pypi/wheel>`__ for common 64-bit Python versions on Windows and Linux are included for convenience, under the ``third-party/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/third-party>`__ in the repository root. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the repository `root directory <https://github.com/Uberi/speech_recognition>`__.
PyAudio `wheel packages <https://pypi.python.org/pypi/wheel>`__ for common 64-bit Python versions on Windows and Linux are included for convenience, under the ``third-party/`` `directory <third-party>`__ in the repository root. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the repository root directory.

PocketSphinx-Python (for Sphinx users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`PocketSphinx-Python <https://github.com/bambocher/pocketsphinx-python>`__ is **required if and only if you want to use the Sphinx recognizer** (``recognizer_instance.recognize_sphinx``).

PocketSphinx-Python `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows are included for convenience, under the ``third-party/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/third-party>`__. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.
PocketSphinx-Python `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows are included for convenience, under the ``third-party/`` `directory <third-party>`__. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.

On Linux and other POSIX systems (such as OS X), follow the instructions under "Building PocketSphinx-Python from source" in `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for installation instructions.
On Linux and other POSIX systems (such as OS X), follow the instructions under "Building PocketSphinx-Python from source" in `Notes on using PocketSphinx <reference/pocketsphinx.rst>`__ for installation instructions.

Note that the versions available in most package repositories are outdated and will not work with the bundled language data. Using the bundled wheel packages or building from source is recommended.

See `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.
See `Notes on using PocketSphinx <reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.
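Once PocketSphinx-Python is installed, offline recognition is available through ``recognizer_instance.recognize_sphinx``. A minimal sketch, assuming an English WAV file named ``test.wav`` in the current directory::

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile("test.wav") as source:
        audio = r.record(source)      # read the whole file into an AudioData instance
    print(r.recognize_sphinx(audio))  # decoded entirely offline with the bundled en-US model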

Google Cloud Speech Library for Python (for Google Cloud Speech API users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -268,12 +263,12 @@ Developing
To hack on this library, first make sure you have all the requirements listed in the "Requirements" section.

- Most of the library code lives in ``speech_recognition/__init__.py``.
- Examples live under the ``examples/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/examples>`__, and the demo script lives in ``speech_recognition/__main__.py``.
- The FLAC encoder binaries are in the ``speech_recognition/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/speech_recognition>`__.
- Documentation can be found in the ``reference/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/reference>`__.
- Third-party libraries, utilities, and reference material are in the ``third-party/`` `directory <https://github.com/Uberi/speech_recognition/tree/master/third-party>`__.
- Examples live under the ``examples/`` `directory <examples>`__, and the demo script lives in ``speech_recognition/__main__.py``.
- The FLAC encoder binaries are in the ``speech_recognition/`` `directory <speech_recognition>`__.
- Documentation can be found in the ``reference/`` `directory <reference>`__.
- Third-party libraries, utilities, and reference material are in the ``third-party/`` `directory <third-party>`__.

To install/reinstall the library locally, run ``python setup.py install`` in the project `root directory <https://github.com/Uberi/speech_recognition>`__.
To install/reinstall the library locally, run ``python setup.py install`` in the project root directory.

Before a release, the version number is bumped in ``README.rst`` and ``speech_recognition/__init__.py``. Version tags are then created using ``git config gpg.program gpg2 && git config user.signingkey DB45F6C431DE7C2DCD99FF7904882258A4063489 && git tag -s VERSION_GOES_HERE -m "Version VERSION_GOES_HERE"``.

@@ -368,7 +363,8 @@ Also check out the `Python Baidu Yuyin API <https://github.com/DelightRun/PyBaid
License
-------

Copyright 2014-2017 `Anthony Zhang (Uberi) <http://anthonyz.ca/>`__. The source code for this library is available online at `GitHub <https://github.com/Uberi/speech_recognition>`__.
Copyright 2014-2017 `Anthony Zhang (Uberi) <http://anthonyz.ca/>`__.
The source code for this library is available online at `GitHub <https://github.com/Uberi/speech_recognition>`__.

SpeechRecognition is made available under the 3-clause BSD license. See ``LICENSE.txt`` in the project's `root directory <https://github.com/Uberi/speech_recognition>`__ for more information.

110 changes: 109 additions & 1 deletion speech_recognition/__init__.py
@@ -22,7 +22,7 @@
import uuid

__author__ = "Anthony Zhang (Uberi)"
__version__ = "3.8.1"
__version__ = "3.8.1.fossasia-4"
__license__ = "BSD"

try: # attempt to use the Python 2 modules
@@ -496,6 +496,56 @@ def get_flac_data(self, convert_rate=None, convert_width=None):
        return flac_data


class DeepSpeechModel():
    # the loaded model and its parameters are cached in class attributes, so they are shared across instances
    ds = None
    model_file = None
    scorer_file = None
    beam_width = None
    lm_alpha = None
    lm_beta = None

    def __init__(self, model_file, scorer_file=None, beam_width=None, lm_alpha=None, lm_beta=None):
        """
        Creates a DeepSpeech model from ``model_file``.
        If ``scorer_file`` is given, the external scorer is initialized as well.
        If ``beam_width``, ``lm_alpha``, or ``lm_beta`` are given, the corresponding model and scorer parameters are set accordingly.
        """
        try:
            import deepspeech
        except ImportError:
            raise RequestError("missing DeepSpeech module: ensure that DeepSpeech is set up correctly.")
        except ValueError:
            raise RequestError("bad DeepSpeech installation; try reinstalling DeepSpeech version 0.7.0 or later.")

        # if the model is already loaded and all parameters agree, don't re-initialize it
        if DeepSpeechModel.ds is not None and DeepSpeechModel.model_file == model_file and DeepSpeechModel.scorer_file == scorer_file and DeepSpeechModel.beam_width == beam_width and DeepSpeechModel.lm_alpha == lm_alpha and DeepSpeechModel.lm_beta == lm_beta:
            return

        DeepSpeechModel.model_file = model_file
        DeepSpeechModel.scorer_file = scorer_file
        DeepSpeechModel.beam_width = beam_width
        DeepSpeechModel.lm_alpha = lm_alpha
        DeepSpeechModel.lm_beta = lm_beta
        DeepSpeechModel.ds = deepspeech.Model(model_file)
        if beam_width:
            DeepSpeechModel.ds.setModelBeamWidth(beam_width)
        if scorer_file:
            DeepSpeechModel.ds.enableExternalScorer(scorer_file)
        if lm_alpha and lm_beta:
            DeepSpeechModel.ds.setScorerAlphaBeta(lm_alpha, lm_beta)

    def sampleRate(self):
        # sample rate expected by the loaded model (16 kHz for the released models)
        return DeepSpeechModel.ds.sampleRate()

    def recognize(self, audio):
        # run speech-to-text and return both the plain transcript and the full metadata of the best candidate
        recognized_metadata = DeepSpeechModel.ds.sttWithMetadata(audio, 1).transcripts[0]
        recognized_string = ''.join(token.text for token in recognized_metadata.tokens)
        return recognized_string, recognized_metadata


class Recognizer(AudioSource):
    def __init__(self):
        """
@@ -843,6 +893,64 @@ def recognize_sphinx(self, audio_data, language="en-US", keyword_entries=None, g
        if hypothesis is not None: return hypothesis.hypstr
        raise UnknownValueError()  # no transcriptions available

    def recognize_deepspeech(self, audio_data, language="en-US", show_all=False, ds_beamwidth=None, ds_lm_alpha=None, ds_lm_beta=None, model_file=None, scorer_file=None, model_base_dir=None):
        """
        Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using Mozilla's DeepSpeech.

        The value of ``language`` (default ``"en-US"``) names a subdirectory of ``speech_recognition/deepspeech-data/`` containing the model files; alternatively, ``model_base_dir`` or explicit ``model_file``/``scorer_file`` paths can be given. ``ds_beamwidth``, ``ds_lm_alpha``, and ``ds_lm_beta`` tune the decoder and the external scorer.

        Returns the most likely transcription if ``show_all`` is false, otherwise the full DeepSpeech metadata of the best transcript.
        """
        assert isinstance(audio_data, AudioData), "``audio_data`` must be audio data"
        assert isinstance(language, str), "``language`` must be a string naming a directory under ``deepspeech-data``, such as \"en-US\""

        try:
            import numpy as np
        except ImportError:
            raise RequestError("missing numpy module: ensure that numpy is set up correctly.")

        try:
            import deepspeech
        except ImportError:
            raise RequestError("missing DeepSpeech module: ensure that DeepSpeech is set up correctly.")
        except ValueError:
            raise RequestError("bad DeepSpeech installation; try reinstalling DeepSpeech version 0.7.0 or later.")

        if model_file is None:
            used_base_dir = model_base_dir
            if used_base_dir is None:
                # directory containing the language data, e.g. speech_recognition/deepspeech-data/en-US
                language_directory = os.path.join(os.path.dirname(os.path.realpath(__file__)), "deepspeech-data", language)
                if not os.path.isdir(language_directory):
                    raise RequestError("missing DeepSpeech language data directory: \"{}\"".format(language_directory))
                used_base_dir = language_directory
            DSversion = deepspeech.version()
            # use the TensorFlow Lite model on ARM architectures
            if os.uname()[4][:3] == 'arm':
                model_file = os.path.join(used_base_dir, "deepspeech-{}-models.tflite".format(DSversion))
            else:
                model_file = os.path.join(used_base_dir, "deepspeech-{}-models.pbmm".format(DSversion))
            scorer_file = os.path.join(used_base_dir, "deepspeech-{}-models.scorer".format(DSversion))
        if not os.path.isfile(model_file):
            raise RequestError("missing DeepSpeech model file: \"{}\"".format(model_file))
        # scorer_file may be None if model_file was given without a scorer_file; in that case the scorer is simply not used
        if scorer_file is not None and not os.path.isfile(scorer_file):
            raise RequestError("missing DeepSpeech scorer file: \"{}\"".format(scorer_file))

        # this constructs a new DeepSpeechModel wrapper, but the underlying model is only loaded once and then reused
        ds = DeepSpeechModel(model_file, scorer_file, ds_beamwidth, ds_lm_alpha, ds_lm_beta)

        desired_sample_rate = ds.sampleRate()

        # obtain audio data
        # the released models require audio to be 16-bit mono 16 kHz in little-endian format
        raw_data = audio_data.get_raw_data(convert_rate=desired_sample_rate, convert_width=2)

        recognized_string, recognized_metadata = ds.recognize(np.frombuffer(raw_data, np.int16))

        if show_all: return recognized_metadata

        return recognized_string

    def recognize_google(self, audio_data, key=None, language="en-US", pfilter=0, show_all=False):
        """
        Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Speech Recognition API.
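The new ``recognize_deepspeech`` method is called like the existing ``recognize_*`` methods. A minimal sketch, assuming the v0.7.0 model files are in place under ``speech_recognition/deepspeech-data/en-US/`` (see the README added below) and that an English WAV file named ``test.wav`` exists:

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile("test.wav") as source:
        audio = r.record(source)

    # offline transcription with the bundled en-US DeepSpeech model
    print(r.recognize_deepspeech(audio, language="en-US"))

    # show_all=True returns the full DeepSpeech metadata of the best transcript instead of a plain string
    metadata = r.recognize_deepspeech(audio, show_all=True)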
12 changes: 12 additions & 0 deletions speech_recognition/deepspeech-data/README
@@ -0,0 +1,12 @@
Directory for DeepSpeech data

Download the following DeepSpeech model files:
https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.pbmm
https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.tflite
https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.scorer
and put them into the en-US directory.

Further languages can be added in their own directories, named after their language codes.
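A short Python sketch for fetching these files into this directory (the target path assumes the script is run from the repository root; adjust as needed, and note that the files are large):

    import os
    import urllib.request

    BASE_URL = "https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/"
    FILE_NAMES = [
        "deepspeech-0.7.0-models.pbmm",
        "deepspeech-0.7.0-models.tflite",
        "deepspeech-0.7.0-models.scorer",
    ]

    target_dir = os.path.join("speech_recognition", "deepspeech-data", "en-US")
    os.makedirs(target_dir, exist_ok=True)
    for name in FILE_NAMES:
        # download each model file next to the package's language data
        urllib.request.urlretrieve(BASE_URL + name, os.path.join(target_dir, name))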
